Evol. Intel. (2014) 7:107–118 DOI 10.1007/s12065-014-0111-9

SPECIAL ISSUE

An evolutionary robotics approach for the distributed control of satellite formations

Dario Izzo · Luís F. Simões · Guido C. H. E. de Croon

Received: 15 November 2013 / Revised: 26 June 2014 / Accepted: 27 June 2014 / Published online: 12 July 2014
© Springer-Verlag Berlin Heidelberg 2014

Abstract We propose and study a decentralized formation flying control architecture based on the evolutionary robotics technique. We develop our control architecture for the MIT SPHERES robotic platform on board the International Space Station and we show that it achieves micrometre and microradian precision at the path-planning level. Our controllers are homogeneous across satellites and do not make use of labels (i.e. all satellites can be exchanged at any time). The evolutionary process produces homogeneous controllers that plan, with high precision, the acquisition and maintenance of any triangular formation.

Keywords Satellite swarm control · Evolutionary robotics · Neural networks · Particle swarm optimization

D. Izzo (corresponding author), Advanced Concepts Team, European Space Agency, Noordwijk, The Netherlands. e-mail: [email protected]
L. F. Simões, Computational Intelligence Group, VU University Amsterdam, Amsterdam, The Netherlands. e-mail: [email protected]
G. C. H. E. de Croon, Micro Air Vehicle Laboratory, TU Delft, Delft, The Netherlands. e-mail: [email protected]

1 Introduction

In the early nineties, a methodology called evolutionary robotics (ER) emerged in the context of the automated design of robotic controllers. In ER, a robot control system is encoded by real and/or integer values. Possible controls,

corresponding to a particular choice of these values, are evaluated by simulation, and evolutionary optimization techniques are used to increase their quality, or "fitness", to use a common term in evolutionary computing. Since the very first experiments, neural networks, both feed-forward and recurrent, were used to encode the control system (in which case the term evolutionary neurocontrollers was also employed). A good review of the results obtained in ER research in the nineties was written by the most active scientists in the field [28]. As acutely pointed out by Lipson [26] in his 2001 review of Nolfi and Floreano's book, ER shares a good deal of ideas with behaviour-based robotics, but it dares to be more ambitious by asking an artificial evolution process to design the "assembly plan" and the basic building blocks of the action-perception mechanism. When successful, the methodology provides a rather efficient solution to the problem of emergent behaviour synthesis and is thus a great tool for designing robotic controllers. The technique has been successfully demonstrated in Earth robots: in photo-taxis experiments [12], in swarming behaviour synthesis [6], and in some basic maze-solving [9] and self-assembly [1] tasks.

The idea of representing the control system of an agent as a chromosome subject to the laws of artificial evolution has also been investigated, in parallel, in the context of space technologies. In particular, a series of papers by Dachwald [14-16] investigated the possibility of encoding a spacecraft's interplanetary trajectory controller into an artificial neural network and evolving its performance. While promising and certainly innovative, his method did not prove to be an improvement over more traditional techniques rooted in optimal control theory or over other evolutionary approaches.
In the case of an interplanetary spacecraft trajectory, the level of uncertainty in the perception/action model is rather small, and optimal control theory, used in conjunction with search techniques (i.e. global optimization or tree searches), is able to provide a convincing answer to even the most difficult trajectory design problems [23, 35]. The situation is rather different for locomotion on unknown planetary terrains, a topic discussed (in the framework of ER) in a series of studies from the European Space Agency [11, 17, 33] and later continued by one of the participating universities [34]. In this case, the use of the ER technique is well justified, as the body model, the contact forces model, and the environment are all largely uncertain, and a robust and adaptive controller is sought.

In this paper we propose and study a third context where ER techniques can be successfully used in space applications: that of formation flying, and in particular decentralized formation flying. In such a domain, two or more spacecraft need to coordinate in order to achieve a common mission objective. Examples include in-orbit self-assembly, as well as the acquisition and maintenance of a rigid geometrical formation in space, among others. The word decentralization implies that no leader or external "intelligence" is assumed: the achievement of the mission goal is an "emerging" property of local interactions among spacecraft. This definition matches quite closely those in the collective robotics [25] and swarm intelligence [10] domains. For this reason, and following Izzo and Pettazzi [21], we refer to decentralized formation flying as "satellite swarm control". Different proposals have been put forward in the past few years to tackle the satellite swarm control problem. They can be roughly categorized as follows: the virtual-structure approach, the behaviour-based approach, and the artificial potential approach.
In the virtual-structure approach [31], the final desired formation is associated with a rigid frame, and a decentralized control architecture is proposed that allows satellites to track their pre-assigned place in the rigid frame. The approach allows some valuable convergence properties to be proved but, as it pre-assigns satellites' positions in the target formation, it does not solve the task allocation problem. This constitutes a limitation, especially whenever a large number N of identical satellites is considered, in which case the possible final configurations number N! (e.g. for a self-assembly task as in the concept of Ayre et al. [3]).

In the behaviour-based approach, first conceived for Earth-based robotics [2] and later considered for satellite swarming [21, 22, 29], each spacecraft defines at each instant a desired state as a sum of weighted contributions. A different layer of the control system (a sliding-mode controller or a simpler strategy) takes care of computing the action required to cancel the error between the actual and the desired state. The approach, as pursued in [21], results in an extremely simple algorithm, able to solve simultaneously and in a distributed manner the task allocation problem and the path planning problem, i.e. to

Fig. 1 The MIT SPHERES platform on board the ISS

assign each satellite to its final position in the target structure and to drive and keep the satellites there. However, the limitation of such an approach lies in the task allocation part which, when requested, limits the number of possible final geometrical configurations and thus the method's usability. For example, the method from [21] is limited to symmetrical configurations. Finally, in the artificial potential approach, the forces and torques acting on each spacecraft are modelled as artificial potential fields [4, 29], creating ad hoc attractions and repulsions as needed. This technique can be very similar to the behaviour-based approach, and the two can in fact be shown to be equivalent in some cases [21], if we accept that potentials define kinematical fields rather than force fields.

In this paper we propose and perform a preliminary study of the ER technique as an alternative approach to designing the control scheme of a decentralized formation flying mission. We expect this approach to be less limited with respect to task allocation. The reason is that the previously mentioned limits on task allocation, such as the sole formation of symmetrical structures, may be due to a lack of 'imagination', or biases, on the part of human designers. The ER technique transfers part of the design to a blind optimization process, and may hence come up with more flexible solutions. We develop and test all of our controllers for the MIT SPHERES robotic platform [27] (Fig. 1), currently on board the International Space Station (ISS). The platform consists of three six-degrees-of-freedom "agents" able to move freely in a micro-gravity environment, and thus our work addresses triangular formations.

The paper is structured as follows: in Sect. 2 we introduce the notation denoting reference frames. In Sect. 3 we describe the formation acquisition and maintenance task we study in this work. In Sect. 4 we discuss the decentralized control architecture, the perception in Sect. 5, the neural networks in Sect. 6, the fitness functions in Sect. 7, and the optimization technique used in Sect. 8. The results are discussed in two parts. First, the evolved networks are tested in the simplified simulation environment they evolved in (Sect. 9). Subsequently, we test and analyze our results in the MIT SPHERES simulator (Sect. 10), using the evolved networks as path planners in the full control loop. We conclude in Sect. 11.

2 Notation

We make use of Hughes notation [20] to indicate vectors and vectrices (i.e. triplets of unit vectors ordered in a column and defining a reference frame). We indicate with F = [î, ĵ, k̂]ᵀ the vectrix associated to the absolute reference system centered in O, while F_i = [b̂_i1, b̂_i2, b̂_i3]ᵀ, i = 1…N, are the vectrices associated to the body frame of the i-th satellite. In this notation, a vector admits different representations according to the chosen representation frame: x⃗ = Fᵀx = F_iᵀx̃_i. We reserve the tilde symbol for vector components in body axes; the same symbol, without accent, indicates the components of the same vector in the absolute frame. The relation between these components is x̃_i = F_i · Fᵀ x, where

F_i · Fᵀ =: C_i = [ b̂_i1·î  b̂_i1·ĵ  b̂_i1·k̂ ]
                  [ b̂_i2·î  b̂_i2·ĵ  b̂_i2·k̂ ]
                  [ b̂_i3·î  b̂_i3·ĵ  b̂_i3·k̂ ]

is a matrix in the group SO⁺(3), which we parameterize using the quaternion q_i = [q1, q2, q3, q4]ᵀ as follows:

C_i = [ q4²+q1²−q2²−q3²   2(q1q2−q4q3)      2(q1q3+q4q2)    ]
      [ 2(q1q2+q4q3)      q4²−q1²+q2²−q3²   2(q2q3−q4q1)    ]
      [ 2(q1q3−q4q2)      2(q2q3+q4q1)      q4²−q1²−q2²+q3² ]
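The parameterization above maps a unit (scalar-last) quaternion to the rotation matrix C_i. As a sanity check, here is a minimal pure-Python sketch of that map (illustrative only, not the authors' code; function names are ours):

```python
import math

def dcm_from_quaternion(q):
    """Rotation matrix C_i of Sect. 2, built from the scalar-last
    quaternion q = [q1, q2, q3, q4]."""
    q1, q2, q3, q4 = q
    return [
        [q4*q4 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q4*q3),             2*(q1*q3 + q4*q2)],
        [2*(q1*q2 + q4*q3),             q4*q4 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q4*q1)],
        [2*(q1*q3 - q4*q2),             2*(q2*q3 + q4*q1),             q4*q4 - q1*q1 - q2*q2 + q3*q3],
    ]

def apply(C, x):
    """Multiply the 3x3 matrix C by the 3-vector x."""
    return [sum(C[r][c] * x[c] for c in range(3)) for r in range(3)]
```

For a unit quaternion the resulting matrix is orthogonal with determinant +1, i.e. an element of SO⁺(3); the identity quaternion [0, 0, 0, 1] yields the identity matrix.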

3 The formation acquisition task

Consider N satellites modelled as six-degrees-of-freedom rigid bodies. Borrowing some terms introduced in [31], we define a final target formation as a virtual structure defined by a frame F located at a point O and N target vectors n⃗_i = Fᵀn_i defining a final target geometry. N sets of possible target quaternions Q_i are also considered: they define, for each target position n_i, the allowed final orientations of the satellites (i.e. of F_i, i = 1…N). The formation acquisition task studied in this paper can then be defined, for N satellites placed in randomly initialized initial conditions, as the task of "constructing" the final desired virtual structure. This task, in its generic form, is relevant to concepts such as ESA's Darwin mission [18]

Fig. 2 Artistic vision of one possible configuration for the Darwin mission. Satellites flying tens to hundreds of metres apart would have to be controlled to within 1 cm of their intended positions, and would have to orient themselves so as to accurately point at a target star

(Fig. 2), NASA's Terrestrial Planet Finder [7] or on-orbit assembly [4, 22] in general. The position vector of the i-th satellite is written as x⃗_i = Fᵀx_i, where x_i := [x_i, y_i, z_i]ᵀ. Formally, the formation acquisition task is solved at time t if there exists a permutation s of [1, 2, …, N] such that x⃗_{s_i}(t) = O⃗O + n⃗_i and q_{s_i}(t) ∈ Q_i, i = 1…N.
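The condition above can be checked by brute force over all N! permutations, which is tractable only for small N. A minimal Python sketch of the positional part of the check (our illustration; the attitude condition q_{s_i}(t) ∈ Q_i is omitted for brevity, and `origin` stands for the virtual structure's location):

```python
import itertools
import math

def formation_acquired(x, targets, origin, tol=1e-6):
    """Search for a permutation s such that x[s[i]] matches
    origin + targets[i] (within tol) for every i.
    Returns the permutation, or None if the formation is not acquired."""
    n = len(x)
    for s in itertools.permutations(range(n)):
        if all(math.dist(x[s[i]],
                         [origin[k] + targets[i][k] for k in range(3)]) <= tol
               for i in range(n)):
            return s
    return None
```

Fixing s in advance (formation control without task allocation) reduces this search to a single comparison, which is exactly what removes the task allocation problem at the cost of labeling the satellites.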

It is useful to further classify such a task: one may or may not assume that each satellite is assigned to a particular place in the final formation (i.e. that the permutation s is fixed). In the first case, no task allocation happens, as each satellite has a specific position to maintain in the formation from the very beginning. Satellites need to be labeled, and we speak of formation control without task allocation. In this case one could provide each satellite with a different controller, tailored to its own specific "role". Convincing solutions have been proposed that make use of behaviour-based controllers (we point here to the work of Gazi [19] or Balch and Arkin [5]). In the second case, on the other hand, the satellites will need to agree on "who goes where", deciding what role to play in the final formation (or virtual structure), and thus will need to allocate their tasks: we speak of formation control with task allocation. General solutions to this task are less straightforward. One of the few works addressing this issue is that of Izzo and Pettazzi [21], who propose an inverse dynamical computation (the Equilibrium Shaping) that can cope only with a certain subclass of final geometries (those having symmetry relations).

In this paper, as we experiment with the MIT SPHERES platform, the formation acquisition task is solved for N = 3, letting F free and defining the final allowed satellite orientations Q_i so that the satellite axis is perpendicular to the formation plane. While this is only a specific instance of the formation acquisition task, more general cases (e.g. for higher N) are under study.

4 The decentralized control architecture

Consider a rigid body model for each satellite. The kinematics is described by the simple set of differential equations:

ẋ_i = v_ix,  ẏ_i = v_iy,  ż_i = v_iz,  q̇_i = Q_i ω̃_i        (1)

where the matrix Q_i defines the quaternion kinematics (see [20]) in the body frame:

Q = 1/2 [  q4  −q3   q2 ]
        [  q3   q4  −q1 ]        (2)
        [ −q2   q1   q4 ]
        [ −q1  −q2  −q3 ]

Fig. 3 Overall control architecture for each satellite

The dynamics of each satellite is then described by the following set of equations:

v̇_ix = u_ix,  v̇_iy = u_iy,  v̇_iz = u_iz,  I_i ω̃̇_i + ω̃_i × I_i ω̃_i = M̃_i        (3)

where we introduced the inertia matrix I_i and the controls u_i and M_i, representing the sum of all thrust actions and their total torque with respect to the center of mass. Equations 1, 2 describe the kinematics of our system, while Eq. 3 describes its dynamics. We explicitly separated these equations, as our decentralized control architecture acts differently over the two corresponding layers. In a first layer, named the path-planning layer, as depicted in Fig. 3, the neural networks Nt, Nr transform the spacecraft perception and proprioception into "desired" velocities and angular velocities. These, if used directly in the right-hand side of Eqs. 1, 2, produce the desired formation behaviour. A second layer, called the control layer, computes thruster actions u to cancel the error between desired and actual velocities. Though this paper is mainly concerned with the design of Nt, Nr using the evolutionary robotics technique, we perform our experiments testing the complete control architecture. For the control layer, as we use the MIT SPHERES robotic platform [27], we exploit the MIT-provided routines implementing simple proportional-derivative (PD) control strategies to compute the thruster actions. As a spacecraft simulator, we use the SPHERES platform simulator, also available from MIT. This simulator was designed to have a low "reality gap", as it is used to fail-proof all experiments approved for ISS flights.
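The matrix of Eq. 2 is easy to mistype, so a numerical sanity check is useful. A minimal pure-Python sketch (ours, not flight code) that builds Q(q) and Euler-integrates the quaternion kinematics of Eq. 1 for a constant body rate, with renormalization to stay on the unit sphere:

```python
import math

def Q_matrix(q):
    """The 4x3 matrix of Eq. (2), mapping the body angular velocity
    omega to the quaternion derivative: qdot = Q(q) omega."""
    q1, q2, q3, q4 = q
    return [
        [ q4, -q3,  q2],
        [ q3,  q4, -q1],
        [-q2,  q1,  q4],
        [-q1, -q2, -q3],
    ]

def qdot(q, omega):
    Q = Q_matrix(q)
    return [0.5 * sum(Q[r][c] * omega[c] for c in range(3)) for r in range(4)]

def step(q, omega, dt):
    """One explicit Euler step of q' = Q(q) omega, renormalized.
    (The paper uses a high-order variable-step Runge-Kutta instead;
    this is only a sketch.)"""
    qn = [qi + dt * di for qi, di in zip(q, qdot(q, omega))]
    norm = math.sqrt(sum(c * c for c in qn))
    return [c / norm for c in qn]
```

A constant rate ω = [0, 0, w] starting from the identity quaternion should reproduce the analytic solution q3(t) = sin(wt/2), q4(t) = cos(wt/2), which makes a convenient test of the sign pattern.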


5 Perception/proprioception

We assume that the i-th satellite perceives the relative positions of all other satellites in the formation. More formally, writing the vectors x⃗_ij = x⃗_j − x⃗_i = F_iᵀ x̃_ij, we assume the satellite is able to measure, ∀ j = 1…N, j ≠ i, the body-axis components x̃_ij of the relative position vectors. Our assumptions on satellite perception deserve a few words, as they might appear unusual. First, we do not consider myopic satellites (i.e. a limited perception range). This assumption, while it has led to interesting work in the field of swarm robotics, is not necessarily applicable in a satellite formation flying context, where avoiding "lost in space" situations is a fundamental design driver. Second, we do not include velocities in the satellites' perception. Velocities and angular velocities are certainly useful information for planning satellite actions in many tasks, but they are not necessary for the formation acquisition task, as proved by the behaviour-based Equilibrium Shaping technique [21].

5.1 Disorientation of artificial neural networks

Unlike most tasks studied by the ER community, we consider six-degrees-of-freedom agents operating in a three-dimensional empty environment. This has an immediate consequence on the perception-action loop, as it implies the lack of an absolute "sense" of direction. Similarly to what is experienced by astronauts operating in a zero-gravity environment (and thus deprived of the vertical direction defined by gravity), our agents need to plan their actions regardless of their orientation with respect to the world. Consider the following example. An agent perceives, in its body frame, a three-dimensional vector x̃ (for example a point in space) and needs to plan, again in its body frame, its action ũ (for example its thrust vector) to achieve some goal. Imagine using a simple perceptron N for this task: N(x̃) = ũ. We may represent N in matrix notation and write ũ = Wx̃ + b, where W is the weights matrix and b is the biases vector. We request the action to be invariant under agent rotations R, so that the same absolute action results regardless of the agent orientation. In formal terms, when x̃′ = Rx̃ is perceived, the action ũ′ = Rũ needs to be planned. Combining the above equations we easily derive that, necessarily:

Rb + RWx̃ = b + WRx̃,  ∀ x̃ ∈ ℝ³, ∀ R ∈ SO⁺(3)

which implies that W must commute with every rotation (i.e. W is a multiple of the identity) and b = 0. In other words, the network weights and biases are not free to evolve at all should we request rotation invariance, and thus no task but the trivial one can be solved by such a simple network. This effect, here formally presented in the case of a perceptron, is what we call the "disorientation of neural networks". We thus introduce the further assumption that the satellite is equipped with a star tracker, or a similar sensor, which determines the satellite orientation q_i with respect to the absolute frame F. This added proprioceptive input allows all perceived vectors measured in F_i to be transformed to the absolute frame F as follows:

x⃗_i = F_iᵀ x̃_i = Fᵀ x_i  →  x_i = F · F_iᵀ x̃_i = C_iᵀ x̃_i

The disorientation effect is overcome at the price of hardware complexification.
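The disorientation argument can also be reproduced numerically. The sketch below (illustrative Python; the rotation is restricted to the z-axis for brevity, and W, b are random placeholders, not evolved weights) shows that a fixed linear layer fed body-frame perceptions plans different absolute actions for different orientations, while first mapping the perception back to the absolute frame via the star-tracker attitude (x = Cᵀ x̃) restores invariance:

```python
import math
import random

def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matvec(M, v):
    return [sum(M[r][c] * v[c] for c in range(3)) for r in range(3)]

def transpose(M):
    return [[M[c][r] for c in range(3)] for r in range(3)]

random.seed(0)
W = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
b = [random.uniform(-1, 1) for _ in range(3)]

def perceptron(x):
    """A generic linear layer u = W x + b."""
    return [wx + bi for wx, bi in zip(matvec(W, x), b)]

x_abs = [0.3, -0.7, 0.5]            # one absolute-frame vector
C1, C2 = rot_z(0.0), rot_z(1.1)     # two different agent orientations

# Naive use: body-frame perception C x straight into the net; the
# resulting absolute-frame actions C^T u differ -> "disorientation".
u1 = matvec(transpose(C1), perceptron(matvec(C1, x_abs)))
u2 = matvec(transpose(C2), perceptron(matvec(C2, x_abs)))

# Star-tracker route: perception first mapped back to the absolute
# frame (x = C^T x_tilde); the planned action no longer depends on
# the agent's orientation.
v1 = perceptron(matvec(transpose(C1), matvec(C1, x_abs)))
v2 = perceptron(matvec(transpose(C2), matvec(C2, x_abs)))
```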

6 The neural networks

We make use of two different artificial neural networks in the action-perception architecture of each agent. The first network, Nt, is in charge of the translational kinematics, transforming the perceived inputs (i.e. the relative position vectors of the other satellites) into desired velocities: we can thus write v_d = Nt(r1, r2). The second network transforms the body perception of a target final orientation into a desired angular velocity (again in body coordinates): ω̃_d = Nr(ñ). Note that we dropped the subscript identifying each satellite, as all agents are homogeneous and thus have identical perception-action mechanisms. Indicating with Ni, No and Nh the number of input, output and hidden neurons, and with I, h, O the actual values of the input, hidden and output neurons, the following equations hold in general for the neurons of our networks:

Fig. 4 The multi-layer perceptron defining the SPHERES translational kinematics. Note the invariance with respect to permutations of ri

h_k = σ( b_k + Σ_{j=1}^{Ni} I_j w_kj ),   k = 1…Nh
Ô_k = σ( b′_k + Σ_{j=1}^{Nh} h_j w′_kj ),   k = 1…No

where we indicated with w_kj, w′_kj the network weights, with b_k, b′_k the network biases, and with σ(x) = 1/(1 + e⁻ˣ) the sigmoid function used as activation function in all our experiments. A further linear transformation is then applied to map the output domain from [0, 1] to [Om, OM]. Formally: O_k = Om + Ô_k (OM − Om).

6.1 The translational kinematics network Nt

The artificial neural network Nt is in charge of the definition of each satellite's "desired velocity" v_d. We designed such a network starting from the following requirements:

1. Avoid disorientation effects.
2. Be invariant to satellite permutations.
3. Allow for micrometre precision in the final formation maintenance.
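The general neuron equations above can be sketched as a forward pass in a few lines of Python (our illustration, with the Nt dimensions; the weights here are random placeholders, not evolved ones):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mlp(inputs, W, b, W2, b2, out_min=-0.3, out_max=0.3):
    """Single-hidden-layer perceptron as in Sect. 6:
    h_k = sigma(b_k + sum_j I_j w_kj), likewise for the outputs,
    then a linear map of each output from [0, 1] to [out_min, out_max]."""
    hidden = [sigmoid(bk + sum(i * w for i, w in zip(inputs, wk)))
              for wk, bk in zip(W, b)]
    raw = [sigmoid(bk + sum(h * w for h, w in zip(hidden, wk)))
           for wk, bk in zip(W2, b2)]
    return [out_min + o * (out_max - out_min) for o in raw]

# Randomly initialized network with the Nt dimensions
# (Ni = 4 inputs, Nh = 10 hidden, No = 3 outputs -> 70 weights, 13 biases):
random.seed(1)
Ni, Nh, No = 4, 10, 3
W  = [[random.uniform(-1, 1) for _ in range(Ni)] for _ in range(Nh)]
b  = [random.uniform(-1, 1) for _ in range(Nh)]
W2 = [[random.uniform(-1, 1) for _ in range(Nh)] for _ in range(No)]
b2 = [random.uniform(-1, 1) for _ in range(No)]

v_desired = mlp([0.1, -0.2, 0.3, 0.4], W, b, W2, b2)
```

By construction every output component lies strictly inside the [Om, OM] interval, here the [−0.3, 0.3] velocity range used for Nt.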

The resulting network is depicted in Fig. 4. We use Ni = 4, Nh = 10 and No = 3, corresponding to a total of (Ni + No) × Nh = 70 network weights and Nh + No = 13 network biases. As we assume satellites know their orientation with respect to F, we use as network inputs the absolute cartesian components of the relative position vectors, i.e. r1, r2 and not their body-frame counterparts r̃1, r̃2. This way we avoid network disorientation effects. Note also how we input the sum of r1 and r2: this is equivalent to inputting the vectors separately and then forcing the output's invariance over satellite permutations. To facilitate the achievement of high precision, we add as a separate (and redundant) input the sum d1 + d2, where d1 = |r1|, d2 = |r2|. We used Om = −0.3, OM = 0.3 as minimum and maximum output velocities.

Fig. 5 The multi-layer perceptron defining the SPHERES rotational kinematics

6.2 The attitude kinematics network Nr

The artificial neural network Nr (Fig. 5) is in charge of the definition of each satellite's "desired angular velocity" ω̃_d. We use Ni = 3, Nh = 2 and No = 2, corresponding to a total of (Ni + No) × Nh = 10 network weights and Nh + No = 4 network biases. We ask the satellites' z-axes to be aligned and perpendicular to the formation plane. As such, we send as input to the network the target unit vector ñ = C (r_ij × r_ik) / |r_ij × r_ik|. We used Om = −0.3, OM = 0.3 as minimum and maximum output angular velocities.

7 Setting up the evolution

In order to design the weights and biases of the two networks described above, we use the ER technique. Neural networks are encoded in chromosomes x = [w, b] containing their respective weights and biases. Simulations making use of a specific neural network provide feedback about its quality and, driven by such feedback, an evolutionary algorithm (in our case a Particle Swarm Optimizer)¹ interbreeds the best out of a population of chromosomes. By iterating this process over multiple generations, an initially random population of chromosomes evolves into highly fit solutions.

¹ Though not strictly an "evolutionary algorithm", Particle Swarm Optimization (PSO) belongs to the same general class of metaheuristics, or population-based stochastic search procedures.

7.1 Fitness evaluation for Nt

To associate a fitness value to a particular network Nt, we simulate the dynamics of the satellite formation, assuming desired velocities are actuated perfectly. Essentially, we integrate the following system of equations:

ẋ1 = Nt(x2 − x1, x3 − x1)
ẋ2 = Nt(x3 − x2, x1 − x2)        (4)
ẋ3 = Nt(x1 − x3, x2 − x3)

with initial conditions x1(0) = x10, x2(0) = x20, x3(0) = x30. At t = 0, all satellites' positions are initialized at random locations within a cube with 4 m sides, with the formation's initial center of mass forced to be the origin. The integration is carried out with a variable-step numerical algorithm [in our case a Runge-Kutta Prince-Dormand (8, 9) method] keeping the relative error below the threshold ε = 10⁻⁹. Note how this is rather unusual in ER works, where most simulators update agents' positions with "update rules" of the kind:

x_{i+1} = f1(x_i, v_i)
v_{i+1} = f2(x_i, v_i)        (5)

which essentially corresponds to using an Euler integration scheme. At the precision we require, such a scheme would be 3–4 orders of magnitude slower, and would thus not allow us to evolve populations as large, for as many generations, as we do. At t = T the simulation stops, and the swarm's performance with regard to its collective goal is evaluated by a fitness function. In our case, as we wish to acquire and maintain a triangular formation, we use the following fitness function:

ft = (L1² − l1²)² + (L2² − l2²)² + (L3² − l3²)² + |v1|² + |v2|² + |v3|²        (6)

where l1, l2, l3 are the sorted values of the three inter-satellite distances and L1, L2, L3 the sorted values of the targeted triangle sides.

7.2 Fitness evaluation for Nr

To associate a fitness value to a particular network Nr, similarly to what is done above, we simulate the attitude dynamics of each satellite, assuming desired angular velocities are actuated perfectly. Essentially, we integrate the following system of equations:

q̇ = Q ω̃_d = Q [Nr(Cn); 0]        (7)

Note that we set ω̃_z = 0 as, in this particular application, we wish to control only the orientation of the z-axis. At t = 0, a unit quaternion is randomly generated for each satellite, by sampling uniformly over the space of rotations, according to the procedure described in [32]:

q_i(0) = [ √(1 − u1) sin(2π u2),  √(1 − u1) cos(2π u2),  √u1 sin(2π u3),  √u1 cos(2π u3) ]        (8)

where u1, u2, u3 ∈ [0, 1] are chosen uniformly at random. The integration is carried out as previously described, and at t = T the satellite fitness fr is determined by:

fr = acos(b̂3 · n̂)        (9)

representing the angular displacement between the satellite's final pointing direction b̂3 and the target pointing direction n̂. Note how, unlike what is done for ft, the fitness function does not contain the satellites' angular velocities. The minimization of angular velocities is, however, still implicitly present: should the angular velocities not have vanished by the time the simulation stops, fr will most likely hold a high value.
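The fitness measures of Eqs. 6 and 9, and the uniform quaternion sampling of Eq. 8, are simple to state in code. A minimal Python sketch (ours, not the authors' PaGMO implementation; names are illustrative):

```python
import math
import random

def fitness_translation(positions, velocities, target_sides):
    """Eq. (6): compares the sorted inter-satellite distances with the
    sorted target triangle sides, and penalizes residual velocities."""
    d = sorted(math.dist(positions[i], positions[j])
               for i in range(3) for j in range(i + 1, 3))
    L = sorted(target_sides)
    return (sum((Li**2 - li**2)**2 for Li, li in zip(L, d))
            + sum(sum(c * c for c in v) for v in velocities))

def random_quaternion(rng):
    """Eq. (8): a quaternion sampled uniformly over the rotations [32]."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    return [math.sqrt(1 - u1) * math.sin(2 * math.pi * u2),
            math.sqrt(1 - u1) * math.cos(2 * math.pi * u2),
            math.sqrt(u1) * math.sin(2 * math.pi * u3),
            math.sqrt(u1) * math.cos(2 * math.pi * u3)]

def fitness_attitude(b3, n):
    """Eq. (9): angle between the satellite's final z-axis b3 and the
    target pointing direction n (both unit vectors; dot product clamped
    against floating-point drift)."""
    dot = sum(a * b for a, b in zip(b3, n))
    return math.acos(max(-1.0, min(1.0, dot)))
```

Sorting both the measured distances and the target sides is what makes Eq. 6 invariant to satellite permutations, so no labels are needed in the fitness either.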

Fig. 6 Fitness function ft and average particle velocity during one particular run of our PSO algorithm for the evolution of Nt

8 The particle swarm optimizer

Particle Swarm Optimization (PSO) [13, 24, 30] was chosen for carrying out the evolution of the neural network controllers. Fitness is here a random variable (simulations are deterministic, but outcomes depend on the random initial conditions), and evolution must therefore proceed under a degree of uncertainty as to which candidate solutions in the population are truly superior in absolute terms. To address this problem, a modified generational version of the PSO algorithm was implemented, and is now made available through the PaGMO open source project [8].² Its implementation follows the principle of continuously presenting particles with new challenges, while ensuring all fitness comparisons are based on the outcomes of simulations that used the same sets of random initial conditions. Particle updates occur synchronously, in periods designated as generations. A new set of random number generator seeds is created at the beginning of each generation. All fitness evaluations then take place by simulating (in the satellite dynamics) from initial conditions generated from those seeds. A change is also required to the way particles' memories are traditionally handled: here, every generation ends with the memories too being re-evaluated on the set of simulation conditions defined for the generation, and being overwritten with the respective particle's current position should it be found to be superior. This constant change of simulation scenarios prevents overfitting of the controller, and pushes for a behaviour that generalizes properly to conditions not observed during evolution.

² https://github.com/esa/pagmo/wiki.

9 Experiments, part I: evolution

In the evolution of Nt, for the evaluation of each particle's current and memory positions, we computed the fitness in Eq. 6 n = 5 times by integrating Eq. 4 from different initial conditions (randomly generated anew each generation, and shared by all particles), and then averaged it. Integration proceeded for up to T = 50 s of simulated time, so that evolution would come up with a behaviour achieving our goals safely within the time frame of an individual experiment on board the ISS with the SPHERES satellites. The neural network weights and biases encoded in the particles' chromosomes x were randomly initialized in the [−1, 1] interval. Significant for the attainment of the reported performance levels was the non-enforcement of these bounds after the initialization stage, i.e. weights were then free to take values outside the [−1, 1] range. Particles' velocity vectors were initialized to 0, and then constrained to have components at all times in the [−1, 1] interval. Velocity updates were given by the equation with constriction coefficients, using a standard parametrization: constriction factor χ set to 0.7298, along with acceleration coefficients φ1 and φ2 both set to 2.05 [13, 30].

The evolution of Nt was run in an asynchronous island model. A fully-connected topology was used for migrations, in an archipelago containing eight islands, each evolved for a total of 1,000 generations. Each island evolved one PSO swarm containing 512 particles. Within a swarm, particles were arranged in a ring topology, and updated by taking into consideration the performance levels of the four particles within a radius of two of themselves.
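The constriction-coefficient velocity update and the ring neighbourhood described above can be sketched as follows (illustrative Python, not the PaGMO implementation; `pbest` and `lbest` denote a particle's memory and its best ring neighbour, and the [−1, 1] clamp mirrors the paper's velocity bound):

```python
import random

CHI, PHI1, PHI2 = 0.7298, 2.05, 2.05  # standard constriction parametrization [13, 30]

def pso_velocity_update(v, x, pbest, lbest, rng, vmax=1.0):
    """Clerc-Kennedy constriction update for one particle, with a
    component-wise velocity clamp to [-vmax, vmax]."""
    new_v = []
    for vi, xi, pi, li in zip(v, x, pbest, lbest):
        r1, r2 = rng.random(), rng.random()
        vi = CHI * (vi + PHI1 * r1 * (pi - xi) + PHI2 * r2 * (li - xi))
        new_v.append(max(-vmax, min(vmax, vi)))
    return new_v

def ring_best(fitnesses, i, radius=2):
    """Index of the best (minimal-fitness) particle among particle i and
    its 2*radius ring neighbours, as in the topology used here."""
    n = len(fitnesses)
    neigh = [(i + k) % n for k in range(-radius, radius + 1)]
    return min(neigh, key=lambda j: fitnesses[j])
```

The generational twist described in Sect. 8 (re-seeding the simulation conditions each generation and re-evaluating the memories on them) sits outside this update and only changes which positions are stored as `pbest`.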


Fig. 7 Different behaviours obtained at intervals of 100 generations in one particular evolution. Initial conditions are kept the same. The box shows the [−1, 1]³ cube. Note how a triangular formation is achieved quite early in the evolution, while reaching zero final velocity and a precise shape takes more generations

We performed multiple experiments, targeting different satellite formations. Of these, we report here the results pertaining to the achievement of an asymmetrical triangular formation, with sides of length L1 = 0.6, L2 = 0.7, L3 = 0.8. The exact code used to perform these experiments is available as part of the open source PaGMO code [8]. In Fig. 6 the fitness ft is reported against the number of generations, together with the average particle velocity in PSO. We note how the experimental setup produces a smooth decrease of the fitness to very high precision. In Fig. 7 we show the behaviours produced at different stages of the evolution. The picture, consistent with what is observed in most of our experiments, shows how the early generations are spent evolving a behaviour that consistently groups the satellites into a triangle of roughly the correct shape. The following generations are then spent achieving a behaviour that results in zero final velocities, and the rest of the evolution (actually most of it) is spent refining the formation acquisition's precision.

The evolution of Nr proceeded in essentially the same way as that of Nt, with the exceptions that a single PSO population was used, and that it was evolved for a total of 500 generations. Also, the fitness measure in Eq. 9 was averaged over n = 50 simulations, integrating Eq. 7 from different initial conditions. The best found Nt and Nr neural networks were subsequently subjected to an extensive re-evaluation process, so as to more accurately assess their performance. In total, 25,000 simulations were conducted, from randomly defined


Fig. 8 Performance across 25,000 simulations, with satellites controlled simultaneously by Nt (top) and Nr (bottom). Statistical measures shown for each variable: 25th, 50th (median) and 75th percentiles (filled area), and sample maximum and minimum (dotted lines)

initial conditions. During evolution, satellites were controlling only either movement or attitude, whereas in these simulations they were now simultaneously controlled by both networks. Figure 8 presents a statistical analysis of the observed variables. The plots show a continuous representation of box-and-whisker plots, with whiskers here representing the maximum and minimum values observed across the 25,000 simulations. Each satellite’s performance was tracked across the T = 50 s of simulation time, but we present for each variable its statistics over an averaging across the 3 satellites (in the case of positional accuracy, the average taken is over the 3 Li distances in the target formation). Figure 8 (top) shows the performance level achieved by the translational kinematics network Nt . In it, the fitness measure ft is essentially decomposed into two components,

Evol. Intel. (2014) 7:107–118

depicting positional accuracy and velocity. Positional accuracy measures how much inter-satellite distances differ from those in the target configuration (in meters). We see a steady rate of improvement, across simulation time, down to an error of 5.38 × 10⁻¹¹ m (median) at t = 50 s. Using the ft fitness measure (which also incorporates a velocity component), this same set of simulations evaluates to a value of 4.11 × 10⁻²⁰ (median). In Fig. 6 we previously saw, across a single evolutionary run, this value being approximated to ≈10⁻¹⁹ by averaging across just five simulations. These values indicate that the low number of simulations carried out during evolution still resulted in fitness values that were sufficiently informative to steer evolution towards better solutions. Though we evolved for average, and not worst-case, performance, Fig. 8 (top) shows the obtained Nt neural network far surpassing the goal of micrometre (10⁻⁶ m) precision. We do see a worst-case performance of 2.19 × 10⁻⁴ m, but such values are only encountered in extremely rare circumstances. Indeed, out of 25,000 simulations, in only 17 (0.07%) was a positional accuracy worse than 1 × 10⁻⁶ m achieved by the end. Figure 8 (bottom) shows the performance level achieved by the attitude kinematics network Nr. Pointing accuracy measures how far off (in radians) the satellites' pointing directions are from their goal. Nr could have been co-evolved together with Nt, or evolved in simulations where an evolved Nt network acts in parallel, so that attitude control would be aware of the underlying translational dynamics and work in tune with it, achieving the final attitude configurations faster and more accurately. As it is, the actions taken by the Nt network are merely disturbances that the Nr network has to compensate for. Still, the final Nr network proves to be highly resilient, with a final error level of 2.52 × 10⁻⁸ rad (median), or 1.44 × 10⁻⁶ degrees.
With regard to the goal of microradians precision, a worst-case performance of 1.61 × 10⁻³ rad is observed, with 700 simulations (2.80%) having a pointing accuracy worse than 1 × 10⁻⁶ rad. It should be noted, though, that this inaccuracy can be attributed solely to the disturbances introduced by the Nt network (simulations that take too long to converge on the target triangular formation leave too little time for the attitude to be properly corrected). Repeating this same set of 25,000 simulations, but with only Nr active, leads to a pointing accuracy of 2.27 × 10⁻⁸ rad (median), with a worst case of 3.54 × 10⁻⁸ rad.
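The evolutionary setup used throughout this section can be sketched in a few lines. Everything below is a simplified stand-in (a 1D kinematic task rather than the paper's Eqs. 7 and 9, and a tiny network of arbitrary shape); it is meant only to show the pattern of PSO evolving neural network weights against a fitness averaged over random initial conditions:

```python
import math
import random

random.seed(0)

# Toy stand-in for the paper's setup: a small feed-forward network maps a 1D
# state (position error, velocity) to a commanded velocity; fitness penalises
# the final error and final velocity, averaged over random initial conditions;
# a basic gbest PSO searches the weight space.
N_IN, N_HID = 2, 4
DIM = (N_IN + 1) * N_HID + (N_HID + 1)   # hidden weights+biases, one output

def network(w, inputs):
    k, hidden = 0, []
    for _ in range(N_HID):
        s = w[k]; k += 1                 # bias
        for x in inputs:
            s += w[k] * x; k += 1
        hidden.append(math.tanh(s))
    s = w[k]; k += 1
    for h in hidden:
        s += w[k] * h; k += 1
    return math.tanh(s)                  # bounded velocity command

def simulate(w, x0, dt=0.1, steps=80):
    x, v = x0, 0.0
    for _ in range(steps):
        v = network(w, (x, v))           # path-planning layer: command a velocity
        x += v * dt
    return x * x + v * v                 # penalise final error and final motion

def fitness(w, n=4):
    # averaged over n random initial conditions, as done during evolution
    return sum(simulate(w, random.uniform(-1.0, 1.0)) for _ in range(n)) / n

POP, GENS = 16, 50
pos = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(POP)]
vel = [[0.0] * DIM for _ in range(POP)]
pbest = [p[:] for p in pos]
pfit = [fitness(p) for p in pos]
best = min(range(POP), key=pfit.__getitem__)
gbest, gfit = pbest[best][:], pfit[best]
init_gfit = gfit

for _ in range(GENS):
    for i in range(POP):
        for d in range(DIM):
            r1, r2 = random.random(), random.random()
            # constriction-like PSO coefficients
            vel[i][d] = (0.72 * vel[i][d]
                         + 1.49 * r1 * (pbest[i][d] - pos[i][d])
                         + 1.49 * r2 * (gbest[d] - pos[i][d]))
            pos[i][d] += vel[i][d]
        f = fitness(pos[i])
        if f < pfit[i]:
            pbest[i], pfit[i] = pos[i][:], f
            if f < gfit:
                gbest, gfit = pos[i][:], f

print(init_gfit, "->", gfit)
```

The best fitness found can only improve over the generations; in the paper the same loop runs over far larger populations and the full 3-satellite simulation.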

10 Experiments, part II: ISS SPHERES simulation

In this section the complete control architecture presented in Fig. 3 is evaluated in simulation. The experiments are set up for the SPHERES testbed shown in Fig. 1.


In the context of ISS campaigns, the code that is to be uploaded to the SPHERES on board the ISS is always first tested in a high-fidelity simulator provided by MIT. The simulator accurately reproduces the timing, sensing, and actuation of the SPHERES system. For example, it includes sensors such as the accelerometers and gyros, with their position offsets in the actual platform bodies, and the sonar sensor layout used to track the three SPHERES' (inertial) coordinates and attitudes. Importantly, the simulator also provides PD controllers translating the velocity and turn-rate commands given by the neural networks into thrust commands for the 12 thrusters on board each sphere. That is, the PD controllers form the ''control layer'' of Fig. 3. The SPHERES' sensing capabilities are only used to compute the values of the perception/proprioception variables described in Sect. 5.

The experimental setup is as follows. The SPHERES are initialized at random positions within a 1.0 × 1.0 × 1.0 m space, fitting within the Unity ISS module in which the experiments typically take place. After an initialization period, the SPHERES start sensing and acting. The distances between SPHERES are tracked over time for 300 s. Within this time span, the SPHERES converge to a final configuration. Figure 9 shows three screenshots of a simulation run of the distributed control architecture. The three SPHERES converge on the desired triangle formation, with final distances of 0.59, 0.69, and 0.81 m, i.e., with an average error of 0.01 m. Performing 90 runs with uniformly random initial positions in the experimental area, the average final error is 0.0076 (0.034) m. Note that we have not included collision avoidance in the optimization process. Although it is possible to do so, one can also add an avoidance procedure that is activated upon the occurrence of small inter-sphere distances. The target formation of a triangle with no symmetries is of particular interest to this study.
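The average error figures quoted above can be computed from the inter-satellite distances alone. A small sketch follows; the exact metric is not spelled out in the text, so the label-free matching of distances to target sides below is an assumption, chosen to be consistent with the reported numbers:

```python
import itertools
import math

# Compare achieved inter-satellite distances with the target side lengths.
# Because the controllers are label-free, the distances are matched to the
# sides in the best possible way before averaging the absolute errors.

def pairwise_distances(points):
    return [math.dist(a, b) for a, b in itertools.combinations(points, 2)]

def formation_error(dists, target_sides):
    # mean absolute distance error, minimised over side assignments
    return min(
        sum(abs(d - t) for d, t in zip(dists, perm)) / len(dists)
        for perm in itertools.permutations(target_sides)
    )

# Final distances from the run reported in the text: 0.59, 0.69 and 0.81 m,
# against the target triangle with sides 0.6, 0.7 and 0.8 m.
err = formation_error([0.59, 0.69, 0.81], [0.6, 0.7, 0.8])
print(err)  # ~0.01 m, the average error reported for this run
```

For a batch of runs, averaging `formation_error` over the final configurations reproduces the kind of statistics quoted above.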
Fig. 9 Three screenshots over time of the MIT SPHERES simulator during a single run. The SPHERES are initialized with random positions and attitudes, but end up in the desired formation with the desired attitude

Fig. 10 Velocity field of a single sphere when two spheres are fixed at (0, 0, 0) and (0.8, 0, 0). Arrows illustrate the velocity outputs of the velocity neural network at different locations in the surrounding 3D space. The color of the arrows indicates the z-coordinate of the origin of the arrow, ranging from −1 (blue) to 1 (red). Blue markers show the trajectories of a single sphere when the other two spheres remain fixed at the shown positions

To a human designer, it is difficult to imagine how three SPHERES with homogeneous controllers can achieve such asymmetrical formations. In order to provide some insight into this matter, we first analyze the motion of a single sphere when the other two spheres are fixed. The distance between the two fixed SPHERES is 0.8 m, one of the target distances. Figure 10 shows with arrows the velocity outputs of the velocity neural network. The colors of the arrows encode the z-coordinate of the origin of the arrows, ranging from −1 (blue) to 1 (red). We also plot six different trajectories of a single sphere when the other two spheres remain fixed at (0, 0, 0) and (0.8, 0, 0). The arrows in Fig. 10 illustrate the existence of two 'streams', one from the top (in red) and one from the bottom (in blue). The trajectories all end up in one of two locations in space, suggesting that there are two stable equilibrium points for the single sphere (this is corroborated by additional runs, not shown in the figure for visibility). Interestingly, neither of the equilibrium points reached is at the target position, at distances of 0.6 and 0.7 m. Although there is a zero-velocity point at such distances, it is not reached if the other spheres do not move. Indeed, performing ten runs with only a single moving sphere leads to a deteriorated performance of 0.015 (0.013). An experiment with only one sphere fixed at (0, 0, 0) also does not result in the same performance as with three freely moving spheres: over ten runs it achieves an even worse performance of 0.020 (0.024). Similar results are obtained in the simplified simulator employed for evolution, showing that the result is not due to an 'artifact' of the MIT simulator. Hence, the analysis shows that the desired triangle with sides of length 0.6, 0.7, and 0.8 m is achieved with higher precision in the case of three freely moving spheres.
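The equilibrium structure suggested by Fig. 10 can be probed numerically in the same spirit: freeze two spheres and integrate the third through the commanded velocity field until it settles. The controller below is an illustrative distance-seeking surrogate, not the evolved network:

```python
import math

# Freeze two spheres, then integrate the free sphere through a commanded
# velocity field.  The surrogate controller attracts the free sphere towards
# distances of 0.6 and 0.7 m from the two fixed spheres (the target sides).
FIXED = [(0.0, 0.0, 0.0), (0.8, 0.0, 0.0)]
TARGET = [0.6, 0.7]

def velocity(p):
    # sum of attraction terms, one per (fixed sphere, target distance) pair
    v = [0.0, 0.0, 0.0]
    for (fx, fy, fz), d_star in zip(FIXED, TARGET):
        dx, dy, dz = p[0] - fx, p[1] - fy, p[2] - fz
        d = math.sqrt(dx * dx + dy * dy + dz * dz) or 1e-9
        g = (d_star - d) / d     # move along the line of sight to distance d*
        v[0] += g * dx; v[1] += g * dy; v[2] += g * dz
    return v

def integrate(p, dt=0.05, steps=4000):
    # forward-Euler path planning: follow the commanded velocity field
    for _ in range(steps):
        vx, vy, vz = velocity(p)
        p = (p[0] + vx * dt, p[1] + vy * dt, p[2] + vz * dt)
    return p

# An off-axis start settles where both distance targets are met; by symmetry
# about the x-axis, each start selects one point of a whole circle of equilibria.
end = integrate((0.2, 0.5, 0.4))
d1, d2 = math.dist(end, FIXED[0]), math.dist(end, FIXED[1])
print(d1, d2)
```

For the surrogate, the fixed-sphere equilibria do sit at the target distances; the text's point is precisely that the evolved network behaves differently, relying on all three spheres moving.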


Fig. 11 Magnitude of the angular velocity vector output by the attitude neural network. The magnitude is illustrated with a color map going from 0 (black) to 0.31 (white). The black dash-dotted lines show trajectories of the target vector over time, for different runs in which the sphere positions are held fixed
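Target-vector trajectories like those in Fig. 11 can be mimicked by integrating the body-frame kinematics: when the body turns with angular velocity ω, a direction n fixed in the inertial frame evolves as dn/dt = −ω × n. The proportional turn-rate law below is an illustrative assumption standing in for the evolved attitude network, with the gain chosen to match the 0.31 maximum magnitude seen in the figure:

```python
import math
import random

random.seed(2)

GOAL = (0.0, 0.0, 1.0)        # desired body-frame direction of the target vector
K, DT, STEPS = 0.31, 0.05, 4000

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    m = math.sqrt(sum(x * x for x in v))
    return tuple(x / m for x in v)

# random initial target direction on the unit sphere (direction only matters)
n = normalize(tuple(random.uniform(-1.0, 1.0) for _ in range(3)))
for _ in range(STEPS):
    w = tuple(K * c for c in cross(GOAL, n))   # commanded turn rate: K * goal x n
    dn = cross(n, w)                           # dn/dt = -w x n = n x w
    n = normalize(tuple(ni + DT * di for ni, di in zip(n, dn)))

print(n)  # approaches the attractor at (0, 0, 1)
```

Under this law dn/dt = K (goal − (n·goal) n), so every initial direction (except the exact antipode) flows to the goal, qualitatively reproducing the single attractor at (0, 0, 1).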

As with position control, the attitude control also generalizes to the MIT SPHERES simulator. The attitude performance in the simulator is 0.0034 (0.0017). Figure 11 sheds some light on the workings of the attitude network and its performance in the simulator. For many possible input target vectors (ñx, ñy, ñz), it shows the magnitude of the angular velocity vector as determined by the attitude neural network. The magnitude is illustrated with a color map going from 0 (black) to 0.31 (white). Moreover, the figure shows with black dash-dotted lines the trajectory of the target vector over time, for different runs in which the sphere positions are held fixed. Fixing the sphere positions is done for analysis purposes only: in the figure we want to show the effects of the attitude control alone, not the perturbations introduced by a continuously changing target vector. The most important observation from the figure is that there is a clear attractor at the coordinate (0, 0, 1), which is the desired target coordinate. The magnitude of the angular velocities is 0 at that point. An additional interesting observation from Fig. 11 is that the angular velocity magnitudes are not equal over different azimuth angles: smaller ñx coordinates have a smaller magnitude. There is even a rather low-magnitude area close to (0.62, 0.08, 0.78), but it is not an attractor for the target vector trajectories. This can be explained by looking at the angular velocities that are determined by the attitude network. Figure 12 shows (ñx, ñy)-views of the angular velocities, where black corresponds to a negative rotation of −0.25, white to a positive rotation of 0.25, and orange to zero rotation speed. In order to understand the figure, consider a target vector of ñ = (0, 1, 0). Clearly, the sphere should rotate around its x-axis in order to move the target vector toward the desired location. In the left plot of the figure we see that the x angular velocity is −0.25, while the rotation around the y-axis is close to zero, as it should be. The figure shows that the direction in which the target vector evolves over time is always toward ñ = (0, 0, 1).

Fig. 12 Angular velocity outputs of the attitude neural network. In the (ñx, ñy)-views, black corresponds to a negative rotation of −0.25, white to a positive rotation of 0.25, and orange corresponds to zero rotation speed

10.1 Technological readiness

After showing the robustness of the formation flight algorithm in the MIT simulator, a laboratory test of the algorithm was performed on a 'glass table' at MIT. This laboratory test involves the SPHERES hardware and required some reprogramming to suit the embedded software constraints of the SPHERES. The neural networks also passed this test successfully and are in principle ready for flight on board the ISS. Our tests were scheduled for a past ISS campaign, but unforeseen changes to allocated astronaut time forced them to be canceled at the last minute. They are currently on hold.

11 Conclusions

We have performed a preliminary investigation into the use of the ER methodology for automating the synthesis of behaviours for a decentralized formation flying control architecture, meant for use by satellites operating in a zero-gravity environment. Through evolution, we obtained neural networks capable of performing position and attitude control, achieving micrometre and microradian precision at the level of the path-planning layer, and centimeter and centiradian precision when coupled, in a full simulator, with corresponding PD control strategies. Furthermore, in a 3-satellite swarm, the methodology is capable of solving the task allocation problem, even for asymmetric configurations. A preliminary analysis shows the satellite swarm using an evolved (homogeneous) neural network controller to converge on an asymmetrical triangular formation by making use of equilibrium points that only exist in the coupled 3-satellite dynamical system, a solution that is hard to imagine for a human designer. Future work includes the extension of the methodology to larger groups of satellites. To achieve this, the inputs to the neural networks will likely have to be redefined. Doing so may also influence the efficiency of the variable-step numerical integrator used during evolution. Finally, we would also like to investigate the limits of the proposed methodology in terms of the formations that can be achieved by larger swarms.

References

1. Ampatzis C, Tuci E, Trianni V, Christensen AL, Dorigo M (2009) Evolving self-assembly in autonomous homogeneous robots: experiments with two physical robots. Artif Life 15(4):465–484. doi:10.1162/artl.2009.Ampatzis.013
2. Arkin RC (1998) Behavior-based robotics. Intelligent Robotics and Autonomous Agents series. MIT Press, Cambridge
3. Ayre M, Pettazzi L, Izzo D (2005) Self-assembly in space using behaviour-based intelligent components. Tech. Rep. SASUBBIC05, European Space Agency, Advanced Concepts Team, Noordwijk
4. Badawy A, McInnes CR (2008) On-orbit assembly using superquadric potential fields. J Guid Control Dyn 31(1):30–43. doi:10.2514/1.28865
5. Balch T, Arkin RC (1998) Behavior-based formation control for multirobot teams. IEEE Trans Robot Autom 14(6):926–939. doi:10.1109/70.736776
6. Baldassarre G, Nolfi S, Parisi D (2003) Evolving mobile robots able to display collective behaviors. Artif Life 9(3):255–267. doi:10.1162/106454603322392460
7. Beichman CA, Woolf NJ, Lindensmith CA (eds) (1999) The Terrestrial Planet Finder (TPF): a NASA Origins program to search for habitable planets. JPL Publication 99-003, National Aeronautics and Space Administration, Washington, D.C. URL: http://exep.jpl.nasa.gov/TPF/tpf_book
8. Biscani F, Izzo D, Yam CH (2010) A global optimisation toolbox for massively parallel engineering optimisation. In: 4th international conference on astrodynamics tools and techniques (ICATT 2010). URL: http://arxiv.org/abs/1004.3824
9. Blynel J, Floreano D (2003) Exploring the T-maze: evolving learning-like robot behaviors using CTRNNs. In: Applications of evolutionary computing, Lecture Notes in Computer Science, vol 2611. Springer, Berlin. doi:10.1007/3-540-36605-9_54
10. Bonabeau E, Dorigo M, Theraulaz G (1999) Swarm intelligence: from natural to artificial systems. Santa Fe Institute Studies in the Sciences of Complexity series. Oxford University Press, New York
11. Cangelosi A, Marocco D, Peniak M, Bentley B, Ampatzis C, Izzo D (2010) Evolution in robotic islands. Ariadna final report (09/8301), European Space Agency, Advanced Concepts Team. www.esa.int/act
12. Christensen AL, Dorigo M (2006) Evolving an integrated phototaxis and hole-avoidance behavior for a swarm-bot. In: Rocha LM, Yaeger LS, Bedau MA, Floreano D, Goldstone RL, Vespignani A (eds) Artificial Life X: proceedings of the tenth international conference on the simulation and synthesis of living systems. MIT Press, Cambridge
13. Clerc M (2006) Particle swarm optimization. ISTE, London. doi:10.1002/9780470612163
14. Dachwald B (2005) Optimal solar-sail trajectories for missions to the outer solar system. J Guid Control Dyn 28(6):1187–1193. doi:10.2514/1.13301
15. Dachwald B (2005) Optimization of very-low-thrust trajectories using evolutionary neurocontrol. Acta Astronaut 57(2–8):175–185. doi:10.1016/j.actaastro.2005.03.004
16. Dachwald B, Seboldt W (2002) Optimization of interplanetary rendezvous trajectories for solar sailcraft using a neurocontroller. In: AIAA/AAS astrodynamics specialist conference and exhibit, Monterey, CA, USA, AIAA-2002-4989
17. Ellery A, Scott GP, Husbands P, Gao Y, Vaughan ED, Eckersley S (2005) Bionics & space systems design. Case study 1: Mars walker. Tech. Rep. Contract AO/1-4469/03/NL/SFe, European Space Agency, Advanced Concepts Team, Noordwijk, The Netherlands
18. Fridlund CVM (2000) Darwin—the infrared space interferometry mission. ESA Bulletin 103:20–25
19. Gazi V (2005) Swarm aggregations using artificial potentials and sliding-mode control. IEEE Trans Robot 21(6):1208–1214. doi:10.1109/TRO.2005.853487
20. Hughes PC (2004) Spacecraft attitude dynamics. Dover Books on Aeronautical Engineering series. Dover Publications, Mineola
21. Izzo D, Pettazzi L (2007) Autonomous and distributed motion planning for satellite swarm. J Guid Control Dyn 30(2):449–459. doi:10.2514/1.22736
22. Izzo D, Pettazzi L, Ayre M (2005) Mission concept for autonomous on orbit assembly of a large reflector in space. In: 56th international astronautical congress, Fukuoka, Japan, Paper IAC-05-D1.4.03
23. Izzo D, Simões LF, Märtens M, de Croon GCHE, Heritier A, Yam CH (2013) Search for a grand tour of the Jupiter Galilean moons. In: Blum C (ed) Genetic and evolutionary computation conference (GECCO 2013). ACM Press, New York, pp 1301–1308. doi:10.1145/2463372.2463524
24. Kennedy J, Eberhart RC, Shi Y (2001) Swarm intelligence. Morgan Kaufmann Series in Evolutionary Computation. Morgan Kaufmann
25. Kernbach S (ed) (2013) Handbook of collective robotics: fundamentals and challenges. Pan Stanford Publishing, Singapore
26. Lipson H (2001) Book review: Evolutionary robotics: the biology, intelligence and technology of self-organizing machines by Stefano Nolfi and Dario Floreano. Artif Life 7(4):419–424. doi:10.1162/106454601317297031
27. Miller D, Saenz-Otero A, Wertz J, Chen A, Berkowski G, Brodel C, Carlson S, Carpenter D, Chen S, Cheng S, Feller D, Jackson S, Pitts B, Perez F, Szuminski J, Sell S (2000) SPHERES: a testbed for long duration satellite formation flying in micro-gravity conditions. In: AAS/AIAA spaceflight mechanics meeting, Clearwater, FL, USA, Jan. 23–26, 2000. Advances in the Astronautical Sciences series, vol 105. American Astronautical Society, pp 167–179, AAS 00-110
28. Nolfi S, Floreano D (2000) Evolutionary robotics: the biology, intelligence, and technology of self-organizing machines. Intelligent Robotics and Autonomous Agents series. MIT Press, Cambridge, MA, USA
29. Pinciroli C, Birattari M, Tuci E, Dorigo M, del Rey Zapatero M, Vinko T, Izzo D (2008) Self-organizing and scalable shape formation for a swarm of pico satellites. In: NASA/ESA conference on adaptive hardware and systems (AHS'08). IEEE, pp 57–61. doi:10.1109/AHS.2008.41
30. Poli R, Kennedy J, Blackwell T (2007) Particle swarm optimization—an overview. Swarm Intell 1(1):33–57. doi:10.1007/s11721-007-0002-0
31. Ren W, Beard RW (2004) Decentralized scheme for spacecraft formation flying via the virtual structure approach. J Guid Control Dyn 27(1):73–82. doi:10.2514/1.9287
32. Shoemake K (1992) Uniform random rotations. In: Kirk D (ed) Graphics gems III, The Graphics Gems series, vol 3. Academic Press, New York, pp 124–132
33. Simões LF, Cruz C, Ribeiro RA, Correia L, Seidl T, Ampatzis C, Izzo D (2011) Path planning strategies inspired by swarm behaviour of plant root apexes. Ariadna final report (09/6401), European Space Agency, Advanced Concepts Team. www.esa.int/act
34. Smith BGR, Saaj CM, Allouis E (2010) Evolving legged robots using biologically inspired optimization strategies. In: IEEE international conference on robotics and biomimetics (ROBIO 2010). IEEE, pp 1335–1340. doi:10.1109/ROBIO.2010.5723523
35. Yam CH, Di Lorenzo D, Izzo D (2011) Low-thrust trajectory design as a constrained global optimization problem. Proc Inst Mech Eng Part G J Aerosp Eng 225(11):1243–1251. doi:10.1177/0954410011401686