Automatic Definition of Modular Neural Networks
Frederic Gruau
Mailing address: Stanford University, Psychology Department, CA 94305-2130 USA
[email protected]
Other affiliations:
Centre d'Etude Nucleaire de Grenoble, Departement de Recherche Fondamentale Matiere Condensee, 17 rue des Martyrs, 38041 Grenoble, France
Ecole Normale Superieure de Lyon, Laboratoire de l'Informatique du Parallelisme, 46 Allee d'Italie, 69364 Lyon Cedex 07, France
Colorado State University, Computer Science Department, Fort Collins, CO 80523 USA
January 11, 1995
Abstract
This paper illustrates an artificial developmental system that is a computationally efficient technique for the automatic generation of complex Artificial Neural Networks (ANN). An artificial developmental system can develop a graph grammar into a modular ANN made of a combination of simpler subnetworks. A genetic algorithm is used to evolve coded grammars that generate ANNs for controlling the locomotion of a six-legged robot. A mechanism for the automatic definition of sub-neural networks is incorporated. Using this mechanism, the genetic algorithm can automatically decompose a problem into subproblems, generate a sub-ANN for solving the subproblem, and instantiate copies of this sub-ANN to build a higher-level ANN that solves the problem. We report simulation results showing that the same problem cannot be solved if the mechanism for automatic definition of sub-networks is suppressed. We support our argument with pictures describing the steps of development, how ANN structures are evolved, and how the ANNs compute.
keywords: animats, cellular encoding, modularity, locomotion, automatic definition of sub-neural networks.
1 Introduction and background
An animat is a simulated animal or a real robot whose rules of behavior are inspired by those of animals. It is usually equipped with sensors, with actuators, and with a behavioral control architecture that allows it to control or respond to variations in the environment. A compact review of animats can be found in [Meyer 94]. The "animat hypothesis" is that intelligent behavior can emerge from the interaction between an agent's internal control mechanisms and its external environment. The highest cognitive abilities of man depend upon the evolution of the simplest cognitive and adaptive behaviors of animals [Brooks 1991]. The natural behavior of animals is amazingly well adapted to the environment in which they are embedded. To achieve this adaptation, animals are endowed with a nervous system whose dynamics are such that, when coupled with the dynamics of the environment, these animals can engage in the behavior necessary for their survival. Two complementary processes contribute to the synthesis and optimization of an animal's nervous system:
Natural evolution takes place over millions of years. It contributes a genetic code. After a complex developmental process, this genetic code is translated into a nervous system.
Learning takes place during the animal's lifetime. It tunes the nervous system to the particular environment encountered by the animal.
1.1 On the function of the biological developmental process
Nature uses a biological developmental process to transform a genetic code into a nervous system. During the developmental process, cells divide using the genetic information. This allows incredibly complex systems to be encoded with a compact code. For example, a human brain contains about 10^11 neurons, each one with an average of 10^5 connections. If the graph of connections were encoded using a list of destinations for each neuron, it would require 10^11 x 10^5 x log2(10^11) bits, on the order of 10^17 bits. The number of genes is of the order of 2 x 10^9. The two numbers differ by about eight orders of magnitude. How can the developmental process achieve such a compression? We conjecture that the mechanism is similar to the one used in modern programming languages. Writing a compact computer program in such languages is done by using a hierarchy of procedures. Each procedure is defined a single time and can be called many times. A procedure corresponds to a sub-neural network, which is encoded a single time on a specific part of the genetic code. During development, that
specific part can be read by many different cells, which will develop many copies of the same sub-neural network. We will refer to this property as modularity. Our conjecture implies a prediction: neural networks encoded in a modular way must have many regularities. We should be able to identify neural structures that are repeated many times. The part of the brain responsible for low-level vision contains such general structures [Hubel and Wiesel 1979]. Another example of regularity can be found in the nervous system responsible for the locomotion of the six-legged American cockroach. The architecture discovered by Pearson is described in [Beer 1990]. It is made of 6 similar subnetworks, coupled by inhibitory connections. Each subnetwork controls one leg. The motivation of this work in terms of adaptive behavior is to show how a simple computer model of the biological developmental process allows the generation of regular artificial neural networks for controlling animats.
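The back-of-the-envelope comparison above can be recomputed directly. The short Python sketch below uses the rough figures quoted in the text (order-of-magnitude estimates, not measured values):

```python
import math

# Rough figures quoted in the text (order-of-magnitude estimates only)
neurons = 10**11            # neurons in a human brain
connections = 10**5         # average connections per neuron
genome_bits = 2 * 10**9     # order of magnitude of the genetic code

# Naming one destination among 10^11 neurons takes log2(10^11) bits,
# so an explicit connection list needs:
explicit_bits = neurons * connections * math.log2(neurons)

print(f"explicit list: {explicit_bits:.1e} bits")
print(f"ratio to genome size: {explicit_bits / genome_bits:.1e}")
```

The ratio comes out around 10^8, i.e., roughly eight orders of magnitude of compression achieved by the developmental process.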
1.2 Artificial developmental systems
An Artificial Neural Network (ANN) is a graph of simple computing elements called units, which are an abstract model of the biological neurons in the nervous system. The architecture specifies the graph of interconnections, and each connection is weighted. ANNs have been widely used to control animats. The computer equivalent of learning is a method for tuning the ANN's weights using gradient descent. Weights are slightly modified over many epochs. Most often, learning is used to optimize the ANN's weights for a fixed architecture. Less frequently, evolutionary algorithms have been used to optimize the ANN's architecture [Cliff, Harvey and Husbands 1993]. In this newly emerging field, there have recently been some attempts to encode ANNs using inspiration from biological development. The idea is to indirectly represent an ANN, and to use an evolutionary algorithm to evolve high-level representations for computational problem solving or animat simulations. Instead of directly describing a graph data structure (like a list of connections from unit to unit), an artificial developmental system describes how to build the ANN by applying rules of cell division. Starting with a single cell, an artificial developmental system develops a graph of cells using repeated applications of the development rules. When the development is finished, the graph of cells can be interpreted as an ANN. A survey of proposed artificial developmental systems can be found in [Kodjabachian and Meyer 1994]. The system of rules can be modeled as a formal grammar, and the different approaches can be classified depending on the particular kind of formal grammar involved. I will review matrix grammars,
geometrical grammars, parallel string grammars, and graph grammars. Artificial developmental systems can be traced back to [Mjolsness, Sharp and Alpert 1988] and [Kitano 1990], who proposed the first examples of evolving formal grammars in this context. Nevertheless, although the idea of a high-level representation based on a grammar can be found in their work, their implementations do not involve cell division because they use matrix grammars. This has some drawbacks ([Gruau 1992]). An m x m matrix must be developed for an ANN of n neurons, where m is the smallest power of two bigger than n. In order to get an acyclic graph for a feed-forward ANN, one must consider only the upper right triangle of the matrix, which also decreases the efficiency of the encoding. Four different models recently proposed by [Vaario 1993], [Parisi and Nolfi 1994], [Belew 1993] and [Dellaert and Beer 94] can be classified as geometric grammars. The object that undergoes division is a point in a discrete 2-dimensional (or 3D) space. This approach is more biologically relevant, because the biological developmental process actually takes place in 3D space. As a result, realistic effects like the growth of a neuron's axon and dendrites are modeled. Context-sensitive rules can be used to model environmental effects like dendrites bouncing against obstacles during their growth. Belew's approach suggests that geometrical grammars can be best represented and implemented by two-dimensional (or 3D) cellular automata. The work of [Dellaert and Beer 94] is a particularly clear and simple 2D model of the developmental process that allows simulation on a computer. These four approaches are targeted more at modeling biology than at exploiting artificial developmental systems for engineering. Like those of Kitano and Mjolsness, the problems solved with geometrical grammars can be solved easily by hand (footnote 1). [Boers and Kuiper 1992] use a parallel context-sensitive string grammar known as an L-system, that operates on bracketed expressions.
(Footnote 1: It is true that the ANNs generated in this paper have also been generated by hand, but it is not easy; Beer wrote a book chapter about this, and it took the author two months to succeed.)

L-systems [Prusinkiewicz and Lindenmayer 1992] were proposed to model the growth of plants and trees. But the natural computer data structure of an ANN is a graph, not a tree. In order to produce a graph, Boers and Kuiper are obliged to make a more complex interpretation of the bracketed expression, using additional symbols. To our knowledge, Boers and Kuiper have not demonstrated the efficiency of their coding on a non-trivial problem other than the XOR. Sims [Sims 94] proposes a model where a body structure made of segments is developed, and an ANN is encoded with a direct encoding for each segment. The complete ANN is built by connecting together the ANNs of each segment. His animats
can walk, swim and jump. The body can always be expressed as a tree structure, and the developmental system seems to be the L-system model. Sims encodes the grammar using a recursive graph. Sims' system seems very effective for generating locomotion behavior. We proposed graph grammars as an efficient way to encode graphs [Gruau 1992]. Our method, called cellular encoding, models cell division, where a cell is just the node of a graph. We have implemented a neural compiler called JaNNeT [Gruau, Ratajszczak and Wiber 1994] that compiles a Pascal program into the cellular code of an ANN that simulates the Pascal program. JaNNeT demonstrates the expressive power of cellular encoding. We have proved several other theoretical properties of cellular encoding [Gruau 1994]: completeness, compactness, closure, modularity, and scalability. For example, regarding compactness, we proved that cellular encoding is more compact than other methods. If one wants to use an artificial developmental system for problem solving, it is important to prove theoretical properties of the underlying encoding rather than just do computer simulations. This gives a hint as to whether the evolutionary algorithm is going to be successful at exploring the space of codes and whether complex problems can be solved or interesting behavior observed.
1.3 Exploiting Modularity
The unique property of a developmental process is modularity. With modularity, a sub-ANN can be encoded a single time and copied many times, which results in a very compact representation. The model of the developmental process proposed by Sims indirectly generates modular ANNs by developing a modular physical body made of segments that can be repeated recursively. Each segment contains a sub-ANN and specifies how to connect this sub-ANN to the sub-ANNs of other segments. Since repeated segments contain the same sub-ANN, the resulting architecture is modular. In this paper, we demonstrate modularity using cellular encoding, based on a developmental process. Cellular encoding may be more general than Sims' approach because the ANN need not be embedded in a physical body. It can generate ANNs for solving an arbitrary problem, not only locomotion. With cellular encoding, an artificial developmental system can be used to generate modular ANNs that exploit the regularity of the problem. In [Gruau 1992] and [Gruau and Whitley 1993], using cellular encoding, the evolutionary algorithm was able to generate recursive graph grammars (or cellular codes) that develop
families of arbitrarily large ANNs for computing the Boolean functions parity, symmetry, and decoder for an arbitrarily large number of inputs. To our knowledge, these particular problems have never been solved with any other automatic method. Cellular encoding allows one to control the recursive development in a precise way, and to stop it after exactly n loops through the rules of the recursive grammar. The ANN that computes a Boolean function with n inputs is made of n copies of the same subnetwork (or 2^n in the case of the decoder). The cellular code specifies in a homogeneous way the subnetwork and how to put together copies of it. Cellular encoding especially fits these Boolean functions, because it is possible for such functions to define regular ANN architectures using simple units with +/-1 weights and Boolean activities. These functions are an ideal initial benchmark for artificial developmental systems because the optimal architectures are known, and one can make comparisons. But general Boolean functions need not have such regularities. Given that for a new problem we do not know how regular it happens to be, we still expect cellular encoding to be efficient if the problem to solve has a certain amount of regularity and can be decomposed into a hierarchy of subproblems. Most problems have a lot of regularities. In this paper, the method is used to solve a more realistic problem: the genetic synthesis of an ANN for the locomotion of a six-legged animat. This problem has a regularity that can be exploited by an artificial developmental system. [Beer 1990] proposed a model of an ANN made of six copies of the same subnetwork, each of which controls one leg. The connections between the subnetworks are also regular. Using these symmetries, Beer and Gallagher were able to collapse the genetic code into a 200-bit string of 50 parameters and to perform the genetic synthesis of the ANN.
In this paper I solve the same problem, but I do not help the evolutionary algorithm by using my knowledge about the symmetries. Instead, the artificial developmental system makes it possible to find symmetries automatically. The evolutionary algorithm decomposes the problem into subproblems, generates a subnetwork for solving the subproblem, and produces copies of this subnetwork to build a higher-level ANN that solves the problem. The only fitness measure we use is the distance the animat is able to move in a fixed amount of time.
1.4 Genome splicing
We define genome splicing as a general technique which can be used to increase the efficiency of an evolutionary algorithm. The structures that are evolved are spliced into a vector with a fixed number of components.
During crossover between two vectors, the components are recombined pairwise. Each component may refer to other components many times. In the computation of the evaluation function, the same component may be used many times. The genetic search used in this paper involves a particular kind of genome splicing, where each component of the vector encodes a sub-neural network. The work of John Koza uses another particular kind of genome splicing [Genetic Programming II 1994]. Here, each component of the vector corresponds to a LISP function that can be called many times. I acknowledge that the idea and the technology of genome splicing come from Koza. Genetic Programming (GP) has been demonstrated by Koza [Koza 1992] as a way of evolving computer programs with a GA. In the original GP paradigm, the individuals in the population are LISP S-expressions which can be depicted graphically as rooted, point-labeled trees with ordered branches. GP, more generally, includes approaches where the language used can be other than LISP, including cellular encoding. Genome splicing is called "automatic definition of functions" by Koza, because each tree encodes an Automatically Defined Function (ADF), and the solution is a hierarchy of functions that call each other many times. We call it "Automatic Definition of Sub-Neural Networks" (ADSN), because with cellular encoding each tree encodes a sub-neural network.
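As a sketch of the idea, here is a minimal genome-splicing crossover in Python. The genome is a fixed-length vector of components (e.g. grammar trees), and recombination pairs components by position so that a component is always inherited whole; the operator and the component names are illustrative, not the paper's exact implementation:

```python
import random

def splice_crossover(parent_a, parent_b, rng=random):
    """Recombine two genomes spliced into fixed-length vectors of
    components. Components are paired by position and each one is
    inherited whole from one parent (an illustrative sketch)."""
    assert len(parent_a) == len(parent_b)
    return [a if rng.random() < 0.5 else b
            for a, b in zip(parent_a, parent_b)]

# Example: three components per genome, each encoding one sub-network
mom = ["tree-A1", "tree-A2", "tree-A3"]
dad = ["tree-B1", "tree-B2", "tree-B3"]
child = splice_crossover(mom, dad, random.Random(0))
```

Because a component is exchanged as a unit, a good sub-network (here, a grammar tree) survives crossover intact, which is exactly what makes the decomposition into reusable components pay off.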
1.5 Overview of the paper
Section 2 describes the model of the six-legged animat we use and states the problem we are going to solve using cellular encoding. Section 3 makes precise the particular model of ANN that we are using. In Section 4 we describe cellular encoding. We use a parallel genetic algorithm, described in Section 5. Next, one of the most important steps in using a GA is to define the evaluation function. This is done in detail in Section 6. An interesting possibility is to improve the ANNs generated by the GA by combining learning with the developmental process. Section 7 describes the learning that we use. The results of the simulation are reported in Sections 8 and 9. Cellular encoding allows the user to see the generated ANN. This feature allows us to analyze pictorially the ANNs produced by the GA, as well as the search led by the GA. This analysis is also done in Sections 8 and 9.
2 A non-trivial animat problem
We use the model of an artificial six-legged insect described in [Beer 1990]. Each leg controller has 3 motor neurons: the state of the foot is controlled by the FS unit; the return stroke is controlled by the RS unit;
and the power stroke is controlled by the PS unit. In order to help coordination of the legs, each leg controller has one sensor unit that records the position of the leg. Each foot has an internal state: a foot can be either up or down. If the activity of the FS neuron is positive, the foot is pulled off the ground; else it is put on the ground. It takes one time step to move the foot. During this time step, the leg cannot exert a force. This takes into account the inertia of the leg and induces a selective pressure towards using the full range of the leg angle. Without this inertia, there would be a trivial solution to the problem, namely to alternate power stroke and return stroke at each time step. The forces exerted by the PS and the RS units are subtracted. If the foot is up, the resulting force is used to update the leg position relative to the body. If the foot is on the ground, the resulting force pushes the animal backward or forward depending on its sign. Due to friction, a dragging leg exerts a force pushing backward, proportional to the speed of the robot. There is also a global friction force pushing backward, also proportional to the speed of the robot. At each time step, we sum the forces exerted by the legs which are down and the friction forces. This sum is used to update the speed of the robot. If the center of mass of the robot lies outside the polygon formed by the feet which are down, the robot falls down and its speed drops to zero. Otherwise, the speed is used to update the position of the robot and the joint angles of the legs which are down. The precise algorithm used to update our animat is described in the appendix. We made one modification to Beer's model: we used the two discrete sensor neurons used in [Cruse, Muller-Wilm and Dean 1992]. The two sensor input units are called Anterior Extreme Position (AEP) and Posterior Extreme Position (PEP). AEP's activity is maximum if the leg is at its anterior extreme position and 0 otherwise.
PEP's activity is maximum if the leg is at its posterior extreme position and 0 otherwise. We got a variety of different gaits during our multiple runs. The animat would jump (push on all the legs at the same time, then fall, then push again on all the legs, and so on), or would move one leg at a time. Sometimes, it would produce the tripod gait, in which three of the legs alternate with the three others. We chose all the parameters of the model so that the tripod gait would be the one that enabled the robot to cover the maximum distance. This gait is by far the most difficult to find over the range of possible regular gaits. It needs the most computer time to be found by the GA. Having fixed the parameters in this way, the other gaits were found to be local optima in which the GA could be trapped. We considered a simplified model where each foot is automatically controlled. If the RS unit is activated,
and the foot is down, the foot is put up. Else, if the foot is up, the leg is pulled back with speed proportional to RS's activity. If the PS unit is activated and the foot is up, the foot is put down. Else, if the foot is down, the leg exerts a force on the body proportional to PS's activity. By convention, if both RS and PS are activated, RS "wins" and PS is ignored. We think this simplified problem is a good benchmark for artificial developmental systems, because it is not easy to solve directly by hand, it is more realistic than Boolean functions, and it has an internal regularity.
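The simplified control rules above can be summarized in a short Python sketch of one time step. The class name, the friction constant, and the force units are illustrative assumptions, not the paper's exact model (which is given in the appendix):

```python
FRICTION = 0.2   # assumed global friction coefficient

class Leg:
    def __init__(self):
        self.down = True
        self.angle = 0.0   # joint angle relative to the body

def body_step(legs, rs, ps, speed):
    """One time step of the simplified model. rs/ps are the per-leg
    RS and PS activations; RS 'wins' when both are active."""
    force = 0.0
    for leg, r, p in zip(legs, rs, ps):
        if r > 0:                       # return-stroke command
            if leg.down:
                leg.down = False        # lifting takes the whole step
            else:
                leg.angle -= r          # swing the raised leg
        elif p > 0:                     # power-stroke command
            if not leg.down:
                leg.down = True         # lowering takes the whole step
            else:
                force += p              # stance leg pushes the body
    force -= FRICTION * speed           # global friction opposes motion
    speed += force
    for leg in legs:                    # stance legs rotate with the body
        if leg.down:
            leg.angle += speed
    return speed
```

Note how the one-step delay for lifting or lowering a foot appears directly in the code: on the step where the foot changes state, the leg contributes no force, which is what penalizes the trivial alternating strategy.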
3 The continuous, noisy, neural model
To solve the problem of locomotion of a six-legged robot, the ANN must be able to store an internal state. Hence, it needs recurrent links. The activities of all the neurons are updated at the same time using a continuous-time update of the neuron activities, as advocated in [Beer and Gallagher 1993]:

tau_i (a_i(t + Dt) - a_i(t)) / Dt = s(netinput_i) - a_i(t)
(1)
The net input is the weighted sum of the neighbors' activities minus the threshold of the neuron. tau_i is a time constant whose value is always 3 in our hand-coded neural network. The activities are integers that range from -2048 to +2048, in order to represent the sigmoid more precisely. We updated the neuron activities three times before updating the body state. The sigmoid of the neuron is called s. With this model of neuron, for some particular symmetric leg positions, our hand-coded solution would get stuck in a wrong attractor: the animat used four legs instead of six. We added random noise to the model. Each time a unit computes its activity, it adds a random number uniformly distributed between -10 and +10. Noise perturbs the system out of the wrong local attractor. The global attractor is more stable and is not affected by noise. During the rhythmic activity, when the legs switch from RS to PS or vice versa, the ANN's state must come very near to the border between the wrong attractor and the right attractor. A little noise is then enough to allow the system to move from the wrong attractor into the right attractor. When the network is in the right attractor, it never returns to the wrong attractor.
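Discretizing equation (1) and adding the noise term gives the single-unit update below. The piecewise-linear shape of the sigmoid is an assumption: the text only fixes its asymptotes (the s_alpha family described in Section 4 saturates at -alpha/2 and +alpha/2 and crosses the origin), so the slope near zero is chosen here as the identity:

```python
import random

AMAX = 2048                      # activity range is [-2048, +2048]

def sigmoid(x, alpha=2048):
    """Assumed piecewise-linear s_alpha: identity near 0, clipped to
    [-alpha/2, +alpha/2] (only the asymptotes are fixed by the text)."""
    return max(-alpha / 2, min(alpha / 2, x))

def update_unit(a, weights, inputs, threshold, tau=3, noise=10, dt=1):
    """One discrete update of Eq. (1):
    a(t+dt) = a(t) + (dt/tau) * (s(netinput) - a(t)) + uniform noise."""
    net = sum(w * x for w, x in zip(weights, inputs)) - threshold
    a_new = a + (dt / tau) * (sigmoid(net) - a)
    a_new += random.uniform(-noise, noise)    # kicks the net out of bad attractors
    return int(max(-AMAX, min(AMAX, a_new)))  # clip to the integer range
```

With tau = 3 and three network updates per body step, a unit's activity relaxes most of the way toward s(netinput) between two body updates, which is what makes the oscillators settle into a stable rhythm.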
[ Figure 1 about here ]
4 Cellular Encoding revisited
Cellular encoding is a method for encoding artificial neural networks (ANNs). In this paper we present an updated and improved version of cellular encoding compared to the one in [Gruau 1994b]. Cellular encoding uses a very abstract notion of cells. A cell has an input site and an output site. It is linked to other cells, with directed and ordered links that fan into the cell at the input site and fan out from the cell at the output site. A cell also possesses a list of internal registers that represent a local memory and store labels. The data structure of an ANN is a directed labeled graph. The cell concept is simplified to provide only what is needed to describe a directed and labeled graph. The cellular code is based on local graph transformations, or graph rewriting rules, that act upon cells. There is a growing number of scientists working on graph grammars [Graph Grammar proceedings 1990] who have shown that graph grammars are very powerful at specifying complex objects, compared to more traditional string grammars. Examples of possible graph transformations used in cellular encoding are represented in Figure 1. Picture (a) represents an initial graph of cells. It is composed of one central cell connected to 6 input neighbors and 6 output neighbors. The remaining pictures, (b) to (l), show the effect of different graph transformations acting upon the central cell. The graph transformations can be classified into cell divisions, local topology transformations, and modifications of weights.
Pictures (b) to (h) represent the effects of different cell divisions. A cell division replaces one cell, called the mother cell, by two cells, called child cells. One can imagine many possible cell divisions, depending on the way the links of the mother cell are inherited by the child cells. The links are ordered and it is possible to refer to a link by its number. In picture (a) the numbers of the links are shown. A sublist of consecutive links is specified by the number of the first link and the number of the last link in the sublist. A particular cell division is implemented by copying one or more sublists of links from the mother cell to each of the child cells. A cell division must also specify whether the two child cells will be linked or not. For practical purposes, we give a one-letter name to each graph transformation, and the set of letters will be the set of alleles used by the genetic algorithm. The particular letters we use do not have a particular meaning. Division "S", represented in picture (b), is the sequential division. In the sequential division, the first child cell inherits the input links, the second child cell inherits the
output links, and the first child cell is connected to the second child cell. Division "P", represented in picture (c), is the parallel division. Both child cells inherit both the input and output links from the mother cell. Hence, each link is duplicated. The child cells are not connected. Divisions "S" and "P" are canonical divisions, because they are the simplest: all the other divisions do not handle all the links in a uniform way independently of their position. Division "T" is like "S", except that input link number one and output link number one are duplicated. Pictures (e) and (f) represent two possible effects of the same cell division "A". Divisions "G" and "H" are symmetric to one another with respect to the inputs and outputs. They allow a cell to be inserted on output or input link number one.
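To make the two canonical divisions concrete, here is a small Python sketch in which the graph of cells is a dict mapping each cell id to its ordered "in" and "out" link lists. The representation is an assumption for illustration, and recurrent links on the dividing cell are ignored for simplicity:

```python
def _swap(links, old, new_ids):
    """Replace every occurrence of `old` in an ordered link list by the
    ids in `new_ids` (two ids duplicate the link, as in division P)."""
    links[:] = [n for x in links for n in (new_ids if x == old else [x])]

def divide_S(cells, m, c1, c2):
    """Sequential division 'S': c1 inherits the input links, c2 the
    output links, and c1 is connected to c2."""
    cells[c1] = {"in": cells[m]["in"], "out": [c2]}
    cells[c2] = {"in": [c1], "out": cells[m]["out"]}
    for n in cells[c1]["in"]:
        _swap(cells[n]["out"], m, [c1])   # input neighbors now feed c1
    for n in cells[c2]["out"]:
        _swap(cells[n]["in"], m, [c2])    # output neighbors now read c2
    del cells[m]

def divide_P(cells, m, c1, c2):
    """Parallel division 'P': both children inherit every link of the
    mother (each link is duplicated); the children stay unconnected."""
    for c in (c1, c2):
        cells[c] = {"in": list(cells[m]["in"]), "out": list(cells[m]["out"])}
    for n in cells[m]["in"]:
        _swap(cells[n]["out"], m, [c1, c2])
    for n in cells[m]["out"]:
        _swap(cells[n]["in"], m, [c1, c2])
    del cells[m]
```

The non-canonical divisions ("T", "A", "G", "H") differ only in which sublists of links each child receives, so they fit the same skeleton with different copying rules.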
Pictures (i) and (j) represent graph transformations that locally transform the topology. The first one, called "R", adds a recurrent link to the cell; the second one, called "C", has an argument 3: it removes link number 3.
The remaining two pictures describe graph transformations that have an argument and modify the weights. A weight of -1 is represented by a dashed line. The first is "D"; the value of the argument is 3: it sets input link number 3 to -1. The second is "K" with argument 2: it sets all the output links starting at 2 to -1. We use four other program symbols that are not illustrated. The program symbol "I" sets input weight n to the value +1, where n is the argument. The program symbol "Y" sets the time constant of the neuron to n. The program symbols "U" and "L" set the sigmoid to s_256 and s_2048 respectively, and the threshold to n. s_alpha is a family of piecewise-linear functions that take the value -alpha/2 at infinitely negative inputs, +alpha/2 at infinitely positive inputs, and cross the origin. We chose the particular set of graph transformations presented here because this set allows the user to provide a hand-coded solution. Development consists of successive graph transformations on a graph of cells that cause it to grow into an ANN. In order to combine many graph transformations into a cellular code, we used an ordered list of labeled trees instead of a set of grammatical rules. An example of a cellular code is represented in Figure 2. The trees are labeled with the names of graph transformations and are called grammar trees. Each cell carries a duplicate copy of the cellular code (i.e., the set of grammar trees) and has an internal register called the reading
head that points to a particular position in the grammar tree. At each step of the development, each cell executes the graph transformation pointed to by its reading head. ANN units are cells that have terminated their development and lost their reading head. The instructions are called program symbols. Program symbols indicating cell division label nodes of arity two; when a cell divides, the first child cell goes on to read the left subtree, and the second child cell goes on to read the right subtree. Program symbols indicating other graph transformations label nodes of arity one; once the transformation is executed, the cell simply moves its reading head to the unique subtree. Cells can also execute instructions for piloting the reading head and instructions to finish the development and produce an ANN unit. Since at each step of the development all the reading heads move one level down in the tree, they eventually reach the leaves. When a reading head reads the leaf of a tree, two events may happen.
It can encounter a program symbol such as "U" or "L" in Figure 2, which finishes the development. The cell can then be considered an ANN unit with its own particular features.
It can read a program symbol "n" which has an argument d. If the number of the tree that is currently being read is x, the cell moves its reading head to the root of grammar tree x + d. For example, the program symbol "n 1" moves the reading head to the root of the next grammar tree. The program symbol "n 0" backtracks the reading head to the root of the grammar tree that is currently being read, and the program symbol "n 2" jumps to the next-to-next grammar tree. This is a reference mechanism using relative addresses. During a step of the development, the cells execute their program symbols one after the other. The order in which cells execute program symbols is determined as follows: once a cell has executed its program symbol, it enters a First In First Out (FIFO) queue. The next cell to execute is the head of the FIFO queue. If the cell divides, the child which reads the left subtree enters the FIFO queue first. This order of execution tries to model what would happen if cells were active in parallel. It ensures that a cell cannot be active twice while another cell has not been active at all. In some cases, the final configuration of the network depends on the order in which cells execute their corresponding instructions. The waiting program symbol, denoted "W", has no effect: it makes the cell wait for its next rewriting step. It is necessary for those cases where
the development process must be controlled by generating appropriate delays.
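The scheduling discipline just described can be captured in a few lines. In this deliberately stripped-down Python sketch, a grammar tree is a nested tuple: ("S", left, right) for a division, ("W", sub) for a unary symbol, and "E" for a terminal that turns the cell into a unit. The symbol set and the tuple encoding are illustrative, and reading-head jumps between trees ("n d") are omitted:

```python
from collections import deque

DIVISIONS = {"S", "P", "T", "A", "G", "H"}   # arity-2 program symbols

def develop(tree):
    """Run development with the FIFO discipline: each queue entry is the
    node under one cell's reading head; after a division, the child that
    reads the left subtree enters the queue first."""
    queue = deque([tree])
    units = 0
    while queue:
        head = queue.popleft()
        if head == "E":              # terminal: the cell becomes an ANN unit
            units += 1
        elif head[0] in DIVISIONS:   # division: two cells, left child first
            queue.append(head[1])
            queue.append(head[2])
        else:                        # unary symbol: move the head one level down
            queue.append(head[1])
    return units

# A code with two divisions develops into three units:
code = ("S", ("P", "E", "E"), ("W", "E"))
```

Because the queue is FIFO, all cells at the same depth of the code are rewritten before any of their descendants, which is the quasi-parallel order the text describes.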
4.1 An example
[ Figure 2 about here ]
As an example, consider the problem of finding an ANN for the 6-legged locomotion problem. The input units are sensory inputs that test the position of the legs. The output units command the legs and the feet. In Figure 3 we show the development of the hand-designed cellular code presented in Figure 2. Each cell is represented by a circle. Inside the circle, we write the name (one letter) of the program symbol currently read by the reading head. The development of a neural net starts with a single cell called the ancestor cell, connected to an input pointer cell and an output pointer cell. A pointer cell is represented as a square. Consider the picture at the top left of Figure 3. Initially, the reading head of the ancestor cell is located on the root of tree 1 and reads the program symbol "R". Its registers are initialized with default values. For example, its threshold is set to 0. As this cell repeatedly divides, it gives birth to all the other cells that will eventually become units of an ANN and make up the neural network. The input and output pointer cells to which the ancestor is linked do not execute any program symbols. Rather, at the end of the developmental process, the upper input pointer cell is connected to the set of input units, while the lower output pointer cell is connected to the set of output units. These input and output units are created during the development; they are not added independently at the end. After development is complete, the pointer cells can be deleted and are replaced by duplicate input and output units, as shown in the picture in the lower right corner of Figure 3. The hand-coded neural net is made of six oscillators coupled by inhibitory connections. Each oscillator controls one leg. It is inspired by the Pearson model described in [Beer 1990]. Thanks to modularity, each oscillator is encoded a single time, in tree number 3 of Figure 2.
Tree number 2 encodes a row of 3 leg controllers (half a body). Tree number 1 then puts two half-bodies together to produce the complete controller.
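The cell-division process described above can be sketched in miniature. The snippet below uses a hypothetical two-symbol alphabet, 'S' for a division (each child reads one subtree) and 'E' for a cell that stops dividing and becomes a neuron; the paper's full instruction set is much richer (link cutting, weight setting, recursion), so this is only an illustration of how a grammar tree develops into a set of units:

```python
def develop(tree):
    """Count the neurons a simplified grammar tree develops into.
    A tree is ('S', left, right) for a cell division, or ('E',) for a
    cell that stops dividing and becomes a neuron.
    (Hypothetical mini-alphabet, not the paper's full program-symbol set.)"""
    if tree[0] == 'E':
        return 1
    _, left, right = tree
    return develop(left) + develop(right)

# A code with three divisions develops into four neurons:
code = ('S', ('E',), ('S', ('E',), ('S', ('E',), ('E',))))
```

Because each division symbol doubles one cell, the size of the developed network grows with the number of 'S' nodes, which is why a subnetwork encoded once but instantiated many times yields a compact genotype.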
[ Figure 3 about here ]
5 The parallel Genetic Algorithm We used the parallel Genetic Algorithm described in [Gruau 93]. The basis of a parallel implementation of a GA on a multiprocessor system is to divide the whole population into subpopulations and to allocate one subpopulation per processor. The processors send each other their best solutions. These communications take place with respect to a spatial structure of the population. Different models of parallel GAs have been investigated based on different spatial structures. We use a new model called "the mixed model", which combines the stepping-stone model and the isolating-by-distance model, so that the advantages of both models are combined. Local mating achieves a high degree of inbreeding between mates [Collins and Jefferson 1991]. Isolated islands help to maintain genetic diversity [Muehlenbein 1991]. In the mixed parallel GA, individuals are distributed on islands. The islands form a 2-D torus. Each island is mapped onto one processor of a MIMD machine. A processor can send individuals only to the four processors that store the four neighboring islands. Inside one island the individuals are arranged on a 2-D grid. The whole spatial structure is thus a 2-D torus of 2-D grids. Not all the sites of a 2-D grid are occupied. The density of population is kept around 0.5; that is, a given site is occupied with probability 0.5. The mating is done as follows: a site s is randomly chosen on the grid. From this site, two successive random walks are performed. The best individuals found during these two random walks are mated. A verification is done to ensure that the two mates are different. The offspring is placed on site s. This scheme simulates the isolating-by-distance model sequentially on one processor. The migration rate is parametrized using a single parameter. Our GA is steady state. Each time a new individual is to be created, it can be built either through the recombination of two other individuals, or by means of an exchange.
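The random-walk mating scheme just described can be sketched as follows. This is a simplified illustration; the grid size, walk length, and fitness callable below are placeholders, not values from the paper:

```python
import random

def walk_best(grid, start, steps, fitness):
    """Random-walk `steps` moves on a toroidal 2-D grid from `start`,
    returning the fittest occupied site visited (or None)."""
    h, w = len(grid), len(grid[0])
    y, x = start
    best = None
    for _ in range(steps):
        dy, dx = random.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
        y, x = (y + dy) % h, (x + dx) % w
        ind = grid[y][x]
        if ind is not None and (best is None or fitness(ind) > fitness(best)):
            best = ind
    return best

def select_mates(grid, site, steps, fitness):
    """Two successive walks from the same site; retry until the two
    mates are distinct individuals, as in the paper's verification step."""
    while True:
        a = walk_best(grid, site, steps, fitness)
        b = walk_best(grid, site, steps, fitness)
        if a is not None and b is not None and a is not b:
            return a, b
```

The offspring of the two selected mates would then be placed back on the starting site, keeping selection pressure local to each neighborhood of the grid.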
The exchange is chosen with a probability called the exchange rate, which is fixed at 0.01. When a processor exchanges an individual, a site s is selected on a border of the 2-D grid. A random walk starts from this site and the best individual found is sent to the processor next to the border. It is sent with the coordinates of s. When it is received, these coordinates are read, and the individual is placed
exactly on the site opposite to s. The MIMD parallel machine used is an iPSC/860 with 32 processors. One node of this machine delivers 40 MIPS, which is roughly 1.5 times a SPARC-2. The communications between the processors are asynchronous and are overlapped by the computations, so the communication cost is null. This theoretical prediction is confirmed by experiments. The subpopulation on each processor is 64 individuals, so the total population is 2048. The set of alleles used was {S, P, T, A, G, H, R, C, D, I, U, L, W, n}. These alleles correspond to program symbols of the cellular encoding, as explained in Section 4. Using these alleles, an initial random population is created, in which all the grammar trees have an equal and fixed number of nodes. As explained in Section 4, some of these program symbols have an argument. For example, the argument of 'C' is the number of the link to be cut. Program symbols {C, D, I, U, L} have arguments. The arguments are set to random values between -9 and +9. The GA is applied until a genetic code is found that develops a neural net meeting the termination criterion explained in Section 6, or until the allocated time of two hours has elapsed. Each time an offspring is created, all the alleles are mutated with a small probability of 0.005. An allele is mutated by replacing it with another allele of the same arity.
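The arity-preserving mutation can be sketched as follows. The arity table below is a hypothetical assignment for illustration only (the paper does not list the arity of each symbol); the point is that replacing an allele only with one of equal arity keeps the grammar tree well-formed:

```python
import random

# Hypothetical arity assignment for a subset of the alleles
# (illustrative only -- not the paper's exact table).
ARITY = {'S': 2, 'P': 2, 'R': 2, 'C': 1, 'D': 1, 'I': 1, 'U': 1, 'L': 1,
         'W': 0, 'T': 0}

def mutate_allele(allele, rate=0.005):
    """With probability `rate`, replace an allele by a random allele of
    the same arity, so the surrounding grammar tree stays valid."""
    if random.random() >= rate:
        return allele
    candidates = [a for a, k in ARITY.items()
                  if k == ARITY[allele] and a != allele]
    return random.choice(candidates) if candidates else allele
```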
6 Approaching generalization by distributing 32 fitness cases The fitness of a given ANN for the problem of the six-legged locomotion robot is the average of the distance covered by the robot on a certain sample of initial leg positions. Since the number of input units and output units is also evolved, it might not match the number of inputs and outputs of the problem. If there are too many input or output units, we initialize to 0 the activity of input units having too high a number, and we ignore the activity of output units with too high a number. If there are not enough, we input only the lower part of the input vector, and fill the undefined upper components of the output vector with 0. Our aim is not merely to produce an ANN for six-legged locomotion; we also want our ANN to be general. Whatever the initial condition of the six legs, we want our robot to perform the tripod gait after a transient of reasonable length. So in all of our experiments, we carefully test the neural networks produced by the Genetic Algorithm (GA) on the 64 possible initial positions of the legs, where each leg is initially either at the Anterior Extreme Position (AEP) or the Posterior Extreme Position (PEP). This tests whether the ANN can generalize.
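The truncate-or-zero-pad rule for mismatched unit counts amounts to a one-line helper (the function name is ours, for illustration):

```python
def fit_vector(v, n):
    """Adapt a vector to n components: surplus components are ignored,
    missing ones are filled with 0, as for the evolved ANN's input and
    output vectors whose sizes may not match the problem's."""
    return (list(v) + [0.0] * n)[:n]
```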
One of the most delicate tasks was to devise a good fitness that measures robustness, together with a good termination criterion. By "good" we mean a fitness that is not too computationally expensive, such that when the fitness reaches the level where we assume the solution is acceptable and stop the genetic algorithm, the solution generalizes over the 64 fitness cases. We cannot test all 64 initial leg positions for each ANN produced by the GA; this would be too time consuming. Instead, we carefully selected a fixed set of six difficult initial positions of the legs plus one that is easy. The positions are reported in the appendix. An ANN may be evaluated on up to these 7 positions. However, in order to proceed to the next one, it must have been able to walk the threshold distance on the preceding one. Time is saved because very often the evaluation will stop at the first initial position. As individuals become better, more time is spent evaluating them. The termination criterion was to generate an ANN having the same performance as the hand-coded solution on the seven positions. The threshold distance was chosen as the distance walked by the hand-coded solution, multiplied by 0.9. If some individuals get more fitness cases than others because they exceed the threshold, they get the sum over all the fitness cases they tried, and not the average. Thus the first individual to pass the threshold will quickly take over the population. This introduces stages in the genetic search, where at each stage, all individuals are able to do a certain number of fitness cases. At each new stage, the population will focus its exploration on a more precise region. On one hand, genetic diversity is lost; on the other hand, the regions believed to be the most interesting are explored more thoroughly. We divided the population into 32 sub-populations, and gave a different evaluation function to each sub-population.
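The threshold-gated evaluation can be sketched as follows; the `walk` callable, which returns the distance covered from a given initial leg position, stands in for the robot simulation:

```python
def fitness(walk, positions, threshold):
    """Evaluate an ANN on successive initial leg positions, stopping at
    the first position where it fails to cover the threshold distance.
    The score is the SUM over the cases attempted (not the average), so
    the first individual to pass a threshold quickly takes over."""
    total = 0.0
    for pos in positions:
        d = walk(pos)
        total += d
        if d < threshold:
            break
    return total
```

Because weak individuals fail on the first position, most evaluations stop early, which is where the scheme saves time.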
Individuals mate mainly within a sub-population and very occasionally across two neighboring sub-populations. This is easy to implement because we use a parallel GA that runs on a 32-processor parallel machine: mating across sub-populations means exchanging individuals. A given evaluation function was built as follows: choose a first position among the seven as a function of the number of the processor; if the distance walked exceeds a threshold distance, choose another particular initial position, also as a function of the number of the processor, and so on, until all the set has been tried. Each sub-population chooses initial positions in a particular sequence, depending on the processor number. For a given sub-population on a given processor, the sequence stays fixed during all the runs. During the first generation, each sub-population concentrates on one of the seven initial positions. Hence, there are seven different fitness functions. When the ANNs walk past the threshold distance, the sub-population begins to learn
ANNs that can walk starting from two different initial positions. At this stage there are 7·6/2 = 21 fitness functions. The maximum number of fitness functions is:

n_fitness = C(7,4) = (7·6·5)/(2·3) = 35    (2)
The processors are on a 2-D grid whose dimensions are 4×8. No two adjacent processors have the same first initial leg position among the 7 possible. Hence, adjacent processors will evolve different genetic material. When mating across adjacent processors occurs, new genetic material can be created. Using these different fitness functions, the solution found by the GA passed the generalization test with 100% success, and the average evaluation time of one individual was not significantly different from the average evaluation time we obtained using a single initial position. Furthermore, this multiple-fitness principle enabled a broader search of the space and maintained genetic diversity. Each sub-population has a different goal in mind, and is therefore led to explore different parts of the search space. Intuitively, the power of crossover as a creative process can be exploited by recombining solutions from different niches.
7 Stochastic Hill-climbing on the weights. We apply a stochastic hill-climber to the weights of each ANN produced by the GA, in order to speed up the genetic search. Deterministic hill-climbing is not possible because it would involve too much computer time, due to the large number of weight changes to try. We randomly choose a weight w, and modify it to a random value in {0, 1, -1}. We then modify the cellular code of that ANN so that the modified cellular code produces the same ANN where w is modified. We call this back-coding. Many authors [Gruau and Whitley 1993] [Ackley and Littman 1992] have considered such techniques, which originate from ideas going back to Lamarck and Baldwin. Assume we are using Automatic Definition of Subnetworks. The genome is spliced into two trees. Tree number two encodes a subnetwork. Tree number one encodes the general structure of the network and specifies how many occurrences of the subnetwork to include and how to connect them. Suppose that the back-coding modifies tree number two. In this case the back-coding has an interesting side effect: wherever this subnetwork is instantiated in the final ANN, the weight modification will be reproduced. When weight modifications from two separate parts of the ANN recommend different back-codings for the same shared component of the genome, a random choice is made.
We compute the performance of the ANN developed with the modified code and compare it with the performance before hill-climbing. If the performance increases, we accept the weight change. If not, we still accept the weight change with probability e^(-0.1). We use a single epoch because hill-climbing is expensive; in this context it nearly doubles the time of fitness evaluation. There are two possible ways to exploit the information produced by this learning. One is to let learning modify the fitness function and forget the back-coded information from one generation to the next. This is called the Baldwin effect. The modification of the fitness is done implicitly, because the learning method increases the fitness. Hinton and Nowlan [Hinton and Nowlan 1987] proposed that the effect of learning could make the fitness landscape easier to climb. The alternative way is to transmit the learned information to the offspring. This mechanism has been proposed by Lamarck. Although it is not biologically plausible, Lamarckian learning can be used in a computer. Past results [Gruau and Whitley 1993] have shown that the Baldwin effect can speed up the genetic search, sometimes more efficiently than Lamarckian learning. In this paper we preferred to use Lamarckian learning because stochastic hill-climbing does not use problem-specific knowledge, like the computation of a gradient. In this case, hill-climbing seldom increases the fitness. It is probably more useful to remember a successful weight change when it happens, because it is not likely to be reproduced easily. Back-coding was done using the program symbols "C", "D", "I" defined in Section 4. However, we used the different names "M", "N", "F" in order to keep track of the information that has been learned and inherited through the Lamarckian strategy.
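The acceptance rule of the stochastic hill-climber can be sketched as follows; `perturb` stands in for the pick-a-weight-and-back-code step, and the whole function is an illustration rather than the paper's implementation:

```python
import math
import random

def hill_climb_step(code, perturb, fitness):
    """One epoch: perturb a weight (to 0, 1 or -1, via back-coding),
    keep the change if fitness improves, and otherwise still keep it
    with probability e^(-0.1)."""
    candidate = perturb(code)
    if fitness(candidate) > fitness(code):
        return candidate            # improvement: always accept
    if random.random() < math.exp(-0.1):
        return candidate            # else accept with prob. ~0.905
    return code
```

The high acceptance probability for non-improving moves makes this closer to a random walk with a slight uphill bias than to strict hill-climbing, which matches the observation that hill-climbing seldom increases the fitness here.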
8 Simulation with the simplified model
8.1 Comparison between runs with and without ADSN
[ Figure 4 about here ] In this section we report simulations and results on the simpler problem of 6-legged locomotion, where the feet need not be controlled. The experiments are focused on showing the interest of the genome splicing defined in subsection 1.4. We did two kinds of experiments for a comparative study. In the first kind, the
genome consists of a single grammar tree and the crossover is done by exchanging subtrees. In the second kind, the genome, represented in Figure 4, is a vector of three grammar trees. Crossover is done by exchanging subtrees between pairwise component trees. Three subtrees are exchanged for one crossover. Each subtree can be just a leaf, or it can be the whole grammar tree. We use the program symbol "n", which has an argument. The argument of "n" is a relative address for moving the reading head from one tree to another tree of strictly higher number. It can be 1 or 2 for tree 1, it is always 1 for tree 2, and it does not exist in tree 3. The argument of "n" cannot be 0; hence we did not allow recursion. The experiments with and without ADSN had exactly the same parameters, except for one difference: in the run without ADSN, the initial population consisted of individuals having 60 nodes, and the upper bound on the size of the trees was 600 nodes, while in the run with ADSN, the initial population consisted of individuals having three trees of 20 nodes each, with an upper bound of 100 nodes. In both cases, the genetic material (number of nodes) is the same in the initial population. We gave the run without ADSN the possibility to grow grammar trees with twice as many nodes: since in those runs genetic material cannot be reused, more genetic material is needed.
[ Table 1 about here ] We ran two trials with ADSN, and two trials without ADSN. The program symbol "Y", which sets the time constant, was not used. The time constant τ_i defined in equation 1 was chosen to be 3 for all the neurons. We used the stochastic hill-climbing method described in Section 7. In all the runs, the robot first learns to generate oscillatory movements of the legs, and thereafter learns to coordinate the movements of the legs to avoid falling. With ADSN the GA found a solution in both of the two trials. Without ADSN, it found a solution in only one run. In all cases, the solution found by the GA was general: the ANN produces the tripod gait over all 64 possible initial conditions of the legs, and the average distance covered by the robot is the same as with the hand-coded neural network. Table 1 summarizes the results of the runs. The speed-up produced by ADSN is 6.5, and the run without ADSN uses 3.65 times fewer individuals. These numbers show that on average, the evaluation of one neural network takes twice as much time with ADSN. This is because ADSN
tends to generate networks with many units, because subnetworks are repeated many times. On average, networks generated with ADSN are twice as big. We reran the experiment with ADSN and the stochastic hill-climbing disabled. The results are indicated in the fourth row of Table 1 and show that learning produces a speed-up of 30%. In Figure 6, we show the genotypes with and without ADSN. We can see that the genetic code found without ADSN is more than four times bigger. Two trials are not sufficient to provide statistical information. However, there appears to be a real difference between runs without ADSN and runs with ADSN.²
8.2 Analysis of the ANNs found by the GA
[ Figure 5 about here ] In Figure 5, we compare the ANNs found by the GA with and without ADSN for the simplified locomotion problem solved in the previous subsection. Whereas our hand-coded solution encompasses 6 subnetworks, the ANN generated by the GA with ADSN has a subnetwork which is repeated only 3 times. Each subnetwork controls two legs. It contains a neuron with a recurrent connection that we call LR because it also acts like a latch register. For each LR neuron, there is an inhibitory connection to another neuron N, and an inhibitory connection from N to the LR neuron of an adjacent controller. Since (-1)·(-1) = +1, the resulting coupling between the two LR neurons is +1. Hence, the LR neurons of the three controllers are pairwise connected by excitatory connections. As a result, the three controllers are in phase. On the other hand, the two legs controlled by one controller are in anti-phase. The ANN found by the GA has 12 units, which is much smaller than the hand-coded solution shown in the lower right corner of Figure 2. Our solution has 30 units for the simplified problem, and 36 units for the complete problem. The GA found the optimal-size ANN, if we assume that each leg is controlled by a distinct set of neurons: in this case there obviously need to be at least 2 units per leg controller. The ANN shown in Figure 5(b) has been evolved without ADSN, and therefore it has no structure. It is almost impossible to understand how it functions.
² The main reason why we did not perform more experiments was limited access to the parallel machine.
[ Figure 6 about here ] From Figure 6 we can compute the percentage of alleles of the solution that have been learned at some point in the evolution and inherited with the Lamarckian strategy. It is the proportion of the alleles "M", "N" and "F" in the trees, since these particular alleles have been used for back-coding. 30% of the alleles of the solution found with ADSN come from learning. Without ADSN, the percentage drops to 22%.
[ Figure 7 about here ] In Figure 7, the foot-steps in (b) look more efficient than those in (c) because they use longer strides, which is more efficient in terms of the inertia of the legs. However, the x-axis represents time, not distance. On the one hand, (b) uses longer strides; on the other hand, it is slower to make the transition from one stride to the next.
8.3 Analysis of a GA run A run of the GA produces 32 files recording the genetic code of the best-so-far individual found by the GA, one for each processor. We can afterwards run a program that displays the phenotype (ANN) of the best solutions at different stages of the search. Figure 8 reports the sequence for a GA run with the simplified model of locomotion with ADSN. It provides useful insights as to how the GA proceeds to build the ANNs. There are some periods in which the general structure of the architecture remains the same, and only careful scrutiny reveals the few connections that have been changed. These changes are made by mutation and crossover towards the leaves of the trees. The impact of a change in the genotype is big if the change is made early in the development, near the root of the tree; it is small if it is made late in the development, near the leaves of the tree. At other periods, a deep change in the structure can be observed from one best individual to the next. This is due either to crossover near the root of the tree, or to the receipt of good genetic material from neighboring processors. Sometimes we can see two species of ANNs reappearing alternately (for example at generations 19 and 21). With ADSN, the impact of crossover or mutation also depends on the tree to which it is applied. Crossover on the first tree changes the global structure of the ANN: how many copies of the subnetwork are included and how they are combined. During the run with
ADSN, at generation 17, we see a marked transition toward a general structure composed of two oscillators. This structure remains and is improved eight times before another structure takes over at generation 49. The new structure is made of three oscillators. After two improvements, the three-oscillator structure leads to a solution that satisfies the termination criterion. The two improvements performed a simple pruning of misleading weights. It is an old dream of the GA community to be able to interactively watch the evolution at work. Our graphics may be a step towards the solution, because the human eye is able to detect repeated structure in graphs, but not in bit strings or labeled trees.
[ Figure 8 about here ]
9 The complete model used by Beer and Gallagher
9.1 Comparison between runs with and without ADSN
In the second set of experiments, we used the model described in [Beer and Gallagher 1993]. Each foot is now controlled by a foot motor unit. The GA was the same as for the simplified model, except that now the time constant of the neurons is genetically determined, using the program symbol 'Y' to set the time constant. We ran five trials with and without ADSN using the same GA settings as in the experiments with the simplified model. We report here all the trials we have done. We could not do trials to tune parameters, because it was too time consuming. The time limit is two hours and the population size per processor is 64. The GA was successful two times out of five with ADSN, but could not find a solution without ADSN. The results are reported in Table 2. The two solutions found by the GA are represented in Figure 9 (a) and (b). Unlike the hand-coded solution, they generalized over the 64 initial leg positions.
[ Table 2 about here ]
9.2 Comparison with Beer and Gallagher's results Beer and Gallagher fixed the architecture of the ANN. They used one subnetwork for each leg. The subnetworks are similar, and the intra-segmental and inter-segmental connections are also similar. In Figure 9, ANNs (a) and (b), the subnetworks are similar, but not the connections between subnetworks. Figure 9(c) and (d) shows which subnetwork is connected to which other subnetwork. These figures have been drawn automatically by freezing the development of cells executing the n program symbol, the program symbol that includes subnetworks. The resulting graph is called the general architecture. It is encoded by tree number one of the genome. For ANN (b), the connections between subnetworks 1, 2, 3, 4, 5, 6 form a circuit: the general architecture is a perfect circular ring. For ANN (a), the connections go from subnetwork 1 to 2, then to subnetworks 3, 4, 5, 6 in parallel, then back to subnetwork 1: the general architecture is a ring of diameter 3. We did not fix the architecture; the GA found that there needed to be 6 subnetworks. The number of units in each subnetwork is 8 in ANN (a) and 12 in ANN (b). The subnetworks of ANN (a) and ANN (b) are represented in Figure 9 (e) and (f). They have been developed using tree number 3 of the genome. Tree number 2 was never used. In Beer and Gallagher's work, the number of units in each controller is only 5, but they used real weights, whereas we used binary weights. Furthermore, they have a single sensor that senses the exact position of the legs, whereas we used two binary sensors. In our solution, the amount of recurrent connections in the subnetwork can be clearly identified. In subnetwork (f), there are no recurrent connections. In subnetwork (e), one neuron has a recurrent connection. There are also two pairs of neurons connected to each other. Each of these pairs seems to act as a latch register. We can also identify interface neurons within each subnetwork.
Their function is to communicate with the corresponding interface neurons of the other subnetworks. The number of connections between subnetworks is smaller than the number of connections inside a subnetwork, because not all the neurons are interface neurons. In subnetwork (f), there is only one interface neuron, which both receives and sends activity. In subnetwork (e), there are 2 input interface neurons and 2 output interface neurons. In Beer and Gallagher's architecture, all the neurons are input and output interface neurons. The neurons are more coupled inside one leg controller than between leg controllers, but the reason is different: inside one subnetwork there is total interconnection. Neurons within a subnetwork are numbered, and a leg controller neuron connects only to the corresponding neuron in each of the leg controllers adjacent to it.
These assumptions were made in order to parametrize the architecture with as few parameters as possible and reduce the search space of the GA. Once more, the point of the paper is that due to ADSN, the GA is able to solve the same problem solved in [Beer and Gallagher 1993], but without being provided any information about the symmetries. The search space of the GA is automatically reduced, because of ADSN.
[ Figure 9 about here ]
9.3 Analysis of the ANNs found by the GA A tool has been developed that allows one to animate the activities of the neurons changing while the ANN is relaxing. In this way, and by analyzing the architecture and the subnetworks separately, it is possible to explain the behavior of the ANNs generated by the GA. The computations of ANN (b) are controlled by a ring of 6 interface neurons. Each node of the ring is coupled to the next node with a weight of -1.³ The activities flow along this ring, and change sign at each node. Each subnetwork is feed-forward; there are no recurrent connections. Each node in the ring is at the root of this feed-forward subnetwork. It receives input from the AEP and PEP sensors, and directly controls the RS, PS, and FS actuators. The tripod gait simply emerges from the fact that the interface neurons are connected by -1 weights. We had not thought about this solution to the problem; it is a "GA surprise". However, this architecture is not robust: if one node in the ring is suppressed, synchronization between the subnetworks is broken. The mechanics of ANN (a) is intermediate between that of the ANN found for the simpler problem, Figure 5(a), and that of ANN (b). Each subnetwork has a latch register; however, the interconnections between subnetworks also flow in a ring. This ring has three steps: the first step is subnetwork 1, the second step is subnetwork 2, and the third step is subnetworks 3, 4, 5 and 6 in parallel. The activities flow through all the networks, and not only through the interface neurons. The tripod gait, as usual, is induced by correctly placed -1 weights that ensure anti-phase locking between the latch registers of adjacent subnetworks. However, not all pairs of subnetworks for adjacent legs are connected.
³ The weights shown are not the final weights in Figure 9 (c) and (d). The final weights are determined by the genetic code of the subnetworks, and not by the general architecture.
This architecture is more
robust, but less concise than the preceding one. Three of the subnetworks in {3, 4, 5, 6} may be removed without breaking the ring and the synchronization.
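The sign-flipping ring described for ANN (b) can be checked with a tiny simulation (a sketch under simplifying assumptions: synchronous updates and binary ±1 activities, which is not the paper's continuous-time neuron model):

```python
def ring_step(state):
    """One synchronous update of a ring in which each node receives the
    activity of its predecessor through a weight of -1."""
    n = len(state)
    return [-state[(i - 1) % n] for i in range(n)]

# With 6 nodes, the alternating pattern is a fixed point: the legs fall
# into two anti-phase groups of three, i.e. the tripod gait.
tripod = [1, -1, 1, -1, 1, -1]
```

With an odd number of nodes the alternating pattern cannot close consistently around the ring, which is one way to see why the even, six-node ring supports the two anti-phase groups of the tripod gait.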
10 Conclusion and a future direction
10.1 Exploiting modularity
Regularity occurs whenever the same pattern is repeated, possibly at different scales. It is possible to make a lengthy list of examples of regularity: atoms, molecules, polymers, cells, body limbs, fractals, the homogeneity of space and time, the cut-and-paste principle of a word processor, the words of a language, the signs of any code. We believe that regularity is a basic characteristic of information, and that no communication, no coupling, indeed no interesting phenomenon, can happen without it. Natural evolution has found a way to exploit and produce regularity. The body architectures of the creatures built through natural evolution have internal regularities that are coupled with the external regularity of the world. For example, body symmetry maps to the isotropic nature of space. Similarly, the nervous system of animals may be highly regular. An example is the part of the brain responsible for low-level vision, where different pixels of an image are treated in the same way. The regularity of this network also maps to the homogeneity of space. Nature generates regular animal bodies and nervous systems by using a developmental process. A structure genetically encoded a single time can be developed many times. This paper shows that this principle can be turned into an efficient computer technique called the artificial developmental process. This technique can automatically generate Artificial Neural Networks (ANNs) for non-trivial animat control. These ANNs have an internal structure that exploits the regularities inherent in the problem. Our artificial developmental scheme uses explicit genome splicing. The genome is spliced into a list of trees which can refer to each other in a hierarchical way, and each subtree encodes a subnetwork, so that the total ANN is a hierarchy of many copies of these smaller subnetworks. We did two kinds of experiments. In the first kind, the structure of the chromosome is a single tree.
In the second kind, the structure is a list of three trees. We solved two problems: the first is a simplified model of six-legged locomotion; the second is the complete model used in [Beer and Gallagher 1993], where a motor neuron controls the foot. With the second experimental setting, an ANN solution to the simplified six-legged locomotion problem is found 6.5 times faster than with the first setting. Furthermore, the size of the genotype and the number
of hidden units of the best phenotypes (ANNs) are both more than four times smaller in the second setting. Concerning the complete problem, the first setting could not find a solution in any of the 5 trials, whereas the second setting succeeded in 2 of them. Despite the small number of trials, we believe the difference between the two settings is sufficiently large to show the advantage of the artificial developmental system, which makes it possible to automatically solve the same problem Beer and Gallagher solved, but without using any information about the symmetries of the problem. This problem is another entry in the list of the non-trivial problems we have solved with cellular encoding. The problems considered up to now always have a certain amount of regularity. Cellular encoding is efficient whenever the problem to solve is regular. Regularity means that the problem is composed of a hierarchy of subproblems that can be concisely expressed in algorithmic form.
10.2 Comparison with previous work The implication of this work for adaptive behavior is that generating modular nervous systems may be a key element in producing behavior that can exploit the regularity of the environment. The artificial developmental process called cellular encoding presented in this paper can produce modular structures. This developmental process can be useful to the adaptive behavior community for three reasons. It makes ANN synthesis more efficient. It generates ANNs whose structure can be analyzed and understood. It models nature more closely, since developmental processes are used by nature. There may also be important implications for the development of sensor morphologies, particularly for vision, and for the development of the sorts of repeated neural structures often observed in nature (e.g. motion detectors). We compare three different aspects of the method presented in this paper with other methods. First, this work explores the use of a developmental grammar for the automatic synthesis of ANNs. The space of grammars was searched by the GA. Other comparisons can be made in future work with other search techniques, like simulated annealing or random search. Second, we can compare our developmental process to similar methods. Up to now, developmental processes have been used only for very simple problems such as XOR. We think it is time to consider more difficult problems, and we propose the 6-legged locomotion problem, together with a simplified version of this problem, as two benchmarks. Sims's work is the only exception that solves a difficult problem [Sims 1994]. Sims also generates ANNs for animat locomotion. His method seems very efficient for this particular problem.
On the other hand, it is less general than cellular encoding, since we do not need to embed our ANN in a physical body. With Sims's method, the robot itself has to be built after the evolutionary process, by decoding the genotype. The feasibility of this operation is not obvious. However, in the long term, the potential to generate the robot's physical morphology may be very important. With cellular encoding, the robot is built first, then the ANN for controlling it is generated using a simulator. Third, we can compare our way of doing genome splicing with Koza's way [Koza 1994]. The idea of genome splicing is general, and many domains in the evolutionary algorithm community could advantageously leave the "inert" fixed-length bit string model and splice the genome in order to give it life and dynamism. All that is needed to do genome splicing is a mechanism allowing the genome parts to reference each other. John Koza has already applied the technique extensively [Koza 1994], using trees that are LISP S-expressions, with the Genetic Programming (GP) paradigm. Genome splicing is called by Koza "Automatic Definition of Functions" (ADF), because each tree encodes an Automatically Defined LISP Function, and the solution
is a hierarchy of functions that call each other many times. We call our mechanism Automatic Definition of Sub-Neural Networks (ADSN), because with cellular encoding each tree encodes a sub-neural network. In [Gruau 1994c], we compared LISP and cellular encoding as two possible ways of splicing genomes. On the one hand, evolving ANNs instead of LISP expressions is computationally more expensive. It takes much longer to develop and evaluate a neural network than a LISP S-expression. On the other hand, genome splicing in cellular encoding presents some advantages over GP. First, with GP one must specify, for each ADF, the number of arguments that will be passed. We do not have to do that with cellular encoding. Koza has also evolved the number of arguments; however, he still needs to provide an upper bound on it. Second, Koza specifies a different set of alleles for the main program and the ADFs; we do not need to do that either, because the set of alleles used in cellular encoding is homogeneous: cell division and modification of weights. Specifying a particular set of alleles for each tree brings a lot of information to the GA. Third, because we generate ANNs, we can speed up the genetic search by combining the GA with a learning of the weights. In [Gruau 1994a] we were able to speed up the genetic search by a factor of up to 13 using a supervised learning method. In this paper, we used a stochastic hill-climbing that saved 30% of the computer time for the simplified model. We are confident we can do better with a more elaborate learning of the Hebbian type. Fourth, GP, in that it implies a symbolic search, suffers from the symbol grounding problem: how to
map symbols to the real world. This is because GP necessarily forces a symbolic pre-characterization of the world. With cellular encoding, the search is conducted at a less abstract level. In the future, cellular-encoded ANNs with ADSN can be used to generate increasingly complex adaptive behavior for animats. Instead of a single fitness function, fitnesses can be used for turning, going backwards, wandering, edge following, chemotaxis, consummatory behavior, etc. Each fitness is associated with a sub-vector of trees in the genome, and this sub-vector of trees generates an ANN satisfying the fitness function. Thus, the GA co-evolves a hierarchy of ANNs that solves the hierarchy of problems. This way, we hope to be able to have the GA automatically build an ANN able to do the same task as the ANN hand-coded by Beer: to control an artificial insect in a simulated environment with food and obstacles. It took Beer a complete PhD thesis to produce a clean ANN of 78 units able to do that job. Our aim is to obtain the same results with a few hours of computation on a powerful parallel machine.
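The stochastic hill-climbing over the ANN weights mentioned above can be sketched as follows. This is a minimal version under our own assumptions (Gaussian perturbation of a single weight at a time, greedy acceptance); the paper does not specify its exact perturbation scheme, so the details here are illustrative only.

```python
import random

def hill_climb_weights(weights, fitness, steps=100, sigma=0.1, rng=random):
    """Stochastic hill-climbing on an ANN weight vector.

    Perturb one randomly chosen weight; keep the change only if the
    fitness does not decrease, otherwise revert it."""
    best = fitness(weights)
    for _ in range(steps):
        i = rng.randrange(len(weights))
        old = weights[i]
        weights[i] = old + rng.gauss(0.0, sigma)  # Gaussian move (assumed)
        new = fitness(weights)
        if new >= best:
            best = new          # accept the improvement
        else:
            weights[i] = old    # revert the move
    return weights, best
```

Because rejected moves are reverted, the returned fitness always matches the returned weight vector, and the search can only improve on the GA-supplied starting point.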
11 Acknowledgments

This work has been supported by the Centre d'Etudes Nucleaires de Grenoble, by postdoctoral support from NSF grant IRI-9312748 to Darrell Whitley, by the European Community within the Working Group ASMICS, and by the Santa Fe Institute under the adaptive computation program. We thank Oak Ridge National Laboratory for providing access to their 128-node iPSC/860. Bill Macready gave us the idea of having different fitness functions based on a sharing of the examples to learn; many thanks for that great idea. We are very indebted to Darrell Whitley, Eric Siegel, Melanie Mitchell, Rajarshi Das and the anonymous reviewers for their precious comments and corrections, and to Pierre Peretto, Una-May O'Reilly, and Kenneth DeJong for fruitful discussions about this work.
[Figure 1 panels: (a) Initial graph; (b) Sequential Division 'S'; (c) Parallel Division 'P'; (d) Division 'T'; (e) Division 'A'; (f) Division 'A'; (g) Division 'H'; (h) Division 'G'; (i) Recurrent Link 'R'; (j) Cut input link 'C 3'; (k) Change Input Weight 'D 3'; (l) Change Output Weights 'K 2']
Figure 1: Example of local graph transformations used in the hand-coded solutions.
Figure 2: An example of cellular code designed by hand. Tree 3 encodes a leg controller. Tree 2 defines a row of three leg controllers (half of the body). Tree 1 defines the complete neural network.
[Figure 3 panels: development steps 0 through 13; the final panel shows the layout of the input and output units]
Figure 3: Steps of development of a hand-made cellular code
Figure 4: The genome is spliced into three trees. (a) Before the crossover of two genomes; (b) after the crossover.
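Figure 4 can be read as a sketch of crossover on spliced genomes: each genome is a vector of trees, and crossover recombines material between corresponding trees of two parents. The code below is a minimal illustration of this idea, not the paper's exact operator; the `Node` class, the uniform choice of tree slot and subtrees, and the in-place swap are all our assumptions.

```python
import random

class Node:
    """A node of a cellular-code tree: a label plus a list of children."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

    def subtrees(self):
        """Yield every node of the tree, the node itself included."""
        yield self
        for child in self.children:
            yield from child.subtrees()

def crossover(genome_a, genome_b, rng):
    """Exchange a random subtree between the same tree slot of two genomes.

    A genome is a list of trees, as in the spliced representation of
    Figure 4; crossing corresponding slots keeps sub-networks aligned."""
    slot = rng.randrange(len(genome_a))  # which of the trees to cross
    sub_a = rng.choice(list(genome_a[slot].subtrees()))
    sub_b = rng.choice(list(genome_b[slot].subtrees()))
    # swap the two subtrees in place (label and children)
    sub_a.label, sub_b.label = sub_b.label, sub_a.label
    sub_a.children, sub_b.children = sub_b.children, sub_a.children
```

Because the swap only moves material between the two parents, the total number of tree nodes across both genomes is conserved by the operation.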
[Figure 5 unit labels — inputs: AEP1 PEP1 AEP2 PEP2 AEP3 PEP3 AEP4 PEP4 AEP5 PEP5 AEP6 PEP6; outputs: PS1 RS1 PS2 RS2 PS3 RS3 PS4 RS4 PS5 RS5 PS6 RS6]
Figure 5: ANNs found by the GA for the simplified six-legged locomotion problem: (a) with ADSN; (b) without ADSN.
(a) single tree: A(A(U-3)(S(L3(M3(M9)))(P(T(L4(M1))(D-6(G(D2(P(L-8(F6(F1)))(S(L8(D-1(M3)))(I-5(M1 (F1))))))(L4(M3(M9))))))(A(U7(M2))(A(H(L-8(M1(L2(M3))))(S(W(M1))(U-5)))(C-3(T(L1(M1))(U-5))))))))(R(T(S(L-2(M1(L4(L))))(T(D-8(W(F2)))(T(T(S(R(A(A(S(A(L-1)(L8)) (R(L3(M3))))(S(G(L-9(M1))(D7(M3(F1))))(L6)))(W(S(A(S(D-5(T(I9(U-1))(I-7(L(F1)))) )(S(L-6(M2))(U2())))(R(L2(M1))))(T(L3(L-5))(L-7(M1)))))))(F1))(L-3(M2)))(L-3)))) (H(I8(P(S(H(P(L-2(M2(M1)))(R(U9(M1))))(S(U4)(U5(U-2(M1)))))(A(M1)(T(L5)(I-5(U-1) ))))(A(T(T(C(A(P(G(C(T(C-4(A(P(G(C6(L4(U-5(M1))))(W(U-8)))(I3(L-2)))(U-7)))(F1)) )(M1(L-6(L(M3)))))(U6))(U9)))(G(T(P(D1(L2))(M1))(T(D-2(L-6(M1)))(S(L8)(U-4(M1))) ))(R(T(M1)(C-3(L8))))))(W(W(T(A(A(S(A(L-3)(L-7))(R(L1(M3))))(S(G(L(M1))(D7(M3(F1 ))))(L6)))(W(S(A(S(D(T(I-9(U-1))(I-7(L-3(F1)))))(S(L-6(M1))(U2(S(G(L(M1))(D7(M3( F1))))(L6)))))(R(L2(M1))))(T(L3(L5))(L7(M1))))))(L-9(M1(M3)))))))(G(H(G(L2(M1(C9 (F1))))(D-2(L-1)))(D-1(D7(S(T(I(L6))(H(L8(L-2(M1(L-4(L-5)))))(L-5)))(U-9)))))(U2 )))))(G(U6)(M2))))) (b) tree 1: A(A(n2)(n2))(n2) tree 2: not used tree 3: P(T(C-7(C-7(D5(M3(M2(C-7(C-7(D5(M3(M2(C-9(I2(U-5(C2(I8(U1(M2(F2(M2(I-1(F2(M4(U)) )))))))))))))))))))))(D5(D5(M3(M2(I2(M2(I4(F4(F3))))))))))(T(C-7(C-7(C-7(D5(M3(M 2(I7(W(F2(M3(M2(I4(F2(F4))))))))))))))(R(U6(M2))))
Figure 6: Comparative size of the genomes found by the GA for the simplified problem. (a): without Automatic Definition of Sub-Networks and (b): with ADSN. Without ADSN, the number of tree nodes inherited by the Lamarckian back coding is 57, and the total number of nodes is 260. With ADSN, the number of learned nodes is 19, and the total number of nodes is 65.
Figure 7: Illustration of footsteps for the six-legged locomotion without foot motor neurons. Footsteps are represented in the following way: whenever a leg is up, we plot a dot; otherwise nothing is plotted. The lines corresponding to the six legs are plotted one under the other, in the following order: left posterior, left middle, left anterior, right posterior, right middle, right anterior. The tripod gait is clearly visible. (a) Footsteps produced by the hand-coded ANN; (b) footsteps produced by the ANN found by the GA without ADSN; (c) two types of footsteps produced by the ANN generated with ADSN.
                        without ADSN   with ADSN   relative gain   ADSN, hill-climbing disabled
number of evaluations       18805         5152         3.65                  10724
time                        12752         1968         6.5                    2589
Table 1: Comparison between runs with and without ADSN for the simplified problem
                        without ADSN          with ADSN
number of evaluations        --                  16234
time                         --                   6480
Table 2: Comparison between runs with and without ADSN for the complete problem
leg 1 AEP AEP PEP PEP PEP AEP PEP
leg 2 AEP AEP PEP PEP PEP PEP PEP
leg 3 AEP AEP PEP PEP PEP AEP PEP
leg 4 AEP AEP PEP PEP AEP PEP PEP
leg 5 AEP AEP PEP PEP AEP AEP AEP
leg 6 AEP PEP PEP AEP AEP PEP AEP
Table 3: The learning set of seven initial positions of the legs
[Figure 8 panels: the best individual at generations 1, 3, 4, 5, 6, 7, 8, 10, 12, 17, 19, 20, 21, 30, 31, 36, 47, 49, 56, and 57]
Figure 8: Evolution of the best individual, with automatic definition of sub-networks, for the simplified model of locomotion
[Figure 9 panels: (a) and (b) neural network solutions; (c) architecture of ANN (a); (d) architecture of ANN (b); (e) subnetwork of ANN (a), with interface neurons marked; (f) subnetwork of ANN (b)]
Figure 9: (a) and (b): ANNs found by the GA for the complete problem. (c) and (d): General architecture of ANN (a) and ANN (b), respectively. The number indicates the leg controlled by the subnetwork that will be included at that particular position. (e) and (f): Subnetworks of ANN (a) and (b). The interface neurons are those that make connections to other subnetworks.
References

[1] International Conference on Graph Grammars. Lecture Notes in Computer Science 532, 1990.

[2] D. Ackley and M. Littman. The interaction between learning and evolution. In Artificial Life II, 1991.

[3] Randall Beer. Intelligence as Adaptive Behavior. Academic Press, 1990.

[4] Randall Beer and John Gallagher. Evolving dynamical neural networks for adaptive behavior. Adaptive Behavior, 1:92-122, 1992.

[5] R. Belew. Interposing an ontogenic model between genetic algorithms and neural networks. In Advances in Neural Information Processing Systems 5, 1993.

[6] E.J.W. Boers and H. Kuiper. Biological Metaphors and the Design of Modular Artificial Neural Networks. Master's thesis, Leiden University, the Netherlands, 1992.

[7] Rodney Brooks. Intelligence without representation. Artificial Intelligence, 47:139-159, 1991.

[8] Dave Cliff, Inman Harvey, and Phil Husbands. Explorations in evolutionary robotics. Adaptive Behavior, 2:73-110, 1993.

[9] R. Collins and D. Jefferson. Selection in massively parallel genetic algorithms. In Proc. of the 4th International Conf. on Genetic Algorithms, 1991.
[10] F. Dellaert and R. Beer. Co-evolving body and brain in autonomous agents using a developmental model. Technical Report CES-94-16, Case Western Reserve University, 1994.

[11] F. Gruau. Genetic synthesis of Boolean neural networks with a cell rewriting developmental process. In Combination of Genetic Algorithms and Neural Networks, 1992.

[12] F. Gruau. The mixed parallel genetic algorithm. In Parallel Computing 93, 1993.

[13] F. Gruau. Neural Network Synthesis using Cellular Encoding and the Genetic Algorithm. PhD thesis, Ecole Normale Superieure de Lyon, 1994. Anonymous ftp: lip.ens-lyon.fr (140.77.1.11), directory pub/Rapports/PhD, file PhD94-01-E.ps.Z (English) or PhD94-01-F.ps.Z (French).

[14] F. Gruau, J. Ratajszczak, and G. Wiber. A neural compiler. Theoretical Computer Science, 1994. To appear.

[15] F. Gruau and D. Whitley. Adding learning to the cellular developmental process: a comparative study. Evolutionary Computation, 1(3), 1993.

[16] H. Cruse, U. Muller-Wilm, and J. Dean. Artificial neural nets for controlling a 6-legged walking system. In Second International Conference on Simulation of Adaptive Behavior, 1993.

[17] G.E. Hinton and S.J. Nowlan. How learning can guide evolution. Complex Systems, 1:495-502, 1987.

[18] D. Hubel and T. Wiesel. Brain mechanisms of vision. Scientific American, 1979.

[19] J. Kodjabachian and J. Meyer. Development, learning and evolution in animats. In PerAc'94. IEEE Computer Society Press, 1994. Anonymous ftp at ftp.ens.fr, pub/reports/biologie/PerAc94.ps.Z.

[20] H. Kitano. Designing neural networks using genetic algorithms with graph generation system. Complex Systems, 4:461-476, 1990.
[21] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.

[22] John R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, 1994.

[23] J.A. Meyer and A. Guillot. Simulation of adaptive behavior in animats: Review and prospects. In Proc. of the 1st Intern. Conf. on Simulation of Adaptive Behavior, 1991.

[24] Eric Mjolsness, David Sharp, and Bradley Alpert. Scaling, machine learning and genetic neural nets. Technical Report LA-UR-88-142, Los Alamos National Laboratory, 1988.

[25] H. Muehlenbein, M. Schomisch, and J. Born. The parallel genetic algorithm as function optimizer. Parallel Computing, 17:619-632, 1991.

[26] D. Parisi and S. Nolfi. Morphogenesis of neural networks. Technical report, University of Rome, 1992.

[27] P. Prusinkiewicz and A. Lindenmayer. The Algorithmic Beauty of Plants. Springer-Verlag, 1990.
[28] Karl Sims. Evolving 3D morphology and behavior by competition. In R. Brooks and P. Maes, editors, 4th Intern. Conf. on Artificial Life. MIT Press, 1994.

[29] J. Vaario. An emergent modeling method for artificial neural networks. PhD thesis, University of Tokyo, 1993.
Appendix

In this appendix we make precise the parameters of the animat. The algorithm for computing the force exerted on each leg, and the angle modification of each leg, at each time step is:

    force := (PS - RS) / 256
    if (foot-on-the-ground)
        if (angle > 0)
            angle := angle - speed * 0.33
        else
            force := -1 * speed
    else
        angle := angle - 20 * force
    if (angle > 30) angle := 30
    if (angle < 0)  angle := 0
The forces coming from each leg are summed, together with a global friction equal to 0.3 * speed. Then the speed and position of the animat are updated according to:

    distance := distance + 0.33 * speed
    speed := speed + 0.33 * force
If the animat is not balanced, the speed is set to 0. The seven initial leg positions used in the learning set are specified in Table 3. The number of time steps at each of these positions is 50. The target distance that the animat must walk for each leg position is 85. The threshold distance is 76.
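As a check on the update rules above, here is a direct transcription into Python. The function names and argument conventions are ours, and one point is an assumption: the text says the friction term 0.3 * speed is added to the summed leg forces, which we read as a force opposing the motion, hence the subtraction below.

```python
def leg_step(ps, rs, angle, foot_on_ground, speed):
    """One time step of a single leg, following the appendix pseudocode.

    ps, rs: outputs of the PS and RS motor neurons.
    Returns the force this leg exerts and the updated leg angle."""
    force = (ps - rs) / 256.0
    if foot_on_ground:
        if angle > 0:
            angle = angle - speed * 0.33   # stance: leg swings backward
        else:
            force = -1 * speed             # leg at its limit brakes the body
    else:
        angle = angle - 20 * force         # swing: motor output moves the leg
    # clamp the leg angle to [0, 30]
    angle = max(0, min(30, angle))
    return force, angle

def body_step(leg_forces, speed, distance, balanced=True):
    """Sum the leg forces, apply friction, integrate speed and distance."""
    force = sum(leg_forces) - 0.3 * speed  # friction opposing motion (assumed sign)
    distance = distance + 0.33 * speed
    speed = speed + 0.33 * force
    if not balanced:                       # an unbalanced animat stops
        speed = 0.0
    return speed, distance
```

Called once per time step for each of the six legs, followed by one call to `body_step` with the six forces, this reproduces the 50-step evaluation described above for each initial leg position.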