"Step by step" evolution of cognitive function: the ... - Semantic Scholar

1 downloads 0 Views 233KB Size Report
of nite food resources in individual niches. This allowed the ... Recent work in "Arti cial Life" has opened new perspectives on these issues. This paper reports a ...
"Step by step" evolution of cognitive function: the Composite Learning System Model Richard Walker1 and Henrik Hautop Lund2 Via XXIV Maggio 130, 00046 Grottaferrata (RM), Italy Institute of Psychology, National Research Council, Viale Marx 15, 00137 Rome, Italy E-mail: [email protected] [email protected] 1

2

Abstract. While biological evolution and learning are stepwise processes, classical Artificial Neural Networks (ANNs) are designed and trained "in one go". The authors suggest that artificial organisms whose neural architecture emerges via Darwinian evolution and which are trained "step by step" may be able to achieve richer functionality than is possible for conventional ANNs. The paper presents the Composite Learning System Model (CLSM): a model of cognitive evolution designed to validate this hypothesis. In the CLSM a Genetic Algorithm (GA) operates on a population of artificial organisms represented as ANNs. The GA produces "organisms" with the ability to learn sets of functions representing the ecological niches in which individual organisms operate. The definition of these functions allows simple functions to act as building blocks - or pre-adaptations - for the computation of more complex ones. The paper presents a series of implementations of the model along with simulation results. Initial simulations showed that CLSM networks were able to acquire simple function sets but were incapable of acquiring more complex functions pre-defined by the experimenter. This led to the introduction of "Variable Characteristic Functions": the functions to be acquired were themselves subjected to variation and selection. In this modified version of the model the population was rapidly invaded by a single lineage of organisms computing a set of very simple functions. In order to force the learning of more complex functions the model was revised to take account of finite food resources in individual niches. This allowed the evolution of a higher degree of functional complexity. The paper discusses the significance of these results. It is conjectured that learning time for CLSM organisms may be less than linearly proportional to problem complexity and that they may thus avoid the scaling problems of classical ANNs; the use of variable Characteristic Functions is shown to have advantages in the avoidance of local optima; a mechanism is suggested for the accumulation of cerebral complexity in the absence of corresponding fitness gains; the model is used to show the advantages of a highly modular brain architecture as well as a possible mechanism for the fixed learning sequences observed in human and animal developmental psychology. At the end of the paper brief indications are given concerning directions for future research.

1 Introduction

Over the last few years Artificial Neural Network (ANN) models have been successfully used to simulate potential neural mechanisms for a number of interesting cognitive and behavioural tasks. Current models have tended, however, to emphasize architectures and learning algorithms whose scaling properties are poor and whose biological realism is questionable. ANN theory has had little to say concerning the large scale functioning of animal nervous systems and the way in which these nervous systems evolved. Recent work in "Artificial Life" has opened new perspectives on these issues. This paper reports a new model of cognitive evolution based on ideas from this work. For reasons which will become apparent the model will be referred to as the Composite Learning System Model (CLSM).

The starting point for the development of the CLSM was the observation that evolution and learning are "step by step" processes. Evolution constructs complex organs and organisms by stages; in learning, the ability to perform complex cognitive tasks emerges gradually once simpler tasks have already been mastered (see e.g. [15]). This step by step approach is not reflected in classical ANN research. Traditional ANNs do not "evolve"; they are designed and trained "in one go". The experimenter defines a task, an architecture, and a learning procedure. The ANN is then "trained" and "tested" in the time necessary for a computer to simulate a certain number of "learning cycles". This approach ignores Alan Turing's warning that training an "intelligent" computer is likely to involve the same difficulties as training a human infant [18].

The observed contrast between biological reality and ANN practice led to the hypothesis that artificial organisms whose neural architecture emerged through Darwinian evolution and which were trained "step by step" might be able to acquire richer functionality than is possible for conventional ANNs. The long-term aim of the CLSM, yet to be achieved, is to validate this hypothesis. This paper describes the current state of the investigation. Section 2 summarises background concepts and issues; section 3 describes the model; section 4 presents a practical implementation and simulation data. Finally, section 5 discusses the significance of the results obtained so far, indicating possible directions for future research.

2 Background concepts and issues

2.1 Artificial Neural Networks

Ever since the pioneering work of McCulloch and Pitts [10], researchers from different disciplines have attempted to build mathematical models of specific cognitive or control functions performed by animal nervous systems. Much modern research has concentrated on Artificial Neural Networks (ANNs): networks of simple, interconnected computational units reproducing, with varying degrees of realism, the functioning of small groups of neurons within the Central Nervous System. It has been shown that ANNs have the ability to classify, memorize and retrieve data ("unsupervised learning"). ANNs have been "taught" particular behavioural or cognitive abilities using feedback from an artificial "environment" ("supervised learning"). A vast range of apparently complex cognitive and behavioural functions, from character [5] and voice recognition [7] to complex optimization [3], have been successfully simulated using networks of no more than a few hundred or thousand units. Although the ability of ANNs to mimic the behaviour of real-life learning systems is often impressive, existing models leave a number of key questions unanswered:

1) Current generation ANNs are orders of magnitude smaller than the neural networks known to exist in even the smallest of the higher animals. There is currently no practical way of "training" large ANNs. Recent theoretical work suggests that, at least for some architectures, the training problem is NP-complete [6]. If this is so, there are fundamental limitations on the size of ANN which can be trained.

2) The majority of current ANN research is based on networks with "hidden neurons" (neurons with no direct connection to the output units). This leads to the so-called "credit-assignment" problem: the problem of how to measure the contribution of "hidden neurons" to the "error" of the system and thus to compute corrections to synaptic strengths. Although algorithms such as Back-Propagation [16] or the Boltzmann machine [1] provide solutions for certain types of network, these algorithms are not generally considered to be biologically realistic.

3) Models allowing the creation of new neurons or new synaptic connections have been introduced only recently. In all previous work the underlying architecture of the network was a "given", which implied that there was no way in which a network could increase in complexity. Until recently, evolution of ANN topology has therefore been outside the ambit of mainstream ANN theory.

2.2 Genetic Algorithms

In recent years classical ANN theory has been enriched by the use of so-called Genetic Algorithms (GAs) mimicking the working of biological evolution. GAs are a key element in recent studies of "Artificial Life". In a GA, populations of similar but not identical artificial organisms optimise a "fitness function" via a Darwinian selection process. Individuals in the population are sorted by their ability to compute a particular function. Less fit organisms are eliminated. The "fittest" organisms are allowed to "reproduce". During the reproduction process "mutations" are introduced, ensuring that "child" organisms differ, in minor respects, from their "parents". This process is iterated for many generations until optimisation has been achieved. As has been realized by many researchers, GAs provide a potentially powerful tool for investigating the evolution of neural architecture. If the "genome" codes for the development of a particular architecture, modifications in the genome are reflected in modifications in architecture. There are however significant problems in developing realistic coding schemes for the genome. As a result, the only mutations allowed in the majority of GA-based ANNs are changes in the logical function computed by specific neurons [14] or in the value of specific synaptic connections [13]. Only recently have models allowing the creation of new neurons or new synaptic connections been introduced. Where such changes in network topology are not possible there is no opportunity to increase or decrease overall architectural complexity, an obvious pre-condition for the evolution of novel cognitive function.
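As an informal illustration (not taken from the paper), the selection/reproduction/variation loop just described might be sketched as follows; the `fitness` and `mutate` callables, the half-population cut-off and the generation count are placeholders rather than features of any particular GA.

```python
import random

def genetic_algorithm(population, fitness, mutate, generations=100):
    """Generic GA loop: rank by fitness, drop the less fit half, refill the
    population with mutated copies of survivors. `fitness` scores an organism
    and `mutate` returns a slightly modified copy; both are supplied by the caller."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: len(ranked) // 2]           # less fit organisms are eliminated
        children = [mutate(random.choice(survivors))     # "mutations" make children differ
                    for _ in range(len(population) - len(survivors))]
        population = survivors + children
    return max(population, key=fitness)
```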

2.3 Evolvable "fitness functions"

Recent work by Lund and Parisi [8] criticises many GA-based models of Artificial Life. Lund and Parisi point out that in virtually all work to date the "fitness function" is pre-defined by the experimenter. This, they argue, is an unrealistic model of the working of biological evolution. In real-world environments the "fitness function" is an inheritable characteristic of the organism, subject to the same selective pressures as those applying to other characteristics of the individual organism. It follows that, within certain constraints, real-world organisms "choose" their own fitness functions. If Lund and Parisi are correct, any attempt to model the evolution of intelligence in terms of a monotonic trend towards a pre-defined goal is unrealistic. Later sections of this paper will propose a model of evolution in which organisms "choose" the ecological niche to which they adapt. It will be argued that this "freedom of choice" is a fundamental aspect of biological evolution. Including this ability in the CLSM was a key step in the development of the model.

3 The model

The CLSM models the acquisition of new cognitive function as a two-level process involving both the Darwinian evolution of new neural architectures and "learning" by individual organisms. The model posits an initial population of artificial organisms ("Composite Learning Systems"), each one of which consumes resources from a set of "environmental niches". The ability of an organism to use resources from a particular niche is measured by its ability to accurately compute a specific input-output function associated with the niche. This function is referred to as the Characteristic Function. During its "lifespan" the Composite System is "trained" to compute its Characteristic Functions as accurately as possible. The particular architecture of the organisms (see below) guarantees that learning proceeds "step by step". Initial "building blocks" are used to construct more complex "composite functions" - hence the name of the model. An organism's fitness is determined by the accuracy with which Characteristic Functions are computed and by the availability of resources within individual ecological niches. The fittest individuals in the population reproduce; the least fit are eliminated. During reproduction the architecture of organisms and their Characteristic Functions are subject to mutation and selection. By changing Characteristic Function a particular lineage may shift from one "niche" to a new one. Over many generations the process of selection/reproduction/variation leads to changes in the architecture, cognitive abilities and learning dynamics of the lineages of organisms constituting the population. These changes constitute the main interest of the model.

3.1 Composite Learning Systems

A Composite Learning System (CLS) is defined by a graph consisting of:

- A set of nodes ("neurons"), subsets of which are defined to be "input" and "output" neurons. All inputs and outputs are binary: I, O ∈ {0, 1}.
- A set of edges ("connections"), each of which provides a unidirectional channel for the flow of information from an "origin neuron" to a "destination neuron". Any neuron (including output neurons) can be an origin neuron. Input neurons cannot, however, be destination neurons. Each edge is associated with a real-valued weight w_ij ∈ [-1, 1].
- A set of Characteristic Functions, each one of which maps the state of the input neurons to the "desirable" state of one of the output neurons. The Characteristic Functions may be interpreted as representing ideal behaviours for the set of "ecological niches" in which the Composite Learning System operates.
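A minimal sketch of this definition in Python follows, assuming a simple dictionary-based representation; the class and field names are ours, not the authors'.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class CompositeLearningSystem:
    """Graph definition of a CLS (illustrative sketch, not the authors' code)."""
    n_inputs: int                                    # binary input neurons
    n_outputs: int                                   # binary output neurons
    # hidden neurons are identified by integer ids >= n_inputs + n_outputs
    hidden: List[int] = field(default_factory=list)
    # directed edges (origin, destination) -> real-valued weight (here taken to
    # lie in [-1, 1]); input neurons never appear as destinations
    weights: Dict[Tuple[int, int], float] = field(default_factory=dict)
    thresholds: Dict[int, float] = field(default_factory=dict)
    # one Characteristic Function per output neuron: maps the input tuple to the
    # "desirable" binary state of that output (its "ecological niche")
    characteristic: Dict[int, Callable[[Tuple[int, ...]], int]] = field(default_factory=dict)
```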

3.2 Computation

Computation in a Composite Learning System is based on "spreading activation". Each "destination neuron" simultaneously computes a linear threshold function of the weighted output from its "origin neurons". This computation is iterated for a predetermined number of cycles. At the end of this process the collective state of the output neurons is taken to be the "output" of the system.

It is an essential feature of the model that an output neuron can also be an "origin neuron" (see Fig. 1). This implies that an output neuron (output neuron 1) which performs a specific function can provide input to a second neuron (output neuron 2) which computes a second function. If the first neuron computes f_i and the second neuron computes f_j, the two together compute the composite function f_j ∘ f_i. The ability to compute composite functions allows the organism to compute functions which would not be possible for a "single layer" system.
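Continuing the sketch of section 3.1, the spreading-activation step might look as follows; the number of cycles, the initial state of non-input neurons and the strict-inequality threshold test are our assumptions, since the paper does not specify them.

```python
def activate(cls, inputs, cycles=5):
    """Spreading activation: every non-input neuron synchronously recomputes a
    linear threshold function of the weighted outputs of its origin neurons.
    `cls` is the CompositeLearningSystem sketch above; `inputs` is a tuple of 0/1."""
    outputs = list(range(cls.n_inputs, cls.n_inputs + cls.n_outputs))
    state = {i: inputs[i] for i in range(cls.n_inputs)}
    for n in outputs + cls.hidden:
        state[n] = 0                                     # non-input neurons start inactive
    for _ in range(cycles):                              # predetermined number of cycles
        new_state = dict(state)
        for dest in outputs + cls.hidden:                # inputs are never destinations
            net = sum(w * state[orig]
                      for (orig, d), w in cls.weights.items() if d == dest)
            new_state[dest] = 1 if net > cls.thresholds.get(dest, 0.0) else 0
        state = new_state                                # synchronous update
    return tuple(state[o] for o in outputs)              # collective output state
```

Because output neurons may themselves be origin neurons, iterating the update lets the result of one output neuron propagate into another, which is what makes composite functions such as the one in Fig. 1 possible.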

Fig. 1. Output Neuron 1 provides input for Output Neuron 2.

3.3 Characteristic Functions

Unlike traditional GAs, CLSs do not optimize the ability to compute a single function but rather a set of functions, referred to as the "Characteristic Functions" of the organism. Each output neuron is associated with a single Characteristic Function. The accuracy with which an organism computes a particular Characteristic Function may be interpreted as its ability to occupy a particular ecological niche. Characteristic Functions are modelled by Boolean Functions in n variables. The functions used are enumerated using a variant of a scheme developed by Wolfram [19]. (Example: function number 8 is the 2-variable AND. The function number is derived as follows:

v1  v2  Result (v1 AND v2)
0   0   0
0   1   0
1   0   0
1   1   1

Reading the Result column from the last row to the first gives the binary coding 1000, i.e. the decimal coding 8.)

The choice of the Boolean Functions was made for two main reasons. As is well known, some of the Boolean functions are easier to compute than others. Very simple functions (e.g. tautology) can be computed by a single neuron with no connections; the so-called linearly separable functions can all be computed by a single layer network with a maximum of n connections. The linearly non-separable functions - which for n → ∞ represent the overwhelming majority of the ensemble - require larger networks. Given that each Boolean function is associated with a particular Characteristic Function, it follows that the Characteristic Functions available to organisms - the ecological niches they can occupy - include functions of mixed levels of difficulty. In this way the Boolean function family models an essential characteristic of real life environments. A second fundamental characteristic of the Boolean functions is that simple functions can be composed to form more complex functions. It is possible to write, for example,

XOR(i1, i2) = AND[OR(i1, i2), NAND(i1, i2)]    (1)

It follows that if a network has successfully acquired the OR and NAND functions (simple linearly separable functions), the acquisition of the XOR (a non-separable function) requires nothing more than the computation of an additional AND. OR and NAND are pre-adaptations for the computation of the XOR. The environment provided to CLSM organisms is deliberately designed to allow for stage by stage acquisition of function. This is a key feature of the model which distinguishes it from "one shot" ANNs and GAs.
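For concreteness, the numbering scheme and the pre-adaptation example of equation (1) can be written out as below. The row ordering is inferred from the worked example for function 8 (and is consistent with function 6 being the XOR, as stated in section 4.1); the helper name and the numbers assigned to OR and NAND follow from that inference rather than from the paper itself.

```python
from itertools import product

def boolean_function(number, n_vars=2):
    """Return the n-variable Boolean function whose truth table, read as a binary
    number with the all-ones input row as the most significant bit, equals `number`
    (so 8 -> two-variable AND and 6 -> two-variable XOR)."""
    rows = list(product((0, 1), repeat=n_vars))          # (0,0), (0,1), (1,0), (1,1), ...
    bits = format(number, f"0{2 ** n_vars}b")
    table = {row: int(bit) for row, bit in zip(rows, reversed(bits))}
    return lambda *args: table[args]

AND, OR, NAND = boolean_function(8), boolean_function(14), boolean_function(7)
XOR = lambda a, b: AND(OR(a, b), NAND(a, b))             # equation (1): a composite function
assert all(XOR(a, b) == (a ^ b) for a in (0, 1) for b in (0, 1))
```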

3.4 Learning

The objective of CLS learning is to optimize the computation of an organism's Characteristic Functions. This is achieved via the use of a classical Perceptron Convergence Procedure [11] applied to the organism's output neurons. There is no training of "hidden neurons". Given however that one output neuron may be connected to another output neuron, CLSM organisms have the ability to acquire composite functions which are not available to classic perceptrons. Although there are no "genetic switches" controlling the sequence of learning, the topology of the network guarantees that function acquisition proceeds "stepwise". If the computation of the Characteristic Function of neuron j requires input from neuron i, the learning algorithm will be unable to optimize the performance of neuron j until neuron i has completed its learning. Learning by neuron i is, in other words, a prerequisite for learning by neuron j.
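A minimal sketch of the perceptron convergence step applied to a single output neuron; this is our own formulation of the standard rule, and the learning rate and epoch count are illustrative.

```python
def train_output_neuron(weights, threshold, examples, rate=0.1, epochs=50):
    """Classical perceptron rule on one output neuron.

    `weights` maps origin-neuron id -> weight; `examples` is a list of
    (origin_states, target) pairs, where origin_states maps origin id -> 0/1.
    Only output neurons are corrected; hidden neurons receive no training."""
    for _ in range(epochs):
        for origin_states, target in examples:
            net = sum(w * origin_states[i] for i, w in weights.items())
            out = 1 if net > threshold else 0
            error = target - out                        # +1, 0 or -1
            for i in weights:                           # standard perceptron update
                weights[i] += rate * error * origin_states[i]
            threshold -= rate * error                   # threshold learns as a bias
    return weights, threshold
```

If output neuron j takes input from output neuron i, the targets seen by neuron j only become reliably learnable once neuron i computes its own Characteristic Function correctly, which is what forces the stepwise acquisition described above.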

3.5 Resource consumption

Each Characteristic Function represents the ability of an organism to operate in a particular environmental niche. Each environmental niche is associated with a certain quantity of food resources. Every time a CLS correctly computes a Characteristic Function it consumes a unit of food resources with probability

p_consume = r_t / r_0    (2)

where p_consume is the probability of consumption, r_t is the current quantity of available resources and r_0 is the initial quantity of available resources.

3.6 Fitness

The fitness of an organism is given by the number of units of food consumed over its lifetime. It is important to note that food consumption begins "at birth". The model therefore assigns a selective advantage to organisms which can consume effectively at an early stage in their life cycle, penalizing organisms in proportion to the percentage of their life cycle used to "learn" their Characteristic Functions.

A normalized fitness index, allowing comparison between organisms occupying different numbers of ecological niches (i.e. with different numbers of output neurons) or with differing lifespans, is given by

fit_norm = fit / (n_Outputs · n_Lifecycles)    (3)
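Equations (2) and (3), as reconstructed above, translate directly into code; the function and variable names below are ours.

```python
import random

def maybe_consume(resources_left, resources_initial):
    """Equation (2): a correct computation consumes one food unit with probability
    p = r_t / r_0, so consumption becomes less likely as the niche empties."""
    p_consume = resources_left / resources_initial
    if resources_left > 0 and random.random() < p_consume:
        return resources_left - 1, 1          # one unit eaten
    return resources_left, 0                  # nothing eaten

def normalized_fitness(food_eaten, n_outputs, n_lifecycles):
    """Equation (3): fitness per niche per life cycle, so organisms with different
    numbers of output neurons or different lifespans can be compared."""
    return food_eaten / (n_outputs * n_lifecycles)
```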

3.7 Selective reproduction

The probability of reproduction is a rising function of fitness. In the simulations, each organism in the fittest third of the population was cloned two times, each organism in the second third of the population was cloned once, and organisms in the bottom third of the population did not reproduce. Each organism emerging from the cloning process is subjected to variation during reproduction. The CLSM does not attempt to model sexual reproduction. The probability of any given category of variation is a parameter of the simulation. The model allowed the following "mutations": addition of a neuron, deletion of a neuron, addition of a connection, deletion of a connection, and substitution of a Characteristic Function with a different function (in some simulations this possibility is switched off).
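A sketch of the selection and variation step, using the mutation probabilities quoted for Simulation 1 (section 4.1); `mutate_op` is a placeholder for the operators listed above, and the assumption that the clones simply replace the previous population (99 parents yield 99 clones) is ours.

```python
import copy
import random

MUTATIONS = {                                   # probabilities as in Simulation 1
    "create_neuron": 0.2,
    "destroy_neuron": 0.2,
    "create_connection": 0.1,
    "destroy_connection": 0.1,
    "modify_characteristic_function": 0.0,      # switched off in Simulations 1 and 2
}

def next_generation(population, mutate_op):
    """Fittest third cloned twice, middle third once, bottom third eliminated; every
    clone is then exposed to each mutation category with its own probability.
    `population` must be sorted by fitness (best first); `mutate_op(organism, name)`
    applies one named mutation and is assumed to be defined elsewhere."""
    third = len(population) // 3
    clones = 2 * population[:third] + population[third:2 * third]
    children = []
    for parent in clones:
        child = copy.deepcopy(parent)
        for name, p in MUTATIONS.items():
            if random.random() < p:
                mutate_op(child, name)
        children.append(child)
    return children
```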

4 Simulation results

4.1 Simulation 1: Acquisition of two-variable Boolean functions with no mutations in the Characteristic Function and infinite food resources

The population consisted of 99 organisms, each of which had 2 input units and 8 output units. For implementation reasons the maximum number of neurons in each organism was limited to 25. All Composite Organisms were assigned the same set of Characteristic Functions, namely the first eight 2-variable Boolean Functions. Food resources were assumed to be infinite. "Mutation" probabilities were assigned as follows:

Create Neuron: 0.2
Destroy Neuron: 0.2
Create Connection: 0.1
Destroy Connection: 0.1
Modify Characteristic Function: 0.0

The evolutionary dynamic was studied using the "fittest" organism in each generation. As can be seen in Fig. 2, the evolutionary process led to an asymptotic increase in fitness. By Generation 48 the fittest organisms computed all their Characteristic Functions with perfect or close to perfect accuracy. This implies that the organisms had successfully learnt not only the linearly separable functions, which a Perceptron can learn in a single stage, but also function 6 (the Boolean XOR), which can only be learnt in two stages. A study of the performance of individual output neurons shows that the acquisition of function was, as predicted, a step-wise process. Figure 3 shows that performance rose very quickly on the first (easy) functions but much more slowly on the more complex ones. The order in which functions were acquired was directly correlated with their theoretical complexity, as measured by the number of connections in the smallest network theoretically capable of computing the function (see Table 1). In parallel with the increase in fitness there was an increase in the number of

Fig. 2. Increase in Fitness over 48 Generations (fittest organism in population).

Fig. 3. Performance by Function.

Table 1. Order of function acquisition (Simulation 1).

Order  Function  Theoretical Complexity
1      0         0
2      3         2
3      5         2
4      2         3
5      1         3
6      4         3
7      7         3
8      6         5

connections in the system (see Fig. 4). For most of the period the increase in the number of connections was roughly linear with respect to time (r² = 0.86). The number of neurons increased much more slowly.

Fig. 4. Increase in circuit complexity.

It was observed that a considerable proportion of the circuitry developed by Composite Systems within the population was superfluous to requirements. Figure 5 shows actual circuitry for Function 0 (the circuit computes an output of 0 for all possible inputs) in a "fit" CLS. It is easy to see that the function computed by this circuit can be implemented much more simply by a single neuron with a negative valued threshold and no input connections.

Fig. 5. Actual circuitry for output 0 after 40 generations of evolution (threshold values are shown next to output neurons).

4.2 Simulation 2: Acquisition of Boolean functions in three variables with no mutations in the Characteristic Function and infinite food resources

The population consisted of 99 CLS, each of which had 3 input units and 5 output units. For implementation reasons the maximum number of neurons in an organism was limited to 16. All Composite Organisms were assigned the same Characteristic Functions (a set of 5 randomly chosen Boolean Functions in three variables). The simulation was repeated several times with different function sets. Mutation probabilities were maintained as in the previous simulation. Food resources were again

assumed to be infinite. These simulations (data omitted) systematically failed to achieve results comparable to those obtained for functions in two variables. As a rule fitness rose very slowly to a plateau between 0.7 and 0.8. Even after more than 200 generations it proved impossible to obtain higher fitness. Only in rare cases was it possible to obtain fully accurate computation for more than two out of the five functions specified in the Characteristic Function set. The failure to obtain satisfactory results in this simulation suggested that the CLSM was capable of acquiring only a small proportion of all possible Characteristic Function sets, even when the Characteristic Functions were as inherently simple as those used here. An analysis of this initially discouraging result (see discussion) led, in Simulation 3, to the exploration of variable Characteristic Functions.

4.3 Simulation 3: Acquisition of Boolean functions in three variables with variable Characteristic Functions and infinite food resources

The population was defined as in the previous simulation. Each Composite System was assigned a different set of 5 randomly chosen Boolean Functions in three variables as Characteristic Functions. Unlike the previous simulations, the Characteristic Functions of each organism were subject to mutation: each Characteristic Function was replaced with another randomly chosen function (duplicates prohibited) with probability 0.02. Other mutation probabilities were maintained as in the previous simulations. As previously, food resources were taken to be infinite. The system manifested an asymptotic increase in fitness, with the fittest organism in the population reaching a fitness of 0.96 after 44 generations (see Fig. 6). In the last third of its life cycle (when training could be considered to be complete) all functions in this organism achieved complete error-free performance.

Fig. 6. Fitness and accuracy of computation (Simulation 3).

The increase in fitness went in parallel with a roughly linear increase in the number of connections and a much slower increase in the number of neurons (see Fig. 7).

Fig. 7. Number of connections and neurons in best organism (Simulation 3).

At the same time, however, there was a clear reduction in the diversity and the complexity of the Characteristic Functions being computed within the population (see Table 2). By the end of generation 51, 80% of organisms had converged to a single set of very simple functions (80, 252, 15, 170, 12). This situation did not change significantly in the following 200 generations (data not given). It should be noted that all functions computed by organisms in the population were "linearly separable" and could be "learnt" in a single stage.

Table 2. Number and complexity of Characteristic Function sets (Simulation 3).

Generation  Different Characteristic Function Sets  Average Theoretical Complexity of Characteristic Function
1           99                                      35.87
11          30                                      24.46
21          26                                      16.64
31          15                                      13.15
41          20                                      15.72
51          12                                      8.75

4.4 Simulation 4: Acquisition of Boolean functions in three variables with variable Characteristic Functions and finite food resources

The procedure used was identical to the previous simulation except that food availability was limited to 8000 units per "niche". The use of finite food resources made it possible to verify the effects of competition. For the first 150 generations fitness increased asymptotically up to a ceiling of approximately 0.85. In later generations there was no systematic tendency towards further improvement. From generation 77 onwards the best organism in certain generations achieved 100% accuracy of computation. The frequency with which this occurred increased gradually up until generation 170, after which a ceiling was achieved. Detailed data for fitness and for accuracy of computation are given in Fig. 8. As far as the number of connections in the system is concerned (see Fig. 9), it can be

Fig. 8. Fitness and accuracy of computation (Simulation 4).

observed that the number of connections in the fittest organisms rose rapidly until Generation 80, after which it underwent a rapid fall lasting until Generation 95. From Generation 95 until the end of the simulation the number of connections rose slowly from 21 to 26. The number of neurons also rose rapidly in the early generations of the simulation. By Generation 46 this number had reached the maximum allowed by the implementation (16). For the rest of the simulation it oscillated around this number. As in Simulation 3, the theoretical complexity of the functions computed

Fig. 9. Smoothed view of number of connections and neurons (Simulation 4).

and the diversity of the population fell rapidly in early generations (see Table 3). This fall was however slower than in the previous simulation. At generation 51 there were 34 different function sets (Simulation 3: 12). The average theoretical complexity of

Table 3. Number and complexity of Characteristic Function sets (Simulation 4).

Generation  Different Characteristic Function Sets  Average Theoretical Complexity of Characteristic Function
1           99                                      24.11
11          33                                      19.10
21          28                                      17.48
31          38                                      15.79
41          31                                      13.78
51          34                                      13.13
61          30                                      12.48
71          24                                      11.72
81          26                                      12.02
91          29                                      13.29
101         25                                      12.37
111         33                                      11.90
121         27                                      12.10
131         26                                      12.22
141         25                                      12.78
151         27                                      12.96
161         20                                      11.40
171         25                                      11.36
181         29                                      11.21
191         33                                      11.73
201         24                                      11.38
211         24                                      11.13
221         30                                      11.42
231         23                                      11.43

functions was 13.13 (Simulation 3: 8.75). Although the great majority of functions computed were simple "linearly separable" functions, which a Perceptron can learn in a single stage, a number of organisms had developed the capability to compute more complex "inseparable" functions which can only be learnt "step by step". In the continuation of the simulation after Generation 51 more complex functions tended to be eliminated from the population. By Generation 231 (when the simulation was halted) the population contained 23 different function sets; the average complexity of functions amounted to 11.43. It should be observed that this is still significantly higher than the complexity of functions in Simulation 3. By Generation 231 all functions were linearly separable. In the majority of cases, however, they required a minimum network of 3 connections, whereas all the functions computed in Simulation 3 required 2 or 1 connections.

5 Discussion

Like the biological systems which it purports to model, the CLSM has evolved over time. The results of Simulation 1 show that a Composite Learning System, as defined in the model, is capable of acquiring simple cognitive function, including linearly inseparable functions. It was initially expected that it would be possible to reproduce this ability in larger scale models. Simulation 2 showed, however, that this expectation was unfounded. Despite a thorough exploration of parameter space, organisms based on the CLSM systematically failed to acquire simple Boolean functions in three variables. It appeared obvious that if the model was incapable of "learning" these extremely elementary functions it had little hope of "acquiring" more interesting cognitive capabilities. The initial hypothesis seemed to be effectively falsified.

The most interesting aspects of the CLSM emerged from an ad hoc hypothesis designed to circumvent this negative result, namely the idea that Characteristic Functions could themselves be subject to evolution. While work by Lund and Parisi [8] provided theoretical support for this innovation, the results of Simulation 3 showed its practical potential: when organisms were allowed to "choose" their own Characteristic Functions they achieved far higher "fitness" than when they were forced to "learn" arbitrary functions chosen by the experimenter. The organisms which evolved in Simulation 3 were however irremediably "dumb". The population emerging from the simulation was dominated by a single lineage computing extremely simple Characteristic Functions. While the revised model was no longer a complete failure, it was still a very weak model of cognitive evolution.

An analysis of the model used in Simulation 3 led to the conclusion that the domination of the population by a single lineage was due to the failure to take account of competition. Whatever the size of the population in a particular ecological niche, there was always enough food for all. In these circumstances the fitness of an organism was equivalent to its ability to compute its Characteristic Functions. If one set of functions was inherently "easier" than any other it was inevitable that the lineage computing that set would successfully invade the population. If this lineage achieved very high computational accuracy no mutation could improve its fitness. There was thus no selective advantage to evolving more complex functions. In order to provide such a selective advantage the model was modified so that the resources associated with each niche were finite. It was predicted that finite resources would place a ceiling on the number of organisms which could occupy any particular niche and that the limited availability of "simple niches" - a statistical property of the Boolean Function family - would force organisms to evolve a higher degree of diversity and more powerful cognitive function. This prediction was confirmed by the results of Simulation 4.

The model used for Simulation 4 represents the current state of evolution of the CLSM. Compared to ANN and GA simulations of real life cognition the functions simulated are extremely primitive. The model nonetheless avoids some of the constraints which prevent classical ANNs and GAs from modelling large scale evolution and learning. Further, it appears sufficiently rich to provide insight into a broad range of different issues: in particular the scaling problems which beset conventional ANNs, the avoidance of local optima, the evolution of complexity, the modularity of the brain and the existence of fixed "sequences" in animal and human learning.

5.1 "Step by step" learning and the speed of learning

During the 1980s an inordinate amount of ANN research effort was devoted to the so-called "credit assignment" problem. It was well known that there were strict limitations on the computational problems which could be solved by simple single layer ANNs. Without the ability to train multi-layer models, it was believed, ANNs would never acquire the computational power to model biological intelligence in a useful way. Hence a proliferation of novel algorithms and architectures - the Hopfield model, the Boltzmann machine, Back-Propagation etc. - and a tendency to devalue the work of earlier pioneers. The CLSM provides a new perspective on these issues. The whole problem of "hidden neurons" arises from the difficulty of training neurons which have no direct connection to the output of an ANN. If multilayer neural networks are a requirement and if all network connections are trained "in a single shot", the problem is inescapable. The CLSM shows, however, that if organisms evolve and learn in an

environment in which functions can be acquired "stepwise" via pre-adaptation, and if their neural architecture is modelled so as to allow "step by step" learning, they can compute at least some of the linearly non-separable functions computed by traditional multilayer models.

"Step by step" function acquisition allows the CLSM to avoid one of the worst problems afflicting current generation ANNs, namely their poor scaling properties. The CLSM evolves organisms which are able to solve a subset of the ensemble of problems of a given complexity. Problems belonging to this subset are solved extremely fast. We conjecture in fact that for these problems learning time is less than linearly proportional to problem complexity. This contrasts with traditional ANN learning, where learning time is believed to increase exponentially with complexity [2] [6]. Consider a CLSM organism with m neurons. Learning time depends on the number of steps in the learning process. The worst case is thus a linear chain of neurons. If each step can be performed in a small constant time we obtain

t ≤ O(m)    (4)

where t is the time necessary for the system to acquire its Characteristic Functions, and m is the number of neurons in the system. In other words, learning time is linearly or less than linearly proportional to the number of neurons in the system. The size of a neural network is best represented by the number of connections in the system, which for a neural network with m units is given by O(m²). The algorithmic complexity of the problems which can be solved by an optimally designed network is exactly represented by the size of the network. We may thus write

C = O(m²)    (5)

where C represents the algorithmic complexity of a problem which can be solved by an optimally designed network of m neurons. If CLSM organisms were optimally designed we would thus obtain the following learning time

t ≤ O(√C)    (6)

which would obviously be a highly desirable property. In reality CLSM organisms are clearly suboptimal (see the preceding section). Theoretical analysis nonetheless suggests that the CLSM algorithm selects in favor of organisms with rapid learning time, which are thus as small as possible (small size implies fewer learning steps). Cross-sectional data from Simulation 4 (see Table 4) support the hypothesis that there is no explosion in the size of CLSM organisms as problem complexity increases. The connection/complexity ratio (a/b) in Table 4 provides additional evidence that the size of CLSM organisms with high accuracy of computation does not increase drastically for problems of unusual complexity. It thus seems legitimate to conjecture that, for those problems which the CLSM is capable of resolving, learning time is in fact less than linearly proportional to problem complexity.

Table 4. The connection/complexity ratio (a/b) for networks with high accuracy of computation (>0.95).

Generation  Connections (a)  Complexity (b)  a/b   Accuracy of Computation
51          23               14              1.64  0.97
91          15               9               1.67  0.97
101         26               13              2.00  0.98
111         23               11              2.09  0.99
121         25               10              2.50  0.98
131         20               10              2.00  0.97
141         28               10              2.80  0.98
151         26               16              1.63  0.96
161         23               13              1.77  0.98
171         30               13              2.31  1.00
181         31               13              2.38  1.00
201         24               10              2.40  0.99
211         27               13              2.08  1.00
221         26               10              2.60  0.99
231         28               13              2.15  1.00

5.2 Variable Characteristic Functions and the avoidance of local optima

It is a commonplace of evolutionary biology that evolution has no final goal. Over time lineages of organisms radiate to occupy new ecological niches; features acquired as an adaptation for a particular purpose are often used later with a completely different function. This so-called "pre-adaptation" is not so much a curiosity as an essential feature of the evolutionary process (see e.g. Lund and Parisi's discussion [9]). In classical GAs, on the other hand, populations optimize a single, static fitness function; adaptive radiation and pre-adaptation are excluded by definition. What is being modelled appears to be not so much an evolutionary process as a single stage in the process. The CLSM avoids these limitations. In the CLSM the characterization of lineages by multiple, evolving Characteristic Functions guarantees that organisms occupy the ecological niches which are best adapted to their computational abilities. The simulation results show, unsurprisingly, that they begin by occupying niches associated with simple cognitive functions; at a later stage, and under competitive pressure, their ability to compute these simple functions becomes a "pre-adaptation" for the acquisition of more complex "composite functions". The use of evolving Characteristic Functions also reduces the probability of lineages "sticking" in "local optima". Let an organism have n neighbours, where a neighbour is any genotype which can be achieved in a single evolutionary step, and let the probability of a single neighbour being less fit than the current organism be equal to p. The probability of all neighbours being less fit than the current organism (i.e. the probability of the current organism being a local optimum) is then p^n, and as n → ∞, 1 − p^n → 1. The availability of variable Characteristic Functions increases n. The CLSM is thus less likely to get stuck on a local optimum than a conventional GA. If there is sufficient variety of ecological niches and if "hard" niches can be reached via a series of "easier" ones, it is almost certain that CLSM lineages will reach at least some interesting landmark. In the much poorer environment typical of a classical GA the chance that a lineage will find a path to a single, predefined function is, unless the function is very simple, extremely low. From this point of view the CLSM appears to be a more realistic representative of biological evolution than classical GAs.
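Written out explicitly (with the implicit assumption that the n neighbours are independent):

```latex
P(\text{local optimum}) = p^{\,n},
\qquad
P(\text{at least one fitter neighbour}) = 1 - p^{\,n} \longrightarrow 1
\quad \text{as } n \to \infty .
```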

5.3 The evolution of complexity

It is often stated that evolutionary biology has no theoretical reason to expect evolutionary lineages to increase in complexity with time [17]. It is interesting to observe that in the CLSM there are theoretical reasons to expect network size to increase, and that these are supported by simulation results. In the CLSM the integration of new neurons within an existing network is a relatively "easy" evolutionary task. Analysis shows that there are many circumstances in which an increase in the number of neurons or connections is selectively neutral. The addition, for example, of an "interneuron" with no direct or indirect connections to output has no effect whatsoever on the fitness of the organism. Where the addition of new neurons or connections allows the computation of functions the organism was previously incapable of computing, it makes a positive contribution to fitness. The ease with which new neurons or connections may be added to an organism contrasts with the difficulty of simplifying a functional system. As integration becomes more complex, deep interdependencies emerge, making it impossible to delete or modify the connections of "entrenched" neurons, or the neurons themselves, without lethal effect on the whole system. The low cost of adding new neurons and connections and the high cost of their removal makes it possible to predict that the number of neurons and connections in a particular evolutionary lineage will tend to increase over evolutionary time. This prediction is confirmed by simulation results showing a close to linear increase in network size over time, even where there is no corresponding improvement in fitness. An interesting corollary is the tendency of CLSM organisms to accumulate sub-optimal neural circuitry (as discussed for Simulation 1).

5.4 The modularity of the brain

In the CLSM a learning task is divided into a series of sub-tasks, each of which is resolved by a particular module of a CLS. The result, as shown above, is fast learning. The approach is similar to the work by Fogelman [4] in the sense that both make use of a typical computer science divide and conquer strategy. There is however a major difference between the two approaches. While Fogelman's Multi-Modular Architecture demands an a priori decomposition of the problem into subtasks, in the CLSM this division evolves as a result of the working of the model. Both in the real world and in the CLSM, fitness is positively related to speed of learning. It follows that if organisms are allowed to "choose" the functions they compute there will be strong selective pressure in favor of functions which can be learnt rapidly. In the context of the CLSM this implies that the subnetworks responsible for the computation of specific functions should be as small as possible. The results of Simulations 3 and 4, in which dominant organisms computed functions requiring low connectivity, provide support for this view. Extending the CLSM results to the real world, this implies that there are strong selective pressures favoring a modular brain architecture with very small modules, whose complexity may not be enormously higher than that of current ANNs. It is interesting that in a reassessment of "the new connectionism" Minsky and Papert themselves suggest a similar conclusion [11]. A modular brain architecture consists of a series of small subnetworks, each of which can learn quickly, in parallel with learning by other modules or in sequence. It should be quite clear that the time constraints on this "step by step" learning are much less rigid than those applying to networks which have to be trained in one go.

5.5 The evolutionary design of "training sequences" - fixed learning stages

The ability to train large networks is one of the CLSM's major "strong points". It must however be pointed out that this advantage is purchased at a price. A classical ANN (or GA) optimises a single fitness function. Composite Learning Systems, on the other hand, optimise not only their final "objective function" but also a set or sequence of predecessor functions. The design of such a "learning sequence" is analogous to the design of a training schedule for an animal or human subject. It is well known that for complex problems this is an extremely difficult task. One of the key strengths of the CLSM is that this task is itself solved by natural selection. As shown earlier, the neural architecture of CLSM organisms implies that "learning" is necessarily a step by step process. Simple cognitive functions acquired at an early stage in the learning process act as "building blocks" for the construction of more complex "composite functions". It can thus be expected that all systems will acquire particular cognitive and behavioural abilities in a fixed, identifiable sequence. Given that these sequences depend on the inherent mathematical structure of the functions being computed, they are likely to be insensitive to changes in system architecture. Sequences such as those predicted have been identified in the simulations and are strongly reminiscent of the learning stages found in studies of human and animal developmental psychology (see e.g. [15]).

5.6 Directions for future research

The CLSM simulations which have been carried out to date are based on small populations of tiny organisms computing Boolean functions in just 3 variables. The next step is to validate the model using bigger populations of larger organisms computing more complex functions. If the model behaves as predicted, these larger scale simulations should show the emergence of new and more complex functionality while maintaining the same basic properties as those demonstrated by current simulations. Assuming this initial validation is successful, there are many possible directions for future research. In particular, the authors intend to experiment with unsupervised learning algorithms allowing hidden neurons within a CLS to acquire "feature detector" functions without explicit training, with the introduction of a principled separation between genotype and phenotype as recommended by Nolfi and Parisi [12], and with the modification of the model to allow organisms to interact and coevolve in a more naturalistic environment.

Acknowledgement

The authors would like to thank Dr. P. Zorkoczy (Open University, UK) and Dr. F. Favata (ESTEC) for their invaluable critical contribution in the preparation of this paper.

References

1. H. Ackley, G.E. Hinton, and T.J. Sejnowski. A learning algorithm for Boltzmann machines. Cognitive Science, 9:147-169, 1985.
2. A. Blum and R.L. Rivest. Training a 3-node neural network is NP-complete. In Proc. of the 1988 Workshop on Computational Learning Theory, pages 9-18, San Mateo, 1988. Morgan Kaufmann.
3. F. Favata and R. Walker. A study of the application of Kohonen-type neural networks to the travelling salesman problem. Biol. Cybern., 64:463-468, 1991.
4. F. Fogelman-Soulie. Integrating neural networks for real world applications. In J.M. Zurada, R.J. Marks II, and C.J. Robinson, editors, Computational Intelligence: Imitating Life, New York, 1994. IEEE Press.
5. K. Fukushima, S. Miyake, and I. Takayuki. Neocognitron: a neural network model for a mechanism of visual pattern recognition. IEEE Transactions on Systems, Man and Cybernetics, SMC-13:826-834, 1983.
6. J.S. Judd. Neural Network Design and the Complexity of Learning. MIT Press, Cambridge, MA, 1990.
7. T. Kohonen. The neural phonetic typewriter. Computer, 21:11-22, 1988.
8. H.H. Lund and D. Parisi. Simulations with an evolvable fitness formula. Technical Report PCIA-1-94, C.N.R., Rome, 1994.
9. H.H. Lund and D. Parisi. Pre-adaptation in populations of neural networks evolving in a changing environment. Artificial Life, 1995. To appear.
10. McCulloch and Pitts. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 1943. University of Chicago Press.
11. M. Minsky and S. Papert. Perceptrons. MIT Press, Cambridge, MA, 1968.
12. S. Nolfi and D. Parisi. 'Genotypes' for neural networks. In M. Arbib, editor, Handbook of Brain Theory and Neural Networks, Cambridge, MA, 1995. MIT Press. In press.
13. D. Parisi, F. Cecconi, and S. Nolfi. Econets: neural networks that learn in an environment. Network, 1:149-168, 1990.
14. S. Paternello and P. Carnevali. Learning capabilities of Boolean networks. In Neural Computing Architectures. North Oxford, 1989.
15. J. Piaget. La construction du réel chez l'enfant. Delachaux & Niestlé, Paris, 1967.
16. D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning representations by back-propagating errors. Nature, 323:533-536, 1986.
17. E. Szathmary and J. Maynard-Smith. The major evolutionary transitions. Nature, 374:227-232, 1995.
18. A.M. Turing. Computing machinery and intelligence. Mind, 59:433-460, 1950.
19. S. Wolfram. Statistical mechanics of cellular automata. Reviews of Modern Physics, 55:601-644, 1983.