Artificial Neural Networks Design using Evolutionary Algorithms

P.A. Castillo (1), M.G. Arenas (1), J.J. Castillo-Valdivieso (2), J.J. Merelo (1), A. Prieto (1) and G. Romero (1)

(1) Department of Architecture and Computer Technology, University of Granada. Campus de Fuentenueva. E-18071 Granada (Spain)
(2) Department of Technology, IES Europa. C/ Miguel Angel Blanco s/n. Aguilas, Murcia (Spain)
Phone: +34 958 240460 e-mail: [email protected]

Fax: +34 958 248993 URL: http://geneura.ugr.es

Abstract. Although a great number of algorithms have been devised to train the weights of a neural network for a fixed topology, most of them are hill-climbing procedures, which usually fall into a local optimum; that is why the results obtained depend to a great extent on the learning parameters and the initial weights, as well as on the network topology. Evolutionary algorithms have proved to be very effective and robust search methods for locating zones of the search space where good solutions can be found, even if this space is large and contains multiple local optima. The application of evolutionary algorithms to optimize artificial neural networks is justified by a smaller overall computational cost, in comparison with trial-and-error methods, and by their robustness as opposed to constructive/pruning methods. However, global search techniques such as evolutionary algorithms analyze large zones of the space searching for good solutions. In general they are less efficient than local search techniques at finding a local optimum, so it is convenient to let the evolutionary algorithm select initial solutions in good areas of the search space, and to locate, afterwards, the local optimum in these areas. This paper reviews the different approaches in which evolutionary algorithms and artificial neural networks have been combined to optimize the different design parameters of the latter, paying special attention to the specific genetic operators used in these methods, the main libraries to evolve artificial neural networks, and those applications that use hybrid methods to solve problems whose solution would not be possible otherwise. Another objective of this paper is to be an update of previous papers such as [1, 2].

Keywords: Hybrid Methods, Evolutionary Algorithms, Artificial Neural Networks, Optimization

1 Introduction

Artificial neural networks (ANN) have successfully been used in a large number of applications [3-8]; nevertheless, network design raises several problems [9, 10], since it is necessary to establish several parameters. For instance, in the case of multilayer perceptrons (MLP) trained using the back-propagation algorithm (BP) or some of its variants, it is necessary to set the learning rate, the initial weights (which influence the learning speed and the possibility of reaching the global minimum of the error function), and the number of hidden layers and neurons in each one of them (decisive for the network classification/approximation ability).

These problems are usually approached using a method that optimizes the parameters that determine the architecture. In general these optimization methods fall into two groups: constructive/pruning methods [11-19] and methods that make ANNs evolve (hybrid methods, since they combine the methodologies of ANNs and evolutionary algorithms, EAs) [20-25, 1, 2, 26].

Constructive methods try to adapt the size of the network to the problem, starting with a small network and adding layers and units until a solution is found. Their main advantage is that an a priori estimation of the network size is not needed. Pruning methods are based on training a network larger than necessary and then removing the unnecessary parts (units or weights). An advantage of these methods (in general) is that the networks obtained are fast and easy to implement [27, 28, 1]. The main problem of constructive and pruning algorithms is that they are, in any case, gradient-descent-based methods (they merely hill-climb over a different search space), so they might reach a sub-optimal solution (a local optimum). In addition, the criteria to add and remove nodes depend on the network architecture and on the problem to solve.

EAs search the space of solutions more or less randomly, paying special attention (by means of the selection operator) to those zones in which the value of the evaluation function is maximum (higher ANN capability), whereas classic ANN training algorithms are gradient descent algorithms, which iteratively reduce the error until a minimum is reached. In the latter case, the search is concentrated in a small part of the space of solutions, close to the initial point. On the contrary, an EA searches over the whole space; thus, it is possible for a population to contain solutions located near several sub-optimal points. EAs can carry out a global search, locating the ANN near a (global) optimal point. Then, by means of the ANN training algorithm (local search), the optimal point can be reached [29, 30].

This paper intends to be an updated review of the field of hybrid EA/ANN methods, building on previous reviews such as [1, 2], and paying special attention to aspects such as variation operators, software and applications.

The remainder of this paper is structured as follows: Section 2 presents a comprehensive review of the approaches found in the bibliography and the design decisions made to evolve ANNs. These decisions include the coding of the ANN in the individuals of the population and the method used to assign initial weight values; the approaches covered include the evolution of connection weights, the evolution of the network architecture and the learning rule, and the evolution of the ANN input vector size. Section 3 presents the main genetic operators found in the bibliography, analyzing how those operators contribute to the hybrid algorithms that use them. Section 4 studies the effects of local search methods on hybrid algorithms, paying attention to the Baldwin effect [31-34]. In section 5, variable-length-chromosome EAs, which are often used for ANN evolution, are studied, paying attention to those methods whose individuals have genetic code segments not used to codify characteristics ("introns"). Section 6 examines available evolutionary computation libraries that combine EAs and ANNs; and finally, section 7 describes some applications that use hybrid methods to solve problems whose solution would not be possible otherwise.

2 ANN Design Using Evolutionary Algorithms

The design process we are going to consider consists of the following stages: first, a preprocessing of the data (or patterns) is made; second, the network topology and codification are chosen (network inputs and outputs, coding, number of hidden layers and number of units per layer, and the type of connectivity); and finally, the network training method and learning parameters are chosen.

In the evolution of ANNs, three main approaches can be found: the evolution of connection weights (subsection 2.3) is an adaptive and global approach to training the network, especially when gradient-descent-based training algorithms have difficulties; network architecture evolution (subsection 2.4) implies an adaptation of the network topology to the problem without human intervention, and represents an approach to the automatic design of ANNs because both the weights and the architecture can be evolved; learning parameter and rule evolution (subsection 2.5) means an adaptation of the learning rules, in the sense of searching for learning parameters, or even for new learning rules. Subsection 2.6 presents hybrid methods that carry out the evolution of the network input vector, avoiding redundant inputs.

2.1 Which kind of ANN are commonly evolved?

Several approaches to evolve almost any kind of ANN can be found in the bibliography; however, they can be divided into two broad fields:

- Evolving generic ANNs, searching for the best ANN regardless of its structure.
- Selecting a prefixed architecture that is easy to evolve (by virtue of its structure, parameters, etc.).

The first approach has the advantage of not restricting the search to a specific area of the search space, allowing networks of any kind to be found. Nevertheless, it requires deciding on the coding (which can sometimes be complicated to handle) and the representation, and establishing some restrictions on the genetic operators and fitness function so that only valid networks are generated (backward connections may not be allowed, for instance). The second approach has the advantage of incorporating some previous knowledge about the problem (the type of network searched for), so it is simpler to evolve the networks: representation, coding, genetic operators, etc., are established by the architecture. Moreover, because the architecture is prefixed, well-known training algorithms can be used.

2.2 Choosing the coding and representation

In the evolution of the connection weights, as well as in the evolution of the network architecture, genetic operators are fundamental to the operation of the EA. However, the form in which they are applied depends on how networks are represented (binary or real representation) and on the amount of information each individual of the population codifies (direct or indirect coding).

Binary vs. real representation
The evolution of connection weights involves two phases: first, the representation of the weights in each individual must be decided (binary strings or real-valued numbers); the second phase concerns the search genetic operators used (see section 3). These design decisions are very important, because the use of different representations and genetic operators leads to very different results.

Genetic algorithms [35, 36] usually use binary strings to codify candidate solutions. Initial research in evolving ANN connection weights [10, 37-43] used a representation scheme in which each weight is represented using several bits, and a complete ANN is represented by concatenating those weights in the individual. The weights of a single hidden unit are placed together so that the crossover operator interchanges complete hidden units, and not only individual weights [44-46]. An advantage of the binary representation is its simplicity and generality: applying the mutation and crossover operators is very simple, and there is no need to design new specific operators. Moreover, the physical implementation is simple, since the weights can be represented in terms of bits in the hardware. Generic EA tools can also be applied to ANN evolution in this case, since all EA tools handle binary representations easily. As a disadvantage, a balance must be struck between the precision and the length of the individual. If few bits are used to represent each weight, training could fail because certain combinations of real weights cannot be approximated by discrete values. A detailed study of the problem of limited

precision in representing neural net weights can be found in [47-50]. Using many bits to increase the precision causes bloat in the individual representation, making evolution too inefficient (although this is not too important when using hybrid algorithms).

On the other hand, floating point representation has been proposed, in which each weight is represented by a real number [51-58]. Since each individual is formed by a vector of real numbers, the classic crossover and mutation operators cannot be used, so it is necessary to design new operators. Montana and Davis [51] proposed several operators to construct feature detectors during the evolution. One way to evolve vectors of real numbers is to use evolutionary programming (EP) or evolution strategies (ES), conceived for continuous optimization. Several papers use these evolutionary approaches to evolve ANNs [52, 54, 59, 60, 57, 61], using mutation (Cauchy or Gaussian [62, 63]) as the main search operator. Although floating point representation is more precise, a disadvantage (derived from the increase of precision) is that the search space becomes extremely large [64, 65].

Some researchers present hybrid methods that avoid the task of codifying the network in a population individual. In these papers, the EA does not

need to codify the network; instead, the data structure that represents the neural net is evolved using several specific genetic operators. One of these methods is proposed by Castillo et al. [66], who present a hybrid method that combines simulated annealing and MLPs to search for the initial weights and the learning parameter of BP. Also, Rivas et al. [67] propose a method to optimize the weights and centers of a radial basis function network (RBF), using an EA that makes use of specific genetic operators to evolve the networks, with no need of coding. Finally, the G-Prop method [68-76] carries out the optimization of MLPs by means of an EA, searching for the learning parameters of the training algorithm, the initial weights of the network and a suitable architecture to obtain a good solution. The genetic operators (mutation, crossover, adding, pruning or replacing hidden neurons, and a training operator) act directly on the ANN object, but only the initial weights and the learning rate are subjected to evolution, not the weights obtained after training. The "genetic atom" is a hidden layer neuron; most operators treat hidden layer neurons and the weights to and from them as a unit, as proposed in [77].

Using real-valued or binary representation, the mapping between inputs and outputs can be implemented by different networks, simply by permuting some of the neurons of the hidden layers [78, 79, 77] (the "competing conventions problem", see section 3.2); this problem produces a high degeneracy in the ANN representation. In order to avoid this problem, if the topology is fixed in advance,

the best representation is "no representation", using specific genetic operators to evolve ANN data structures directly.
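To make the precision/length trade-off concrete, the following minimal sketch (Python, with hypothetical names; not code from any cited paper) shows the classic fixed-precision binary coding of a weight vector together with its decoding; the quantization error it introduces is exactly the limited-precision problem studied in [47-50]:

```python
import numpy as np

BITS_PER_WEIGHT = 8          # precision/length trade-off discussed above
W_MIN, W_MAX = -4.0, 4.0     # representable weight range

def encode_weights(weights):
    """Quantize each real weight into BITS_PER_WEIGHT bits and concatenate."""
    levels = 2 ** BITS_PER_WEIGHT - 1
    chromosome = []
    for w in weights:
        q = int(round((w - W_MIN) / (W_MAX - W_MIN) * levels))
        q = min(max(q, 0), levels)                      # clip to valid range
        chromosome.extend(int(b) for b in format(q, f"0{BITS_PER_WEIGHT}b"))
    return chromosome

def decode_weights(chromosome):
    """Inverse mapping: bit groups back to real weights (with quantization error)."""
    levels = 2 ** BITS_PER_WEIGHT - 1
    weights = []
    for i in range(0, len(chromosome), BITS_PER_WEIGHT):
        q = int("".join(map(str, chromosome[i:i + BITS_PER_WEIGHT])), 2)
        weights.append(W_MIN + q / levels * (W_MAX - W_MIN))
    return np.array(weights)

w = np.array([0.37, -1.52, 2.8])
print(decode_weights(encode_weights(w)))   # close to w, up to 8-bit resolution
```

A real-valued representation simply keeps the weight array itself as the chromosome, trading this quantization error for the much larger search space mentioned above.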

Direct vs. indirect coding
Architecture evolution involves two phases: the first consists of deciding the coding of the network in the genotype of the individual; the second concerns the genetic operators used for searching (see section 3). The key decision when coding an architecture is the amount of information that each individual will codify. It is possible either to use the maximum detail, codifying each connection and node of the architecture (direct coding), or to codify only the most important parameters of the architecture, such as the number of nodes in each layer (indirect coding).

Using direct coding, each connection is specified by means of its binary representation [10, 80-84, 7, 8, 85]. In general, to represent a network with N neurons, an N x N matrix is used, where each element c_ij indicates the presence or absence of a connection between nodes i and j. This approach can be extended so that each c_ij is a real number that represents the connection weight between nodes i and j; thus the architecture and the connection weights can be evolved simultaneously [39, 86, 87, 80, 81, 88, 7, 8, 85]. Each matrix has a unique correspondence with its network architecture. The binary string that represents an architecture is the concatenation of the rows (or columns) of the matrix. Using this coding, restrictions on the architectures to be explored can be imposed simply by imposing certain restrictions on the matrix; for example, to codify feed-forward networks, a lower triangular matrix must be taken. The main advantage of this kind of codification is its ease of implementation and handling by the genetic operators, so that a connection can easily be added or pruned. Its main problem is the lack of scalability: a big ANN needs a big matrix, and that increases the computation time.

In order to reduce the architecture representation length, many authors have used indirect coding [89, 90, 7, 8, 85], codifying only some of the characteristics of the architecture in the individual. The way each connection is codified is predefined according to previous knowledge of the problem domain, or by means of deterministic rules. Indirect coding can generate more compact representations of the network architecture, although it does not imply finding more compact networks. In the bibliography, several indirect coding methods can be found, although most of them are based on the use of parametric representation. Network architectures can be specified using several parameters, such as the number of hidden layers, the number of nodes per layer and the connections between them. Harp et al. [89, 90] proposed an approach consisting of representing layers and the output connections. This coding method can reduce the individual length, although it can be used only when the type of architecture searched for is known;

besides, small changes in the representation (such as the number of hidden layer units) might produce huge changes in fitness, which makes genetic search more similar to random search because of the rough fitness landscape.

Koza [91] and Zhang [92] proposed to represent neural networks as parse trees, and to evolve them using a crossover operator which swaps subtrees

representing subnetworks. However, the graph-like structure of neural networks cannot be directly represented with parse trees. Instead, a type of indirect coding based on construction rules [93-98, 7, 8, 99] is used. In this approach, instead of codifying architecture parameters, the construction rules used to form architectures are codified in the individual. A construction rule is usually a recursive equation [99] or a generation rule [93] similar to the production rules of production systems. The matrix that specifies the connectivity is generated by starting from a base symbol and applying the different production rules to the non-terminal elements of the matrix until it contains only terminal symbols (1 meaning the existence of a connection, and 0 its absence). The method proposed by Gruau [97] starts with a neuron and lets the network grow by applying recursive rules, making it possible to express recursive networks. According to the author, this method generates very compact and scalable networks.

Recently, a new form of GP, Parallel Distributed Genetic Programming (PDGP) [100], in which programs are represented as graphs instead of parse trees, has been applied to the evolution of neural networks [101]. The method allows the use of more than one activation function in the neural network, but it does not include any operators specialized in handling the connection weights. However, this representation makes inefficient use of memory; that is why Figueira-Pujol et al. [102] propose a dual representation, in which a linear chromosome is converted into the grid-based PDGP representation. The dual representation includes every detail necessary to build and use the network: connectivity, weights and the activation function of each neuron.

Direct coding presents the problem of a lack of scalability, and it is likely to generate invalid networks (with backward connections, for example). Interest in indirect coding is mainly academic, because the resulting networks cannot be trained using classic and well-known training methods, such as BP or LVQ. As commented above, if possible, the best option is not to use any coding at all, and to evolve the networks directly.
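As an illustration of direct coding, the sketch below (hypothetical helper names, assuming NumPy) builds the N x N connectivity matrix described above and flattens it into a chromosome; restricting entries to the strictly lower triangle is the lower-triangular-matrix device that guarantees feed-forward architectures:

```python
import numpy as np

N = 5  # total number of neurons

def random_feedforward_matrix(n, rng):
    """Direct coding: c[i][j] = 1 iff there is a connection from node j to
    node i.  Keeping only the strictly lower triangle forbids backward
    connections, so only feed-forward architectures are generated."""
    c = rng.integers(0, 2, size=(n, n))
    return np.tril(c, k=-1)               # zero out diagonal and upper triangle

def matrix_to_chromosome(c):
    """The binary string is the concatenation of the rows of the matrix."""
    return c.flatten().tolist()

def chromosome_to_matrix(bits, n):
    return np.array(bits).reshape(n, n)

rng = np.random.default_rng(0)
c = random_feedforward_matrix(N, rng)
bits = matrix_to_chromosome(c)
assert (chromosome_to_matrix(bits, N) == c).all()
```

The scalability problem is visible here: the chromosome grows as N squared, which is why indirect codings were proposed.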

2.3 Evolving connection weights

The initial weights of a neural network are very important to obtain fast convergence in backpropagation and other ANN training algorithms, since, depending on the point of the search space from which the training starts (and that point is determined by the set of weights), better or worse solutions can be obtained when training the network [103]. Weight evolution involves setting the initial values (see "Weight initialization" below) and the way the EA changes them (see "Training weights").

Weight initialization
In the bibliography, different methods of weight initialization can be found [103]. The simplest of all is based on random initialization [104]; other methods require a statistical analysis of the training data, which makes them less efficient, although probably more powerful. Fahlman [105] proposed, after an experimental study on multilayer perceptrons, using an initial range between [-4.0, 4.0] and [-0.5, 0.5] depending on the problem. Other authors [106-110] proposed particular initialization ranges that give good results on certain problems, although without mathematical justification or a comparison of results with other methods. Lee et al. [111] theoretically proved that nodes with high-valued weights can suffer from saturation (changes made to these weights will not affect the node output); in that work a small initialization range is proposed, which can lead to a higher learning speed. Other authors propose non-random initialization methods [112-114]. These methods need a preprocessing of the input patterns prior to the application of the training algorithm, which increases the computational cost. In [114] an alternative to random initialization is proposed: the bias weights are initialized so that the objective function is minimized, whereas the rest of the weights are initialized randomly. The initial period of weight decrease during training (a phenomenon that causes the MLP to produce bad results) is thereby supposed to be eliminated.
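A minimal sketch of range-based random initialization in the spirit of Fahlman's recommendation (the function name and default range are illustrative, not taken from [105]):

```python
import numpy as np

def init_weights(shape, w_range=0.5, rng=None):
    """Random uniform initialization in [-w_range, w_range].  Small ranges
    (e.g. 0.5) keep sigmoid units away from saturation, where the gradient
    vanishes and BP makes little progress; the experiments cited above
    suggest problem-dependent ranges between 0.5 and 4.0."""
    rng = rng or np.random.default_rng()
    return rng.uniform(-w_range, w_range, size=shape)

# e.g. the input weights of a 3-hidden-unit layer of an MLP with 4 inputs
w_hidden = init_weights((3, 4), w_range=0.5)
```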

Training weights
Training an ANN is usually formulated as the problem of minimizing an error function between the desired outputs and the actual network outputs by adjusting the weights. BP and its variants have been successfully applied in several areas [3-5], although they present certain problems [10] because they are based on gradient descent [9]. One way to solve the problems these methods present is to directly evolve the connection weights. Thus, an EA can be used to globally search for an optimal set of weights, without using gradient information [115, 51, 52, 116, 86, 54, 117, 60, 118-120, 56, 121-123, 10]. In this way, the problem of fitting the parameters of the learning algorithm (learning rate, momentum, etc.) is avoided, and, in addition, no restrictions are imposed on the topology of the network (such as requiring continuous and differentiable activation functions). The evaluation function must be defined taking two factors into account: the error between the desired and the obtained outputs, and the complexity of the network. The evolution of the connection weights involves two phases: the first consists of deciding the weight representation, that is, whether to use for example binary strings (see subsection 2.2); the second concerns the search genetic operators (see

section 3). These design decisions are very important, since the use of different representations and genetic operators can lead to different results.

An EA can be applied to train different kinds of ANN, independently of their type. The evolutionary approach can save much effort in the development of algorithms to train different kinds of ANN. At the same time, it can facilitate the implementation and use of certain features of the networks: for example, the generalization ability can be increased, and the network complexity decreased, by including certain terms in the evaluation function (incorporating "weight decay").

Training MLPs using an EA can be slow compared to some variants of the BP algorithm. Nevertheless, EAs are much less sensitive to the initial conditions; in addition, they look for a globally optimal solution, whereas a gradient-descent-based algorithm can only find the local optimum in the neighborhood of the point from which it started the search. In the bibliography we can find papers in which, for certain problems, the evolutionary approach is faster than BP [53, 124, 125, 54, 126, 127]. Other authors, on the contrary, present results showing that some variation of BP is more advantageous for solving certain kinds of problems [93]. These contradictory results can be due to the comparison of different EAs and different versions of BP. It is evident that there is no training algorithm better than all the rest; the best one always depends on the problem to solve, which is in agreement with the no free lunch theorem [128].

In general, EAs are inefficient at carrying out a local search, whereas they are very efficient at global searches. For that reason, evolutionary training can be improved if a local search method is used to tune the solution found by the EA. Thus, the EA searches for a good region in the search space, and later the local method tunes the solution found by the EA to find one closer to the optimum. This approach produces better results than those that use gradient-descent-based methods alone to solve problems using big networks. There are several papers [129, 116, 130-134] that propose hybrid methods to search for the set of initial weights closest to the optimum, and later use BP to carry out the local search from these weights. If we consider that BP must be run several times to find a suitable set of initial weights, due to its dependency on the initial conditions, this kind of hybrid method is probably the best option for designing ANNs.
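The overall hybrid scheme can be summarized in a short sketch. Here `train_bp` and `evaluate` stand for any MLP training routine and error measure (placeholders, not a particular cited implementation), and the EA loop is a deliberately simplified (mu + lambda)-style search over initial weight vectors:

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness(initial_weights, train_bp, evaluate, epochs=50):
    """EA fitness of a set of initial weights: error after a short BP run.
    train_bp(weights, epochs) and evaluate(net) are user-supplied."""
    trained = train_bp(initial_weights, epochs)
    return -evaluate(trained)           # higher fitness = lower error

def evolve_initial_weights(n_weights, train_bp, evaluate,
                           pop_size=20, generations=30, sigma=0.3):
    """Global phase: evolve initial weight vectors; local phase: a long
    BP run from the best starting point found by the EA."""
    pop = [rng.uniform(-1, 1, n_weights) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda w: fitness(w, train_bp, evaluate),
                        reverse=True)
        parents = scored[:pop_size // 2]
        children = [p + rng.normal(0, sigma, n_weights) for p in parents]
        pop = parents + children
    best = max(pop, key=lambda w: fitness(w, train_bp, evaluate))
    return train_bp(best, 500)          # local tuning from a good region
```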

2.4 Evolving network architecture

As already commented, the design of the ANN topology is very important insofar as it establishes its learning and generalization ability. A network with very few neurons may have difficulty learning the required task, whereas a network with too many neurons can overfit its outputs to the training patterns, decreasing its generalization ability when new patterns are presented.

The network architecture includes the network topology, the connectivity and the transfer function of each neuron. Until now, architecture design has mostly been done manually, by an expert with sufficient experience, through a process of trial and error. The automatic approaches most widely studied are incremental and pruning algorithms [11-19]. The design of an optimal network architecture can be formulated as a search problem in the architecture space, where each point represents a possible network architecture. Owing to the following characteristics, pointed out by Miller et al. [83], this optimization problem is more easily solved using an EA than using incremental or pruning methods:

- The search space is infinitely large, since the number of nodes and connections is not fixed beforehand.
- The error function surface is not differentiable, since changes in the number of nodes or connections are discrete.
- The error function surface is complex and noisy, since the mapping between an architecture and its representation ability is indirect and depends on the evaluation method; similar architectures can have different abilities.
- The error function surface is multimodal, since different architectures can have a similar ability.

As in the connection weight evolution approach, there are two fundamental phases here: the genotype representation of the network (direct or indirect codification, see subsection 2.2), and the EA itself (the genetic operators used, see section 3). Several papers about the network architecture evolution approach can be found in the bibliography [86, 83, 132, 88, 81, 135, 94, 136-147].

Most of the research has focused on the task of evolving the network topology, leaving the transfer function predefined [147]. Several works demonstrate the importance of these functions in the ability of an ANN [148-150], which has led some authors to try to optimize their defining parameters, or at least to decide which transfer function to use in each part of the network. In a pioneering attempt, Stork et al. [151] applied an EA to evolve the topology and the transfer function of simple ANNs; the topology as well as the transfer function was codified in the genotype of each individual of the EA population. White and Ligomenides [88] proposed a simpler approach: a population whose networks had 80% of their nodes with sigmoidal functions and the remaining 20% with Gaussian functions. Research was focused on deciding what mixture was the optimal one; in fact, none of the parameters of these functions was evolved. Liu and Yao [139] applied evolutionary programming (EP) [152] to evolve ANNs with a mixture of Gaussian and sigmoidal functions, so that the percentage of each one was not prefixed. The algorithm added or removed units, choosing

the type of function to use in each unit randomly, but the improvements presented in these papers due to the evolution of the transfer function are not significant.

For a given architecture, different weight sets can produce different results; thus, several authors have proposed methods that simultaneously evolve the weights and the architecture [86, 88, 96, 94, 136-139, 141-144], improving the results obtained.

2.5 Evolving the learning rule

ANN training algorithms or learning rules depend on the type of architecture used. Given the lack of knowledge about the network architecture, it is better to develop an automatic system to adapt the learning rule to the architecture and the problem to solve. Several models have been proposed [32, 29, 153-164], although most of them focus on how learning can modify or guide evolution [32, 29, 153], and on the relation between the evolution of the architecture and the connection weights [154-156]. Few of them focus on the evolution of the learning rules themselves [157-160, 162, 163, 145].

The problem has been approached in different ways, the first of which is based on searching for the BP algorithm parameters (learning rate and momentum) [90, 116, 165]. Some researchers [116, 166, 165] propose the use of evolutionary processes to find the BP parameters, leaving the architecture predefined. Others [90, 122] propose to codify the training algorithm parameters in the population individuals, along with the network architecture. These methods search for an almost optimal combination of the training algorithm parameters and the network architecture for a problem [167-169]. All the papers mentioned above limit themselves to applying an evolutionary strategy to search for the learning algorithm parameters; the update rules themselves are still predefined and fixed.

On the other hand, the evolution of the learning rule is oriented towards providing dynamic behavior to the ANN, as opposed to evolving the connection weights and the architecture (which deals with static objects: weights and topology). Due to the complexity of a generic representation for all possible learning rules, it is necessary to establish certain restrictions to simplify the representation and the search space. Therefore, it is usually assumed that the weight adaptation rule depends only on local information, such as the current node activations or the current weights, and that the learning rule is the same for all the ANN connections. Chalmers [157] defined a learning rule as a linear combination of variables and product terms. Each individual of the population is a binary string that exponentially codifies ten coefficients plus a scale term. In his experiments he proved the potential of this approach to stochastically discover new rules from a set of generated rules.

Other authors [158-160, 162, 163] have developed methods based on Chalmers' approach, extending it and obtaining similar results, showing that the learning ability can be improved by means of evolution. If no knowledge about the ANN architecture that solves a particular problem is available, a method that evolves (and discovers) the learning rule should be used. If a well-known training method can be used, a method that determines the learning parameter values is more suitable.
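A hedged sketch of what a Chalmers-style rule looks like once decoded (variable names are ours, and the exact term ordering in [157] may differ): the update of a weight is a linear combination, with evolved coefficients, of four local variables and their six pairwise products:

```python
import itertools
import numpy as np

def chalmers_update(scale, coeffs, w, x, y, t):
    """Weight update as a linear combination of four local variables
    (current weight w, input activation x, output activation y, training
    signal t) and their six pairwise products: ten evolved coefficients
    plus an evolved scale (learning-rate) term."""
    v = [w, x, y, t]
    terms = v + [a * b for a, b in itertools.combinations(v, 2)]
    return scale * np.dot(coeffs, terms)

# Setting the x*t coefficient to +1 and the x*y coefficient to -1 recovers
# the delta rule  dw = scale * x * (t - y)  as one point of the rule space:
delta_rule = [0, 0, 0, 0, 0, 0, 0, -1, 1, 0]
print(chalmers_update(0.1, delta_rule, w=0.2, x=1.0, y=0.3, t=1.0))  # 0.07
```

The EA then searches this ten-plus-one-dimensional coefficient space instead of the space of weight values, which is what makes discovering new rules possible.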

2.6 Evolving the input vector dimensionality

There are problems where the ANN input vector dimensionality can be too large, leading to what is usually called the curse of dimensionality: learning becomes exponentially more difficult with an increasing number of inputs. Some of those inputs are redundant, and thus increase the network size and the time necessary to train it and obtain a good solution. Defining the optimal ANN input vector dimensionality can be formulated as a search problem: given a potential input vector, we want to find a subset that contains the minimum number of elements (inputs) such that the network does not produce worse results than those produced using the complete vector.

Variable selection has also been addressed using other methods such as Kohonen's SOM [170-173] and multidimensional scaling [174]. On the other hand, several authors have faced this problem by means of an evolutionary approach [175-180], obtaining good results. In input evolution, each population individual represents a subset of the possible inputs. This is usually represented as a binary string of length equal to the total number of inputs (a 1 represents the presence of an input, whereas a 0 represents its absence). Evaluation is carried out by training an ANN, whose architecture is usually fixed, using those inputs.

Guo and Uhrig [175] present a genetic algorithm (GA) whose purpose is to reduce the dimensionality of the search space, selecting the main variables used as input to a modular ANN. The GA uses binary strings to represent the input vector and to select those inputs that form the optimal subset to solve the problem. Dellaert and Vandewalle [179] propose a method based on a GA that codifies the network inputs (pixels of an image) in a string, searching for the optimal input subset. Hornyak and Monostori [178] successfully apply a GA to obtain a pattern input subset to train ANNs for financial prediction. In general, reducing the input dimensionality via an EA could be useful for any problem, but it is probably wiser to use preprocessing techniques such as principal component analysis, or even Kohonen's Self-Organizing Map (SOM), for this task.
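A minimal sketch of this binary-mask scheme (assuming NumPy; `train_and_score` is a placeholder that trains the fixed-architecture ANN on the selected columns and returns its accuracy, and the size penalty `alpha` is our own illustrative addition):

```python
import numpy as np

rng = np.random.default_rng(0)

def select_inputs(X, y, train_and_score, pop_size=30, generations=40,
                  alpha=0.01, p_mut=0.05):
    """Each individual is a bit mask over the candidate inputs (1 = keep).
    Fitness rewards accuracy and penalizes the number of inputs kept."""
    n = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n))

    def fitness(mask):
        if mask.sum() == 0:
            return -np.inf
        acc = train_and_score(X[:, mask.astype(bool)], y)  # fixed architecture
        return acc - alpha * mask.sum()

    for _ in range(generations):
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]
        cut = rng.integers(1, n, size=pop_size // 2)       # one-point crossover
        children = np.array([np.concatenate([a[:c], b[c:]])
                             for a, b, c in zip(parents, parents[::-1], cut)])
        flip = rng.random(children.shape) < p_mut          # bit-flip mutation
        children = np.where(flip, 1 - children, children)
        pop = np.vstack([parents, children])
    return pop[np.argmax([fitness(m) for m in pop])]
```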

3 Specific Genetic Operators

One of the important issues when evolving ANNs is the selection of the genetic operators used in the EA, since the breadth and accuracy of the search depend on them.

The objective of the genetic operators is to generate new solutions in the area of the search space where the initial solution was generated; sometimes it is better to generate these solutions far away from the initial point. The most common operators found in the bibliography are mutation, crossover and some local search operators (BP and its variants).

3.1 Mutation

The mutation operator has two main roles: either 1) tuning solutions (a small change or move in the search space) or 2) in case another tuning operator is being used (one that makes a local search), changing the area of the search space (a big change or move) [134]. Making small mutations does not make sense if individuals will later tune themselves (through training, for instance), since the changes made by the operator will be undone by the local search operator (which will move the individual to the local optimum). In these cases, a mutation operator that causes big changes (moving individuals between areas) is more useful. Some authors [134] conclude that, in certain problems, to avoid falling into and remaining trapped in local optima, mutation operators should make big variations that move the population away from these areas. These big mutations move the population away from local optima, which degrades the value of the evaluation function in the first generations of the EA; later on, the population will be around the global optimum, which compensates the initial negative effects of these mutations. Then, using a tuning method (a mutation operator that makes small changes, or a local search operator), the global optimum can be reached [134].

Montana and Davis [51] concluded that a mutation operator that acts simultaneously on all the input weights of a node is more effective than one that acts on the weights individually. Utrecht and Trint [181] proposed several heuristic methods to use a mutation operator that generates small changes in the solutions. Angeline et al. [182] proposed an operator that makes more or less drastic changes to the weights according to how close the ANN is to the solution. Schiffmann [183] restricts the changes introduced by his operator to adding or removing connections between nodes. Rivas et al. [67] propose two mutation operators to evolve RBF networks: a centers mutation operator, which modifies the value of each component of the point of the space in which each radial basis function hidden layer neuron is centered, adding or subtracting a small amount that follows a Gaussian probability function; and a radii mutation operator, which modifies the value of each component of the radius used in each radial basis function hidden layer neuron, according to a Gaussian probability function.

Castillo et al. [68, 69, 71, 72] present a mutation operator based on the ideas proposed by Montana and Davis [51] and on the algorithm presented by Kinnebrock in [118], which modifies the weights of the MLP after every network training epoch by adding a small random number. Results show that a high application rate is suitable, producing medium-sized changes that move the individual away from the current search

area (if the local search operator is used). If no local search operator is used, the mutation rate should be low, to make small changes that tune the MLP weights. Castillo et al. [68, 69, 71, 72] also use a macromutation operator that randomly replaces a hidden neuron by another one initialized with random weights. This operator can be considered a macromutation that affects a single gene of the MLP; in order to avoid undesired disruptions, a low application rate is suitable.

In general, a weight mutation operator is needed for a correct exploration of the search space; the best strategy is probably to combine macromutation operators in the early phases of evolution with micromutation or local search operators (see subsection 3.4).
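The two roles can be sketched as two operators over the weight matrices of a one-hidden-layer MLP (illustrative code, not the exact operators of the cited papers): a micromutation that perturbs every weight slightly, and a macromutation that replaces a whole hidden neuron with a randomly initialized one:

```python
import numpy as np

rng = np.random.default_rng(1)

def micro_mutate(weights, sigma=0.05):
    """Tuning role: add small Gaussian noise to every weight."""
    return weights + rng.normal(0.0, sigma, size=weights.shape)

def macro_mutate(w_in, w_out, w_range=0.5):
    """Exploring role: replace one hidden neuron (its incoming and outgoing
    weights together) with a randomly initialized one, moving the individual
    to a different area of the search space."""
    h = rng.integers(w_in.shape[0])               # pick a hidden unit
    w_in, w_out = w_in.copy(), w_out.copy()
    w_in[h, :] = rng.uniform(-w_range, w_range, w_in.shape[1])
    w_out[:, h] = rng.uniform(-w_range, w_range, w_out.shape[0])
    return w_in, w_out
```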

3.2 Crossover

The purpose of the crossover operator is to recombine the useful parts of population individuals to form new solutions with the characteristics of both parents. The ability of the operator to do this depends on the problem at hand and on the way solutions are represented. A binary representation of the individuals makes the use of a generic crossover operator suitable: the standard crossover operator (one crossing point, two crossing points, or uniform crossover) sets each bit in the offspring to the value of that bit in the corresponding parent.

According to some authors, using crossover may cause negative effects in ANN construction, since this operator works correctly only when it is possible to interchange building blocks, and in an ANN it is not clear what can be defined as a building block, due to the distributed nature of the information representation [184]. In an MLP, information is distributed among all the network weights; thus, combining parts from two networks can cause the characteristics of both networks to become degraded. Even in those cases where the crossover operator is of use, if the population members are very similar (the population has converged to an area), crossover does not have remarkable effects, since the generated networks may be almost identical to the parents [134]. In RBF networks, in which the information is not distributed among all the network weights, the crossover operator is very useful [61, 185]. On the other hand, networks that distribute the information among their weights are usually more compact and have a higher generalization ability.

The crossover operator proposed by Pujol et al. [102] carries out the evolution of the architecture and the weights of neural networks codified using a dual representation (a linear chromosome combined with the grid-based PDGP representation). This crossover operator works by randomly selecting a node a in the first parent and a node b in the second parent, and replacing node a in a copy of the second parent; modification of the activation function and bias of a node is not performed by this crossover operator.

Using MLPs, some authors consider as a gene (the crossover-interchangeable unit) all the input weights of a node [51, 186, 83]. Others consider all the input and

output weights of a hidden layer node as the minimum unit interchangeable between individuals [77]. Other authors [187], in spite of considering a gene as the input weights of a node (to interchange complete functional units), found in their experiments that this crossover operator does not give substantial advantages over operators that make other choices. Rivas et al. [67] use a recombination operator, to evolve RBF ANNs, that interchanges sequences of hidden layer neurons; both sequences are randomly selected and may have variable length. The crossover operator proposed by Castillo et al. [68, 69, 71, 72] carries out a multipoint crossover between two chromosome nets, so that two networks are obtained whose hidden layer neurons are a mixture of the hidden layer neurons of both parents: some hidden neurons, along with their in and out connections, from each parent make one offspring, and the remaining hidden neurons make the other one; the learning rate is swapped between the two nets. This operator acts as a macromutation, because when hidden nodes are interchanged between MLPs, the changes can degrade both networks, since they store information in a distributed way; thus a low application rate is suitable.

Considering the problems that crossover-like operators present, the question arises of to what extent this is a useful operator [35]. Authors who are in favor of using this operator argue that it should be used whenever the problem and the solution representation allow one to deal with "building blocks" [35] (schemata that represent a good partial solution to the problem to solve): the idea is that blocks are treated independently by the EA and then recombined using the crossover operator. Other authors state that the crossover operator helps the search not by recombining building blocks, but by making changes similar to those produced by a mutation operator that causes big changes (macromutation) [134]. Crossover seems more useful in competitive learning neural nets (such as Kohonen's Learning Vector Quantization) than in cooperative learning neural nets (such as the multilayer perceptron trained with backprop); in the latter case it acts basically as a mutation operator, which makes a low crossover rate more convenient.
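A sketch of crossover with the hidden neuron as "genetic atom", in the spirit of [77] and of the operator of Castillo et al. described above (simplified: both parents are assumed to have the same number of hidden units, and the learning-rate swap is omitted):

```python
import numpy as np

rng = np.random.default_rng(2)

def neuron_crossover(parent_a, parent_b):
    """parent = (w_in, w_out), with w_in of shape (hidden, inputs) and w_out
    of shape (outputs, hidden).  Each offspring inherits, per hidden unit,
    the complete in/out weight set from one parent, keeping functional
    units intact instead of splitting them bit by bit."""
    (a_in, a_out), (b_in, b_out) = parent_a, parent_b
    hidden = a_in.shape[0]
    mask = rng.integers(0, 2, hidden).astype(bool)   # which parent per unit
    c_in = np.where(mask[:, None], a_in, b_in)
    c_out = np.where(mask[None, :], a_out, b_out)
    d_in = np.where(mask[:, None], b_in, a_in)
    d_out = np.where(mask[None, :], b_out, a_out)
    return (c_in, c_out), (d_in, d_out)
```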

Competing conventions problem
Training an ANN is not a suitable task for an EA that bases its operation on the recombination of solutions. The reason is that the mapping between inputs and outputs can be implemented by different networks obtained by permuting some hidden layer neurons [78, 79, 77]. This is called "the competing conventions problem". The problem appears because the EA cannot resolve these permutations, since it only works on the genotypic representation of the network: the EA treats structurally different networks as different solutions, although their operation is identical. Thus, if crossover is applied to two functionally identical but structurally different networks, invalid offspring will be obtained.

The simplest way to avoid this problem is not to use the crossover operator [188, 182, 189, 190]. Some authors have proposed genetic operators that avoid the problem while still using crossover. Montana and Davis [51] studied several forms of intelligent crossover, identifying the functional characteristics of hidden nodes during the application of the operator. Radcliffe [191] proposed avoiding incorrect representation permutations by identifying, in the networks to be recombined, those hidden nodes with the same pattern of connectivity. Hancock [79] used the previous idea and extended it to discover similarities between hidden nodes. Korning [192] proposed a crossover operator that automatically removes those descendants that do not reach a minimum evaluation function value; thus, offspring produced by incompatible parents are removed. Thierens et al. [77] consider that the function of a hidden unit is determined by the signs of its weights; thus, before applying crossover, they propose to rearrange the genetic chains of the parents so that hidden neurons with similar numbers of positive and negative weights are in the same position of the chain.
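The problem is easy to demonstrate: permuting the hidden units of an MLP changes its genotype but not its function, as the following self-contained check shows:

```python
import numpy as np

rng = np.random.default_rng(3)
w_in = rng.normal(size=(4, 3))    # 3 inputs -> 4 hidden units
w_out = rng.normal(size=(2, 4))   # 4 hidden units -> 2 outputs
x = rng.normal(size=3)

def forward(w_in, w_out, x):
    return w_out @ np.tanh(w_in @ x)

perm = rng.permutation(4)         # reorder the hidden units
w_in_p, w_out_p = w_in[perm, :], w_out[:, perm]

# identical outputs, different genotypes: two "competing conventions"
assert np.allclose(forward(w_in, w_out, x), forward(w_in_p, w_out_p, x))
```

Naive crossover of two such conventions can mix hidden units that play the same role, which is exactly how the invalid offspring mentioned above arise.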

3.3 Incremental and decremental operators

Incremental operators start with small networks and grow them by adding new

randomly initialized units to the hidden layers. That growth can give the MLP an excessive size and lead to the problem of overfitting: small networks generalize well, although they are slow in the learning phase, while big networks (with a high number of weights) are faster in the learning phase but obtain a poorer generalization [194, 193]; the way to obtain good generalization is to use the simplest network that solves the problem [27, 28, 1]. This is the reason why this operator is applied together with the following one. Decremental operators remove hidden units to obtain smaller networks [195, 196, 194]; this prevents the networks from growing too much.

Merelo et al. [44-46], in their G-LVQ method, propose using the classic GA operators (binary crossover and mutation) along with three new incremental/decremental operators: a duplication operator takes an individual and duplicates its best gene, increasing the individual size; an elimination operator removes the worst gene in the individual, reducing the individual size; and an incremental operator inserts a randomly initialized gene in the individual. The method proposed by Ragg et al. [147] establishes the topology of an MLP using a hybrid method based on EAs and information theory, with an operator that adds or removes several units of the networks according to the input/output relation of these units. Rivas et al. [67] use two incremental/decremental operators to establish the suitable size of RBF networks: the incremental operator duplicates hidden neurons according to a linear probability function (once duplicated, the new neuron centers and radii are modified using Gaussian functions); the decremental operator removes neurons from the hidden layer following a linear probability function.

Castillo et al. [68, 69, 71, 72] propose two incremental/decremental operators to face one of the main problems of BP (and its variants): the difficulty of guessing a suitable size for each hidden layer. In this work the EA carries out the search for the network architecture, establishing the number of neurons in the hidden layers. The incremental operator adds a randomly initialized hidden unit; the decremental operator removes the hidden unit with the highest accumulated backpropagated error over all the training samples. To avoid obtaining too big or too small networks, both operators are applied with the same priority. Pujol et al. [102] use a special pruning operator (for the evolution of neural networks codified using the dual representation of a linear chromosome combined with the grid-based PDGP representation): after a 100% correct solution is found, a function in the internal layers of each individual in the population is replaced with a terminal, and the evolution process is resumed; this procedure is repeated until a specified number of generations is reached. This strategy has the advantage of generating solutions of varying degrees of complexity.

In brief, these operators are efficient tools for searching the ANN architecture space, but a balance must be kept between incremental and decremental operators to avoid bloat, or a collapse of diversity through excessive application of the decremental operator.
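A sketch of the two operators for a one-hidden-layer MLP (illustrative; the error ranking passed to the decremental operator would come from backpropagation over the training samples, as in the operator of Castillo et al. described above):

```python
import numpy as np

rng = np.random.default_rng(4)

def add_hidden_unit(w_in, w_out, w_range=0.5):
    """Incremental operator: append one randomly initialized hidden unit.
    w_in has shape (hidden, inputs); w_out has shape (outputs, hidden)."""
    new_in = rng.uniform(-w_range, w_range, (1, w_in.shape[1]))
    new_out = rng.uniform(-w_range, w_range, (w_out.shape[0], 1))
    return np.vstack([w_in, new_in]), np.hstack([w_out, new_out])

def remove_hidden_unit(w_in, w_out, unit_error):
    """Decremental operator: remove the hidden unit with the highest
    accumulated error (unit_error: one value per hidden unit)."""
    worst = int(np.argmax(unit_error))
    return np.delete(w_in, worst, axis=0), np.delete(w_out, worst, axis=1)
```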

3.4 Local search operators

In general, using local search operators is faster and leads to more precise results than using bit-manipulation genetic operators. BP as a local search operator acts by refining solutions until they cannot be improved locally. Compared to the EA, this operator is much more efficient at finding the local optimum in an area; the EA could eventually reach the local optimum too, but the time to reach it would be longer, since tuning in an EA is based on small mutations. A problem with these search operators is that some tuned individuals may remain the best throughout the simulation (evolution stops) and tend to dominate the population due to their high fitness; thus the genetic search is altered, suffering a loss of diversity. When a local search operator is used, the population converges quickly, since individuals tend to have many characteristics in common (reduction of diversity).

Braun and Weisbrod [197] and Braun and Zagorski [190] proposed a method where the individuals were trained with BP until reaching the local optimum after being modified by some genetic operator, so that the descendants inherited the weights of their parents. This Lamarckian strategy (see section 4) reduces the search time, although the search space is restricted to those individuals with a higher value of the evaluation function. Montana and Davis [51] and Yao and Liu [143] propose using a training operator that tunes the MLP using the BP algorithm, performing a local search. Castillo et al. [69, 71] propose a training operator that tunes the MLP using the quickpropagation (QP) algorithm (it makes a local search to improve the individual), as proposed in [51] and [143]. When applied, it takes an MLP

and trains it for a specified number of epochs, pushing it back (with the trained weights) into the population. After an exhaustive study, a high application rate is proposed to obtain good results.

4 Baldwin Effect

Hybrid algorithms often implement non-Darwinian ideas, e.g. Lamarckian evolution or the Baldwin effect, where learning influences evolution. Lamarck's theory states that the characteristics an individual acquires during its life are passed on to its offspring [198]. Thus, the following generation inherits any acquired or learned characteristic, and this mechanism would be responsible for the evolution of species. According to this approach, learning has a great influence on evolution, since all learned characteristics are passed on to the following generation. These hypotheses have been rejected by biologists, since no mechanism exists to transcribe acquired characteristics into the genetic code.

Nevertheless, this does not mean that acquired characteristics do not influence evolution. Baldwin [31] and Waddington [199] argued that this influence is limited to the fact that individuals with a greater learning capacity will adapt better to the environment, and thus live longer. The longevity they acquire allows them to have more offspring over time, and to propagate their abilities. As the number of offspring who have acquired the ability grows, this characteristic becomes part of the genetic code. In this sense, learning guides evolution, although learned characteristics are not transmitted directly to the genetic code. The learning ability helps organisms respond to changes in their environment (changes that evolution cannot assimilate into the genetic code). The Baldwin effect states that learning helps organisms adapt genetically to changes in their environment, and also helps, indirectly, to codify those adaptations in the genetic code. These ideas have been used by numerous researchers in different approaches:

- Lamarckian mechanisms in hybrid evolutionary algorithms. Lamarckian theory is today totally discredited from the biological point of view, but it is possible to implement Lamarckian evolution in an EA, so that an individual can modify its genetic code during or after fitness evaluation (its "lifetime"). These ideas have been used by several researchers, with particular success in problems where the application of a local search operator obtains a substantial improvement (the travelling salesman problem: Gorges-Schleuter [200], Merz and Freisleben [201], Ross [202]). In general, hybrid algorithms are nowadays acknowledged as the best solution to a wide array of optimization problems.
- Studying the Baldwin effect in hybrid algorithms [32-34, 203, 204]. Some authors have studied the Baldwin effect by carrying out a local search on certain individuals to improve their fitness without modifying their genetic code. This is the strategy proposed by Hinton and Nowlan in [32], who found that learning alters the shape of the search space in which evolution operates, and that the Baldwin effect allows learning organisms to evolve much faster than their non-learning equivalents, even though the characteristics acquired by the phenotype are not communicated to the genotype. Ackley and Littman [203] studied the Baldwin effect in an artificial life system, finding that the experiments in which the individuals had learning capabilities obtained the best results. Boers et al. [204] describe a hybrid algorithm to evolve ANN architectures, whose effectiveness is explained by the Baldwin effect, implemented not as a learning process in the network, but as a change of the network architecture as part of the learning process.
- Comparative studies of Lamarckian mechanisms and the Baldwin effect in hybrid algorithms. Some studies have investigated whether a strategy based on a hybrid algorithm that takes advantage of the Baldwin effect is better or worse than one implementing Lamarckian mechanisms to accelerate the search [205]. The results obtained differ, and are very dependent on the problem. Gruau and Whitley [206] compared Baldwinian, Lamarckian and Darwinian mechanisms implemented in a genetic algorithm that evolves ANNs, finding that the first and the second strategies were equally effective for solving their problem. Nevertheless, for another problem, the results obtained by Whitley et al. [207] show that taking advantage of the Baldwin effect can find the global optimum, while a Lamarckian strategy, although faster, usually converges to a local optimum. On the other hand, results obtained by Ku and Mak [208] with a GA designed to evolve recurrent neural networks show that the use of a Lamarckian strategy improves the algorithm, while the Baldwin effect does not. In Houck et al. [209] several algorithms are studied and similar conclusions drawn, as in [210], where a comparison between the Darwinian, Baldwinian and Lamarckian mechanisms is made.

Lamarckian strategies have the advantage that they can accelerate and tune the search, moving the sub-optimal solution (but one close to the optimum) found by the EA closer to the global optimum; nevertheless, they present some disadvantages that limit the use of Lamarckian operators. These problems include the reduction of diversity, and the fact that their use can stop evolution, since characteristics learned at the beginning of the simulation can cause certain individuals to come to dominate the population through these acquired advantages and remain the best individuals until the end of the simulation [211].
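In implementation terms, the difference between the two strategies reduces to whether the result of learning is written back into the genotype. A minimal sketch (assuming an `individual` object with `genome` and `fitness` attributes, and placeholder `local_search` and `raw_fitness` functions, none of which come from the cited papers):

```python
def evaluate(individual, local_search, raw_fitness, lamarckian=False):
    """Baldwinian strategy: fitness is measured after learning, but the
    genotype is left untouched (learning only guides selection).
    Lamarckian strategy: the learned solution is also written back into
    the genotype, accelerating the search at the cost of diversity."""
    improved = local_search(individual.genome)
    individual.fitness = raw_fitness(improved)
    if lamarckian:
        individual.genome = improved     # acquired traits are inherited
    return individual
```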

5 Variable Length Individuals: Introns

EAs that use variable-length individuals (in principle they do not have a fixed number of genes), where each gene codifies a part of the solution and the codification does not depend on the position of the genes (only on their value), are likely to generate individuals with genetic code segments not used to codify characteristics, also called introns [212].

This kind of EA (with variable-length individuals) is used in problems where solutions have several indistinguishable parts, such as ANNs [213, 44-46] and fuzzy controllers [214]. So far, introns have been used in three different ways in the evolutionary computation literature:

- As non-coding bits uniformly added to the genetic representation of the problem parameters [215]. In this case, introns pad the space between the actual parameter representations.
- As non-functional parts of the genetic representation of a solution, that is, parts of the solution that actually do not do anything, and thus do not contribute to the fitness of a chromosome. This usually occurs in genetic programming [216] and in chromosomes which undergo a development cycle after birth [217].
- As a posteriori useless parts of the chromosome, that is, parts of the genetic representation that do not contribute at all to the fitness of the solution. This usually shows up in some kinds of neural networks, like Learning Vector Quantization (LVQ) neural nets [218]. In these so-called competitive learning neural nets [219], only some of the neurons, called winners, are updated during training; other neurons have never been winners, and thus have not contributed at all to the network's success or failure. These neurons are useless, but only a posteriori. This kind of intron also shows up in other kinds of neural nets, as in Harvey's SAGA framework [213], where they are called potentially useful junk.

Wu and coworkers, in [215], state (but do not prove, since it is not the main point of their paper) the hypothesis that non-coding segments maintain variation in the individual, that is, that keeping introns in the population keeps genetic variability high. On the other hand, Harvey and coworkers, in [213], delve further into this, showing that introns provide a neutral pathway through the fitness landscape that, despite apparent genetic convergence, helps the population evolve further. In this section, the relation between intron dynamics and diversity is studied.

To control the percentage of introns in the population individuals, two techniques are used:

- Using operators that alter the individual length and add or remove introns. Experiments following this line show that a slight selective pressure on introns improves the search, whereas eliminating them gives priority to the exploration of the search space, losing good solutions [212].
- Using selection: depending on the number of introns, the value of the individual evaluation function is made greater or smaller [213].

According to the results presented in [212], in the case of variable-length genetic algorithms with "junk genes" used to optimize neural networks:

- High diversity at the end of the GA seems to correspond to low online fitness. In general, creating diversity is not an efficient way of exploring the search space, although it is important that the initial population is diverse enough.

{ elimination of introns does not seem to help performance. We should let

nature be our guide [220] and let the selection procedure get the best and

eliminate the worst. Even using any kind of var-len selection operator does not seem to improve performance in any way, except if large initial population (high initial diversity) and the add operator with a signi cant value are combined. This combination allows to extract the most of the selection and crossover operators, and tips the balance from exploration to exploitation. In a word, keeping exploitation high with high diversity and crossover is much better, for this kind of algorithms, than increasing explorations using intron-elimination operators. The diversity-generation operator that seems to work, at least in this case, are kill operators with a low rate and geneduplication-and-mutation with a high rate, if diversity is also high.
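As an illustration of the first control technique above, the following minimal sketch (under assumed conventions: a genome is a Python list and introns are marked with a None payload; nothing here is taken from the actual implementations in [212, 213]) shows length-altering operators that add or remove non-coding genes, plus the intron ratio that a selection-based scheme could feed into the evaluation function.

    import random

    INTRON = None  # marker for a non-coding gene

    def add_intron(genome, rate=0.1):
        # With probability `rate`, insert a non-coding gene at a random position.
        if random.random() < rate:
            genome.insert(random.randrange(len(genome) + 1), INTRON)
        return genome

    def remove_intron(genome, rate=0.1):
        # With probability `rate`, delete one randomly chosen intron, if any exist.
        positions = [i for i, gene in enumerate(genome) if gene is INTRON]
        if positions and random.random() < rate:
            del genome[random.choice(positions)]
        return genome

    def intron_ratio(genome):
        # Fraction of non-coding genes; a selection-based scheme can use this
        # value to bias the evaluation function, as in [213].
        return sum(1 for gene in genome if gene is INTRON) / len(genome)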

6 Neurogenetic software

In this section we will look at neurogenetic software from the point of view of a researcher who wants to implement a new neurogenetic hybrid algorithm. So far, in the vast majority of cases, when researchers want to test a new hybrid algorithm, they have to create their own neural net and evolutionary algorithm library, which means that coding basic classes and methods usually takes much more time than implementing the novel algorithm itself. That is why, in this section, we will examine current neurogenetic software solutions from this point of view, looking at their flexibility, expandability, availability and technical support, and will try to decide which software solution is best suited for neurogenetic evolution research.

We will not examine commercial software that implements genetic neural nets for a particular solution, such as the NeuroGenetic Optimizer (which has been discontinued, but was available from BioComp Systems, Inc.) or GENETICA [221], since they do not offer the possibility of implementing new neurogenetic paradigms. However, they have their value as showcases of neurogenetic technology and its application to real-world, mainly financial, problems. Other products, such as Statistica: Neural Networks [222] and Trajan [223], use genetic algorithms for input selection, but that is the only evolution capability they seem to offer, and thus they are not useful for research purposes.

This leaves free tools such as INANNA [224], SNNS/ENZO [225] and EO+G-Prop [226, 71]. Their common features are that they are libraries written in C++, which can be considered a mainstream language, and that they are available under licenses that allow free use by researchers. Other than that, they are very different.

INANNA [224] is a relatively new introduction to the area, although it is a development of the former Annalee neural net/genetic algorithms library. It was originally created to test many different neural net evolution and incremental algorithms, and this shows: it is very flexible when it comes to introducing new neural net architectures, and thus also very expandable. It makes heavy use of C++ capabilities but, at the same time, it does not rely on the Standard Template Library (STL) for base classes, using instead MagiCLib; the author includes a rationale for this choice in the package. Besides this, INANNA includes two parts: Nehep for evolution and Annalee for neural nets. For some reason, it needs to include SNNS (Stuttgart Neural Network Simulator) classes. Documentation is next to none, and there does not seem to be any technical support, since there is only one developer and no technical support fora, outside the SourceForge ones, are mentioned. The opinion of the authors is that this product still has some way to go before becoming a usable tool for researchers; this obviously shows in the version number currently available, which is 0.2alpha.

ENZO [225], from the outside, looks like a very powerful optimization tool for neural nets; indeed, it has been included in the latest SNNS release (4.2). SNNS is a very mature neural network simulation package, probably the most popular of the bunch, and it has extensive technical support in the form of mailing lists, neural net newsgroups, and a very complete technical manual. However, ENZO 1.0, which initially used SNNS as its neural net engine and has finally been integrated into it, seems to be able to evolve only one kind of architecture, the multilayer perceptron. It provides a "sample module" for implementing new extensions, but no other documentation is available; it does not seem to have been designed with extensibility in mind. In general, optimizing neural networks using a well-known and tested neural net library seems like a very good idea, but ENZO is too narrowly focused to be of general application. However, good lessons could be learned from it.

EO [226] and G-Prop [71] are two separate tools that can be used together to evolve multilayer perceptrons. EO is a general evolutionary algorithms library in C++, which has been designed with extensibility in mind; any class of objects that can be assigned a fitness should be evolvable within the EO framework. G-Prop takes advantage of these capabilities by providing several classes for programming multilayer perceptrons, which can be used independently from EO, and then adding variation operators (mutation and crossover) and evaluation operators which make multilayer perceptrons evolvable; these classes are then put inside EO to create an evolutionary algorithm for evolving multilayer perceptrons. EO, by itself, is not a neurogenetic toolbox, but it is easily extensible, and any neural net that is reasonably programmed as a C++ class can be evolved using EO; however, EO does not provide base classes for neural net programming (if such a thing is possible at all). There is fair technical support in the shape of several mailing lists for advanced and basic users, a good tutorial with program templates is included in the basic release, and the library is fairly mature; the current version (April 2001) is 0.9.1, with 0.9.2 and 1.0 not far away.

From the point of view of the authors, an ideal combination would be to use the EO evolutionary base classes together with INANNA/Annalee neural net extensibility and SNNS code maturity; however, our advice would be to use EO together with INANNA, since both tools seem to complement each other.
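The EO design principle quoted above (anything that can be assigned a fitness should be evolvable) can be made concrete with a short sketch. The following is not the actual EO C++ API but an illustrative analogue in Python: a fixed-topology MLP is wrapped in an object exposing variation operators and a fitness, which is all a generic evolutionary loop needs to see. The fitness used here is a placeholder; a G-Prop-style evaluator would briefly train the network and score accuracy and size.

    import random

    class EvolvableMLP:
        # Illustrative wrapper: a fixed-topology MLP seen purely as an
        # evolvable object (genome + variation + fitness), in the spirit of EO.

        def __init__(self, n_weights):
            self.weights = [random.uniform(-1.0, 1.0) for _ in range(n_weights)]

        def mutate(self, sigma=0.05):
            self.weights = [w + random.gauss(0.0, sigma) for w in self.weights]

        def crossover(self, other):
            child = EvolvableMLP(len(self.weights))
            cut = random.randrange(len(self.weights))
            child.weights = self.weights[:cut] + other.weights[cut:]
            return child

        def fitness(self, dataset):
            # Placeholder evaluator: a real one would train briefly (as G-Prop
            # does with quickprop) and return accuracy minus a size penalty.
            return -sum(w * w for w in self.weights)

    def evolve(population, dataset, generations=50):
        for _ in range(generations):
            population.sort(key=lambda net: net.fitness(dataset), reverse=True)
            parents = population[: len(population) // 2]
            offspring = [random.choice(parents).crossover(random.choice(parents))
                         for _ in range(len(population) - len(parents))]
            for child in offspring:
                child.mutate()
            population = parents + offspring
        return population[0]  # best individual from the last selection step

The evolutionary loop never looks inside the network; it only calls mutate, crossover and fitness, which is what makes the same loop reusable for any other evolvable object.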

7 Hybrid methods. Applications

In this section we will look at some applications that use evolutionary ANN methods to solve a wide range of problems. There are problems that, due to their characteristics, either can only be solved using hybrid methods, or for which the methods that would otherwise be used are inefficient or unable to obtain good enough results.

Thus, Moriarty and Miikkulainen [227, 228] proposed an EA to evolve recurrent ANNs to discover complex Othello strategies. The proposed EA used a marker-based encoding to codify networks in the individuals. Two genetic operators were used: mutation at the integer level rather than the bit level, and two-point crossover. The networks were given the current board configuration as input and were not required to decide which moves were legal, but only to differentiate between legal moves. The networks were initially evolved against a random move maker and later against a search program.

Inspired by a similar application, another group of researchers developed a neural net to avoid plane collisions [229]. The general problem is that a plane should automatically avoid collisions (a conflict being defined as two planes coming closer than 4 nautical miles). The neural net took as input information processed from the onboard radar, and had to produce as output the heading (that is, the angle with respect to the previous trajectory) the plane must take. Classical backpropagation could not be used in this case, since the situation was one in which two planes, governed by the same neural net, met each other in flight; many different situations, not known in advance, were possible, and a fixed training set could not be built. A GA was thus used to evolve the neural net weights, and the fitness took into account deviations from trajectory, delays, the existence of conflicts, and the overall behavior in all possible situations.

In general, situations in which the training set for the neural net is not known in advance but is created by the solutions to the problem themselves, or in which the neural nets that solve a problem must be coevolved, can only be solved by combining neural nets and genetic algorithms, as is the case in the two applications presented above.
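The key point in the collision-avoidance application is that the fitness of a weight vector can only be obtained by simulation, because the encounters are generated by the behavior of the net itself. The fragment below is a deliberately toy sketch of this kind of evaluation; the two-weight "controller", the flight model and the penalty coefficients are all invented for illustration, and this is not the evaluator of [229].

    import math

    def heading_from_net(weights, bearing):
        # Stand-in for the evolved net: maps the bearing of the other plane
        # to a heading change (a real controller would use radar features).
        return math.tanh(weights[0] * bearing + weights[1])

    def simulate(weights, start_positions, steps=100):
        # Two planes governed by the SAME controller; count how often they
        # come closer than 4 nautical miles, and how far they leave the route.
        (x1, y1), (x2, y2) = start_positions
        conflicts, deviation = 0, 0.0
        for _ in range(steps):
            bearing = math.atan2(y2 - y1, x2 - x1)
            turn = heading_from_net(weights, bearing)
            deviation += abs(turn)
            x1 += math.cos(bearing + turn)
            y1 += math.sin(bearing + turn)
            x2 -= math.cos(bearing - turn)
            y2 -= math.sin(bearing - turn)
            if math.hypot(x2 - x1, y2 - y1) < 4.0:
                conflicts += 1
        return conflicts, deviation

    def fitness(weights, scenarios):
        # Composite fitness: conflicts weigh much more than route deviation.
        score = 0.0
        for scenario in scenarios:
            conflicts, deviation = simulate(weights, scenario)
            score -= 10.0 * conflicts + 0.1 * deviation
        return score

An EA would then evolve the weight vector against a set of encounter scenarios, for example fitness([0.5, -0.1], [((-50.0, 0.0), (50.0, 0.0))]).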

8 Conclusions

This work presents an exhaustive review of the different approaches to designing ANNs using EAs, paying special attention to the specific genetic operators used in these methods. In the evolution of ANNs, three main approaches can be found:

- Connection weight evolution: The simplicity and generality of the evolutionary approach, together with the fact that training algorithms based on gradient descent can get trapped in local optima, make the use of EAs to train connection weights a reasonable approach.

- Network architecture evolution: As said above (see section 2.4), this approach has several advantages over other heuristics for network architecture design. In spite of this, evolving the architecture along with the connection weights usually produces better results.
- Learning rule evolution: In this case, evolution is used to make the ANN adapt its learning rule. This adaptation can be carried out by evolving the network learning parameters (learning rate or momentum), or the learning rules themselves, that is, the weight update rules of the training algorithm.

The main advantage of designing ANNs with EAs lies in the ability of the EA to optimize the network parameters (initial architecture, connectivity, weights, learning rule), and in its inherent parallelism (different networks can be trained simultaneously on different processors). Traditional algorithms focus on optimizing just one aspect (for example, the classification ability) while others, such as the network size, are left unoptimized. Using two criteria, the classification / approximation ability and the network size (total number of weights) can be optimized at once.

The study of the Baldwin effect in methods that evolve ANNs can improve these methods because, according to the results presented by several authors, taking advantage of the Baldwin effect can make the method converge to the global optimum, whereas strategies based on Lamarckian genetic operators, in spite of their higher speed, can converge to a local optimum.

A list of the main libraries to evolve artificial neural networks, and some applications that use hybrid methods to solve problems whose solution would not be possible otherwise, have been presented.

Nowadays, most of the methods reviewed in this work implement only two or three of the approaches. They are only partial attempts at ANN optimization, so obtaining the global optimum is not guaranteed. In the short term, it would be interesting to use several approaches at the same time, mainly in those cases in which there is little a priori knowledge about the problem, since in these circumstances the use of trial and error or heuristic methods is not effective. Thus, a method that optimizes different, or most, ANN parameters (network size, initial weights, learning parameters, network input vector and learning/validation/test sets) for a prefixed architecture would be very useful. Once a suitable architecture and related training algorithm have been selected, the method would search for the best ANN to solve the problem at hand. In this sense, the G-Prop method (see [71-73]) implements network size, initial weight and learning parameter optimization in the search for quickprop-trained MLPs.

In the long term, it would be of interest to develop a method that searches different ANN parameters as well as the architecture and learning rule; that is, depending on the problem at hand, the method would decide the best architecture (RBF, MLP, recurrent, etc.), a suitable training algorithm or new learning rule (BP, QP, OLS, SVD, etc.), and search for the best ANN to solve the problem.

From the software point of view, it would be very useful to develop a framework that includes all the approaches studied in this paper, combining EAs with generic and specific ANN paradigms, along with generic and specific variation operators for neural net evolution. Moreover, it would be very useful to combine the training process with visualization of the evolutionary process, as proposed by Romero et al. [230, 231]. If this idea is extended, visualization could lead to a better understanding of how the ANN operates, and could speed up the evolutionary search and the method itself, making fast evaluation of ANNs possible.
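As a simple illustration of the two-criteria optimization mentioned above, a scalarized fitness can combine classification ability and network size in a single value. This is a minimal sketch; the penalty weight is arbitrary and not a value used by G-Prop or any other method cited here.

    def two_criteria_fitness(accuracy, n_weights, size_penalty=1e-3):
        # Optimize classification ability and network size at once:
        # higher accuracy is better, and fewer weights break ties.
        return accuracy - size_penalty * n_weights

Lexicographic comparison (accuracy first, size as tie-breaker) is an alternative when a single weighting is hard to justify.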

Acknowledgements

This work has been supported in part by projects CICYT TIC-1999-0550, INTAS-97-30950 and IST-1999-12679.

References

1. X. Yao. Evolving artificial neural networks. Proceedings of the IEEE, 87(9):1423-1447, 1999.
2. F.J. Marín. Optimización de redes neuronales artificiales mediante algoritmos genéticos. Aplicación a la predicción de carga. Tesis Doctoral, Universidad de Málaga, Escuela Técnica Superior de Ingeniería Informática, Málaga, España, 1997.
3. K.J. Lang, A.H. Waibel, and G.E. Hinton. A time-delay neural network architecture for isolated word recognition. Neural Networks, vol. 3, pp. 33-43, 1990.
4. S.S. Fels and G.E. Hinton. Glove-Talk: a neural network interface between a dataglove and a speech synthesizer. IEEE Trans. on Neural Networks, vol. 4, pp. 2-8, 1993.
5. S. Knerr, L. Personnaz, and G. Dreyfus. Handwritten digit recognition by neural networks with single-layer training. IEEE Trans. on Neural Networks, vol. 3, pp. 962-968, 1992.
6. L. Prechelt. PROBEN1 - A set of benchmarks and benchmarking rules for neural network training algorithms. Technical Report 21/94, Fakultät für Informatik, Universität Karlsruhe, D-76128 Karlsruhe, Germany, September 1994.
7. M.A. Gronroos. Evolutionary Design of Neural Networks. Master of Science Thesis in Computer Science, Department of Mathematical Sciences, University of Turku, 1998.
8. M.A. Gronroos. Comparison of Some Methods for Evolving Neural Networks. In Genetic and Evolutionary Computation Conference, Morgan Kaufmann Publishers, ISBN 1-55860-611-4, Volume II, p. 1442, Orlando, USA, 1999.
9. R.S. Sutton. Two problems with backpropagation and other steepest-descent learning procedures for networks. In Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 823-831, Erlbaum, Hillsdale, NJ, 1986.
10. D. Whitley, T. Starkweather, and C. Bogart. Genetic algorithms and neural networks: optimizing connections and connectivity. Parallel Computing, vol. 14, no. 3, pp. 347-361, 1990.
11. S.I. Gallant. Neural network learning and expert systems. Cambridge, MA: The MIT Press, 1993.

12. S.I. Gallant. Perceptron-based learning algorithms. IEEE Transactions on Neural Networks 1, pp. 179-191, 1990.
13. M. Frean. The upstart algorithm: A method for constructing and training feedforward neural networks. Neural Computation 2, pp. 198-209, 1990.
14. T.C. Lee, A.M. Peterson, and J.C. Tsai. A multilayer feed-forward neural network with dynamically adjustable structures. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Los Angeles, pp. 367-369, Long Beach, CA: IEEE Press, 1990.
15. S.E. Fahlman and C. Lebiere. The Cascade-Correlation Learning Architecture. Neural Information Systems 2, Touretzky, D.S. (ed), Morgan Kaufmann, pp. 524-532, 1990.
16. M. Mezard and J.P. Nadal. Learning in feedforward layered networks: The tiling algorithm. Journal of Physics A 22:2191-2203, 1989.
17. Y. Le Cun, J.S. Denker, and S.A. Solla. Optimal brain damage. Neural Information Systems 2, Touretzky, D.S. (ed), Morgan Kaufmann, pp. 598-605, 1990.
18. B. Hassibi, D.G. Stork, G. Wolff, and T. Watanabe. Optimal Brain Surgeon: extensions and performance comparisons. In NIPS 6, pp. 263-270, 1994.
19. W.L. Buntine and A.S. Weigend. Calculating second derivatives on feed-forward networks: a review. IEEE Transactions on Neural Networks 5, pp. 480-488, 1994.
20. K. Balakrishnan and V. Honavar. Evolutionary design of neural architectures - a preliminary taxonomy and guide to literature. Technical report CS-TR 95-01, AI Research Group, January 1995.
21. X. Yao. Evolution of evolutionary artificial neural networks. In Preprints of the Int'l Symp. on AI, Reasoning and Creativity (T. Dartnall, ed.), (Queensland, Australia), pp. 49-52, Griffith University, 1991.
22. X. Yao. A review of evolutionary artificial neural networks. International Journal of Intelligent Systems, vol. 8, no. 4, pp. 539-567, 1993.
23. X. Yao. Evolutionary artificial neural networks. International Journal of Intelligent Systems, vol. 4, no. 3, pp. 203-222, 1993.
24. X. Yao. The evolution of connectionist networks. In Artificial Intelligence and Creativity (T. Dartnall, ed.), pp. 233-243, Dordrecht: Kluwer Academic Publishers, 1994.
25. X. Yao. Evolutionary artificial neural networks. In Encyclopedia of Computer Science and Technology (A. Kent and J.G. Williams, eds.), vol. 33, pp. 137-170, New York, NY 10016: Marcel Dekker Inc., 1995.
26. F.J. Marín and F. Sandoval. Diseño de redes neuronales artificiales mediante algoritmos genéticos. Computación Neuronal, Universidad de Santiago de Compostela, pp. 385-424, 1995.
27. R.D. Reed. Pruning algorithms - a survey. IEEE Transactions on Neural Networks, 4(5):740-744, 1993.
28. R.D. Reed and R.J. Marks. Neural Smithing. Bradford, The MIT Press, Cambridge, Massachusetts, London, England, 1999.
29. R.K. Belew. Evolution, learning and culture: Computational metaphors for the adaptive algorithms. Complex Systems, vol. 4, pp. 11-49, 1990.
30. J.M. Renders and S.P. Flasse. Hybrid methods using genetic algorithms for global optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 26, no. 2, pp. 243-258, 1996.
31. J.M. Baldwin. A new factor in evolution. American Naturalist 30, pp. 441-451, 1896.
32. G.E. Hinton and S.J. Nowlan. How learning can guide evolution. Complex Systems, 1, pp. 495-502, 1987.

33. R.K. Belew. When both individuals and populations search: Adding simple learning to the genetic algorithm. In 3rd Intern. Conf. on Genetic Algorithms, D. Schaffer, ed., Morgan Kaufmann, 1989.
34. I. Harvey. The puzzle of the persistent question marks: a case study of genetic drift. In 5th International Conference on Genetic Algorithms, pp. 15-22, S. Forrest, ed., Morgan Kaufmann, 1993.
35. J.H. Holland. Adaptation in natural and artificial systems. University of Michigan Press (Second Edition: MIT Press, 1992), 1975.
36. D.E. Goldberg. Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, 1989.
37. L.D. Whitley. The GENITOR algorithm and selection pressure: Why rank-based allocation of reproductive trials is best. In J.D. Schaffer, ed., Proceedings of the Third International Conference on Genetic Algorithms, Morgan Kaufmann, 1989.
38. T.P. Caudell and C.P. Dolan. Parametric connectivity: training of constrained networks using genetic algorithms. In Proc. of the Third Int'l Conf. on Genetic Algorithms and Their Applications (J.D. Schaffer, ed.), pp. 370-374, Morgan Kaufmann, San Mateo, CA, 1989.
39. M. Srinivas and L.M. Patnaik. Learning neural network weights using genetic algorithms - improving performance by search-space reduction. In Proc. of 1991 IEEE International Joint Conference on Neural Networks (IJCNN'91 Singapore), vol. 3, pp. 2331-2336, IEEE Press, New York, 1991.
40. H. de Garis. GenNets: genetically programmed neural nets - using the genetic algorithm to train neural nets whose inputs and/or outputs vary in time. In Proc. of 1991 IEEE International Joint Conference on Neural Networks (IJCNN'91 Singapore), vol. 2, pp. 1391-1396, IEEE Press, New York, 1991.
41. A.P. Wieland. Evolving neural network controllers for unstable systems. In Proc. of 1991 IEEE International Joint Conference on Neural Networks (IJCNN'91 Seattle), vol. 2, pp. 667-673, IEEE Press, New York, 1991.
42. D.J. Janson and J.F. Frenzel. Application of genetic algorithms to the training of higher order neural networks. Journal of Systems Engineering, vol. 2, pp. 272-276, 1992.
43. D.J. Janson and J.F. Frenzel. Training product unit neural networks with genetic algorithms. IEEE Expert, vol. 8, no. 5, pp. 26-33, 1993.
44. J.J. Merelo, M. Paton, A. Canas, A. Prieto, and F. Moran. Optimization of a competitive learning neural network by genetic algorithms. Lecture Notes in Computer Science, Vol. 686, pp. 185-192, Springer-Verlag, 1993.
45. J.J. Merelo and A. Prieto. G-LVQ, a combination of genetic algorithms and LVQ. In Artificial Neural Nets and Genetic Algorithms, D.W. Pearson, N.C. Steele and R.F. Albrecht, Eds., pp. 92-95, Springer-Verlag, ISBN 3-211-82692-0, 1995.
46. J.J. Merelo, A. Prieto, F. Moran, R. Marabini, and J.M. Carazo. Automatic classification of biological particles from electron-microscopy images using conventional and genetic-algorithm optimized learning vector quantization. Neural Processing Letters, vol. 8, pp. 55-65, 1998.
47. J.L. Bernier, J. Ortega, I. Rojas, and A. Prieto. Improving the tolerance of multilayer perceptrons by minimizing the statistical sensitivity to weight deviations. Neurocomputing, Vol. 31, No. 1-4, pp. 87-103, 2000.
48. J.L. Bernier, J. Ortega, E. Ros, I. Rojas, and A. Prieto. A new measurement of noise immunity and generalization ability for MLPs. International Journal of Neural Systems, Vol. 9, No. 6, pp. 511-522, 1999.

49. J.L. Bernier, J. Ortega, E. Ros, I. Rojas, and A. Prieto. Obtaining Fault Tolerant Multilayer Perceptrons Using an Explicit Regularization. Neural Processing Letters, Vol. 12, No. 2, pp. 107-113, 2000.
50. J.L. Bernier, J. Ortega, E. Ros, I. Rojas, and A. Prieto. A Quantitative Study of Fault Tolerance, Noise Immunity and Generalization Ability of MLPs. Neural Computation, Vol. 12, pp. 2941-2964, 2000.
51. D.J. Montana and L. Davis. Training feedforward neural networks using genetic algorithms. Proc. 11th Internat. Joint Conf. on Artificial Intelligence, pp. 762-767, 1989.
52. D.B. Fogel, L.J. Fogel, and V.W. Porto. Evolving neural networks. Biological Cybernetics, vol. 63, pp. 487-493, 1990.
53. P. Bartlett and T. Downs. Training a neural network with a genetic algorithm. Technical Report, Dept. of Elec. Eng., Univ. of Queensland, 1990.
54. V.W. Porto, D.B. Fogel, and L.J. Fogel. Alternative neural network training methods. IEEE Expert, vol. 10, no. 3, pp. 16-22, 1995.
55. D.B. Fogel, E.C. Wasson, and V.W. Porto. A step toward computer-assisted mammography using evolutionary programming and neural networks. Cancer Letters, vol. 119, no. 1, p. 93, 1997.
56. D.B. Fogel, E.C. Wasson, and E.M. Boughton. Evolving neural networks for detecting breast cancer. Cancer Letters, vol. 96, no. 1, pp. 49-53, 1995.
57. N. Saravanan and D.B. Fogel. Evolving neural control systems. IEEE Expert, vol. 10, no. 3, pp. 23-27, 1995.
58. K.S. Tang, C.Y. Chan, K.F. Man, and S. Kwong. Genetic structure for NN topology and weights optimization. In Proceedings of the 1st IEE/IEEE International Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications (GALESIA'95), (Stevenage, England), pp. 250-255, IEE Conference Publication 414, 1995.
59. G.W. Greenwood. Training partially recurrent neural networks using evolutionary strategies. IEEE Transactions on Speech and Audio Processing, vol. 5, no. 2, pp. 192-194, 1997.
60. A.P. Topchy and O.A. Lebedko. Neural network training by means of cooperative evolutionary search. Nuclear Instruments and Methods in Physics Research, Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, vol. 389, no. 1-2, pp. 240-241, 1997.
61. M. Sarkar and B. Yegnanarayana. Evolutionary programming-based probabilistic neural networks construction technique. In Proceedings of the 1997 IEEE International Conference on Neural Networks, Part 1 (of 4), (Piscataway, NJ, USA), pp. 456-461, IEEE Press, 1997.
62. X. Yao and Y. Liu. Fast evolutionary programming. In Evolutionary Programming V: Proc. of the Fifth Annual Conference on Evolutionary Programming (L.J. Fogel, P.J. Angeline, and T. Bäck, eds.), (Cambridge, MA), pp. 451-460, The MIT Press, 1996.
63. X. Yao, G. Lin, and Y. Liu. An analysis of evolutionary algorithms based on neighbourhood and step sizes. In Evolutionary Programming VI: Proc. of the Sixth Annual Conference on Evolutionary Programming (P.J. Angeline, R.G. Reynolds, J.R. McDonnell and R. Eberhart, eds.), vol. 1213 of Lecture Notes in Computer Science, (Berlin), pp. 297-307, Springer-Verlag, 1997.
64. Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, 1992.
65. Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs, Third, Extended Edition. Springer-Verlag, 1996.

66. P.A. Castillo, J. Gonzalez, J.J. Merelo, V. Rivas, G. Romero, and A. Prieto. SA-Prop: Optimization of Multilayer Perceptron Parameters using Simulated Annealing. Lecture Notes in Computer Science, ISBN 3-540-66069-0, Vol. 1606, pp. 661-670, Springer-Verlag, 1999.
67. V.M. Rivas, P.A. Castillo, and J.J. Merelo. Evolving RBF Neural Networks. Accepted in IWANN'2001, 2001.
68. P.A. Castillo, J. Gonzalez, J.J. Merelo, V. Rivas, G. Romero, and A. Prieto. G-Prop-II: Global Optimization of Multilayer Perceptrons using GAs. In Congress on Evolutionary Computation, ISBN 0-7803-5536-9, Volume III, pp. 2022-2027, Washington D.C., USA, 1999.
69. P.A. Castillo, J. Gonzalez, J.J. Merelo, V. Rivas, G. Romero, and A. Prieto. G-Prop-III: Global Optimization of Multilayer Perceptrons using an Evolutionary Algorithm. In Genetic and Evolutionary Computation Conference, ISBN 1-55860-611-4, Volume I, p. 942, Orlando, USA, 1999.
70. P.A. Castillo, M.G. Arenas, J.G. Castellano, J. Carpio, J.J. Merelo, A. Prieto, V. Rivas, and G. Romero. Function Approximation with Evolved Multilayer Perceptrons. In Proc. of Int'l Workshop on Evolutionary Computation (IWEC'2000), pp. 209-224, State Key Laboratory of Software Engineering, Wuhan University, 2000.
71. P.A. Castillo, J. Carpio, J.J. Merelo, V. Rivas, G. Romero, and A. Prieto. Evolving Multilayer Perceptrons. Neural Processing Letters, vol. 12, no. 2, pp. 115-127, October 2000.
72. P.A. Castillo, J.J. Merelo, V. Rivas, G. Romero, and A. Prieto. G-Prop: Global Optimization of Multilayer Perceptrons using GAs. Neurocomputing, Vol. 35/1-4, pp. 149-163, 2000.
73. P.A. Castillo, M.G. Arenas, J.G. Castellano, M. Cillero, J.J. Merelo, A. Prieto, V. Rivas, and G. Romero. Function Approximation with Evolved Multilayer Perceptrons. Advances in Neural Networks and Applications, Artificial Intelligence Series, Nikos E. Mastorakis, Editor, ISBN 960-8052-26-2, pp. 195-200, World Scientific and Engineering Society Press, 2001.
74. P.A. Castillo, J.G. Castellano, J.J. Merelo, and G. Romero. Lamarckian Evolution and Baldwin Effect in Artificial Neural Networks Evolution. Submitted to 5th International Conference on Artificial Evolution, Bourgogne, October 29-31, 2001.
75. P.A. Castillo, J.M. de la Torre, J.J. Merelo, and I. Roman. Forecasting business failure. A comparison of neural networks and logistic regression for the Spanish companies. Accepted in 24th Annual Congress, European Accounting Association, Athens, 18-20 April, 2001.
76. J.L. Valderrabano, E. Madinaveitia, P.A. Castillo, and J.J. Merelo. Notoriedad y presión publicitaria: ¿Pueden los nuevos métodos matemáticos ayudarnos a mejorar la eficacia de la publicidad? Ponencias del 96 Seminario, 17 Seminario de Televisión, pp. 149-159, AEDEMO, Jerez de la Frontera, 7 al 9 de Febrero, 2001.
77. D. Thierens, J. Suykens, J. Vandewalle, and B. De Moor. Genetic weight optimization of a feedforward neural network controller. In Proceedings of the Conference on Artificial Neural Nets and Genetic Algorithms, pp. 658-663, Springer-Verlag, 1993.
78. N.J. Radcliffe. Genetic set recombination and its application to neural network topology optimization. Tech. Report EPCC-TR-91-21, University of Edinburgh, Edinburgh, Scotland, 1991.

79. P.J.B. Hancock. Coding strategies for genetic algorithms and neural nets. PhD thesis, Department of Computing Science and Mathematics, University of Stirling, 1992.
80. F.J. Marín and F. Sandoval. Genetic synthesis of discrete-time recurrent neural network. Lecture Notes in Computer Science, Vol. 686, pp. 179-184, Springer-Verlag, 1993.
81. E. Alba, J.F. Aldana, and J.M. Troya. Fully Automatic ANN Design: A Genetic Approach. Lecture Notes in Computer Science, Vol. 686, pp. 399-404, Springer-Verlag, 1993.
82. W. Schiffmann, M. Joost, and R. Werner. Synthesis and performance analysis of multilayer neural network architectures. Tech. Rep. 16/1992, University of Koblenz, Institute for Physics, Rheinau 3-4, D-5400 Koblenz, 1992.
83. G.F. Miller, P.M. Todd, and S.U. Hegde. Designing neural networks using genetic algorithms. In J.D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pp. 379-384, San Mateo, 1989.
84. J.D. Schaffer, R.A. Caruana, and L.J. Eshelman. Using genetic search to exploit the emergent behavior of neural networks. Physica D, vol. 42, pp. 244-248, 1990.
85. S. Roberts and M. Turega. Evolving neural networks: an evaluation of encoding techniques. In Artificial Neural Nets and Genetic Algorithms, D.W. Pearson, N.C. Steele and R.F. Albrecht, Eds., pp. 96-99, Springer-Verlag, ISBN 3-211-82692-0, 1995.
86. J.R. Koza and J.P. Rice. Genetic generation of both the weights and architecture for a neural network. In Proc. of 1991 IEEE International Joint Conference on Neural Networks (IJCNN'91 Seattle), vol. 2, pp. 397-404, IEEE Press, New York, 1991.
87. L. Marti. Genetically generated neural networks I: representational effects. In Proc. of Int'l Joint Conf. on Neural Networks (IJCNN'92 Baltimore), Vol. IV, pp. 537-542, IEEE Press, New York, NY 10017-2394, 1992.
88. D. White and P. Ligomenides. GANNet: A Genetic Algorithm for Optimizing Topology and Weights in Neural Network Design. Lecture Notes in Computer Science, Vol. 686, pp. 322-327, Springer-Verlag, 1993.
89. S.A. Harp, T. Samad, and A. Guha. Designing application-specific neural networks using the genetic algorithm. In Advances in Neural Information Processing Systems 2 (D.S. Touretzky, ed.), pp. 447-454, Morgan Kaufmann, San Mateo, CA, 1990.
90. S.A. Harp, T. Samad, and A. Guha. Towards the genetic synthesis of neural networks. In Proc. of the Third Int'l Conf. on Genetic Algorithms and Their Applications (J.D. Schaffer, ed.), pp. 360-369, Morgan Kaufmann, San Mateo, CA, 1989.
91. J.R. Koza. Genetic programming: On the programming of computers by means of natural selection. MIT Press, 1992.
92. J. Zhang and H. Mühlenbein. Genetic programming of minimal neural nets using Occam's razor. In Proceedings of the 5th International Conference on Genetic Algorithms (ICGA'93) (S. Forrest, ed.), pp. 342-349, Morgan Kaufmann, 1993.
93. H. Kitano. Empirical studies on the speed of convergence of neural network training using genetic algorithms. In Proc. of the Eighth Nat'l Conf. on AI (AAAI-90), pp. 789-795, MIT Press, Cambridge, MA, 1990.
94. X. Yao and Y. Shi. A preliminary study on designing artificial neural networks using coevolution. In Proc. of the IEEE Singapore Int'l Conf. on Intelligent Control and Instrumentation, (Singapore), pp. 149-154, IEEE Singapore Section, 1995.

95. E. Vonk, L.C. Jain, and R. Johnson. Using genetic algorithms with grammar encoding to generate neural networks. In Proceedings of the 1995 IEEE International Conference on Neural Networks, Part 4 (of 6), (Piscataway, NJ, USA), pp. 1928-1931, IEEE Press, 1995.
96. F. Gruau. Genetic synthesis of boolean neural networks with a cell rewriting developmental process. In Proc. of the Int'l Workshop on Combinations of Genetic Algorithms and Neural Networks (COGANN'92) (D. Whitley and J.D. Schaffer, eds.), pp. 55-74, IEEE Computer Society Press, Los Alamitos, CA, 1992.
97. F.C. Gruau. Cellular encoding of genetic neural networks. Technical Report, LIP-IMAG, École Normale Supérieure de Lyon, 46 Allée d'Italie, 69007 Lyon, France, 1992.
98. F. Gruau and D. Whitley. Adding learning to the cellular development of neural networks: Evolution and the Baldwin effect. Evolutionary Computation, Volume I, No. 3, pp. 213-233, 1993.
99. E. Mjolsness, D.H. Sharp, and B.K. Alpert. Scaling, machine learning, and genetic neural nets. Advances in Applied Mathematics, vol. 10, pp. 137-163, 1989.
100. R. Poli. Some steps towards a form of parallel distributed genetic programming. In Proceedings of the First On-line Workshop on Soft Computing, pp. 290-295, 1996.
101. R. Poli. Discovery of symbolic, neuron-symbolic and neural networks with parallel distributed genetic programming. In 3rd International Conference on Artificial Neural Networks and Genetic Algorithms (ICANNGA'97), pp. 419-423, Norwich, 1997.
102. J.C. Figueira-Pujol and R. Poli. Evolving neural networks using a dual representation with a combined crossover operator. In Proceedings of the 1998 IEEE World Congress on Computational Intelligence, pp. 416-421, IEEE Press, Anchorage, Alaska, USA, 1998.
103. G. Thimm and E. Fiesler. Neural network initialization. Lecture Notes in Computer Science, Vol. 930, pp. 535-542, Springer-Verlag, 1995.
104. J.F. Kolen and J.B. Pollack. Back Propagation is Sensitive to Initial Conditions. Technical Report TR 90-JK-BPSIC, Laboratory for Artificial Intelligence Research, Computer and Information Science Department, 1990.
105. S.E. Fahlman. Faster-learning variations of back-propagation: An empirical study. In D.S. Touretzky, G. Hinton, and T. Sejnowski, editors, Proceedings of the 1988 Connectionist Models Summer School, pp. 38-51, Morgan Kaufmann, San Mateo, 1988.
106. L.Y. Bottou. Reconnaissance de la parole par réseaux multi-couches. In NeuroNimes'88, ISBN 2-906899-14-3, 1988.
107. E.J.W. Boers and H. Kuiper. Biological Metaphors and the Design of Modular Artificial Neural Networks. Master's thesis, Leiden University, Leiden, The Netherlands, 1992.
108. F.J. Smieja. Hyperplane spin dynamics, network plasticity and back-propagation learning. GMD report, GMD, St. Augustin, Germany, 1991.
109. L.F.A. Wessels and E. Barnard. Avoiding false local minima by proper initialization of connections. IEEE Transactions on Neural Networks, vol. 3, no. 6, pp. 899-905, 1992.
110. D. Nguyen and B. Widrow. Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), San Diego, vol. III, pp. 21-26, Edward Brothers, 1990.

111. Y. Lee, S.H. Oh, and M.W. Kim. An Analysis of Premature Saturation in Back-Propagation Learning. Neural Networks, vol. 6, pp. 719-728, 1993.
112. C.L. Chen and R.S. Nutter. Improving the training speed of three-layer feedforward neural nets by optimal estimation of the initial weights. In International Joint Conference on Neural Networks, vol. 3, pp. 2063-2068, IEEE, 1991.
113. T. Denoeux and R. Lengelle. Initializing back propagation networks with prototypes. Neural Networks, vol. 6, pp. 351-363, Pergamon Press Ltd., 1993.
114. M. Kim and C. Choi. A new weight initialization method for the MLP with the BP in multiclass classification problems. Neural Processing Letters 6:11-23, 1997.
115. D. Whitley. The GENITOR Algorithm and Selection Pressure: Why rank-based allocation of reproductive trials is best. In J.D. Schaffer (Ed.), Proceedings of the Third International Conference on Genetic Algorithms, Morgan Kaufmann Publishers, pp. 116-121, 1989.
116. R.K. Belew, J. McInerney, and N.N. Schraudolph. Evolving networks: using genetic algorithm with connectionist learning. Tech. Rep. CS90-174 (Revised), Computer Science and Engr. Dept. (C-014), Univ. of California at San Diego, La Jolla, CA 92093, USA, 1991.
117. A.P. Topchy, O.A. Lebedko, and V.V. Miagkikh. Fast learning in multilayered neural networks by means of hybrid evolutionary and gradient algorithms. To appear in Proc. of Int. Conf. on Evolutionary Computation and Its Applications, Moscow, 1996.
118. W. Kinnebrock. Accelerating the standard backpropagation method using a genetic approach. Neurocomputing, 6, pp. 583-588, 1994.
119. P. Osmera. Optimization of neural networks by genetic algorithms. Neural Network World, vol. 5, no. 6, pp. 965-976, 1995.
120. B. Yoon, D.H. Holmes, and G. Langholz. Efficient genetic algorithms for training layered feedforward neural networks. Information Sciences, vol. 76, no. 1-2, pp. 67-85, 1994.
121. M. Koeppen, M. Teunis, and B. Nickolay. Neural network that uses evolutionary learning. In Proceedings of the 1997 International Conference on Evolutionary Computation, ICEC'97, (Piscataway, NJ, USA), pp. 1023-1028, IEEE Press, 1997.
122. J.J. Merelo, M. Paton, A. Canas, A. Prieto, and F. Moran. Genetic optimization of a multilayer neural network for cluster classification tasks. Neural Network World, vol. 3, pp. 175-186, 1993.
123. I. de Falco, A. Iazzetta, P. Natale, and E. Tarantino. Evolutionary Neural Networks for Nonlinear Dynamics Modeling. Lecture Notes in Computer Science, Vol. 1498, pp. 593-602, Springer-Verlag, 1998.
124. D.L. Prados. Training multilayered neural networks by replacing the least fit hidden neurons. In Proc. of IEEE SOUTHEASTCON'92, vol. 2, pp. 634-637, IEEE Press, New York, NY, 1992.
125. D.L. Prados. New learning algorithm for training multilayered neural networks that uses genetic-algorithm techniques. Electronics Letters, vol. 28, pp. 1560-1561, 1992.
126. J.V. Hansen and R.D. Meservy. Learning experiments with genetic optimization of a generalized regression neural network. Decision Support Systems, vol. 18, no. 3-4, pp. 317-325, 1996.
127. R.S. Sexton, R.E. Dorsey, and J.D. Johnson. Toward global optimization of neural networks: A comparison of the genetic algorithm and backpropagation. Decision Support Systems, vol. 22, no. 2, pp. 171-185, 1998.

128. D.H. Wolpert and W.G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 67-82, 1997.
129. S.W. Lee. Off-line recognition of totally unconstrained handwritten numerals using multilayer cluster neural network. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 6, pp. 648-652, 1996.
130. S. Omatu and S. Deris. Stabilization of inverted pendulum by the genetic algorithm. In Proceedings of the 1996 IEEE Conference on Emerging Technologies and Factory Automation, ETFA'96, Part 1 (of 2), (Piscataway, NJ, USA), pp. 282-287, IEEE Press, 1996.
131. S. Omatu and M. Yoshioka. Self-tuning neuro-PID control and applications. In Proceedings of the 1997 IEEE Conference on Systems, Man and Cybernetics, Part 3 (of 5), (Piscataway, NJ, USA), pp. 1985-1989, IEEE Press, 1997.
132. I. Erkmen and A. Ozdogan. Short term load forecasting using genetically optimized neural network cascaded with a modified Kohonen clustering process. In Proceedings of the 1997 IEEE International Symposium on Intelligent Control, (Piscataway, NJ, USA), pp. 107-112, IEEE Press, 1997.
133. A. Skinner and J.Q. Broughton. Neural networks in computational materials science: Training algorithms. Modelling and Simulation in Materials Science and Engineering, 3:371-390, 1995.
134. M. Land. Evolutionary algorithms with local search for combinatorial optimization. PhD thesis, Computer Science and Engr. Dept., Univ. of California, San Diego, 1998.
135. H. Kitano. Designing neural networks using genetic algorithms with graph generation system. Complex Systems, vol. 4, no. 4, pp. 461-476, 1990.
136. Y. Liu and X. Yao. A population-based learning algorithm which learns both architectures and weights of neural networks. Chinese Journal of Advanced Software Research (Allerton Press, Inc., New York, NY 10011), vol. 3, no. 1, pp. 54-65, 1996.
137. X. Yao and Y. Liu. Evolutionary artificial neural networks that learn and generalise well. In 1996 IEEE International Conference on Neural Networks, Washington DC, USA, Volume on Plenary, Panel and Special Sessions, pp. 159-164, IEEE Press, New York, 1996.
138. X. Yao and Y. Liu. Ensemble structure of evolutionary artificial neural networks. In Proc. of the 1996 IEEE Int'l Conf. on Evolutionary Computation (ICEC'96), Nagoya, Japan, pp. 659-664, IEEE Press, New York, NY 10017-2394, 1996.
139. Y. Liu and X. Yao. Evolutionary design of artificial neural networks with different nodes. In Proc. of the 1996 IEEE Int'l Conf. on Evolutionary Computation (ICEC'96), Nagoya, Japan, pp. 670-675, IEEE Press, New York, NY 10017-2394, 1996.
140. X. Yao and Y. Liu. Evolving artificial neural networks through evolutionary programming. In Evolutionary Programming V: Proc. of the Fifth Annual Conference on Evolutionary Programming (L.J. Fogel, P.J. Angeline, and T. Bäck, eds.), (Cambridge, MA), pp. 257-266, The MIT Press, 1996.
141. X. Yao and Y. Liu. A new evolutionary system for evolving artificial neural networks. IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 694-713, 1997.
142. X. Yao and Y. Liu. EPNet for chaotic time-series prediction. In Selected Papers from the First Asia-Pacific Conference on Simulated Evolution and Learning (SEAL'96) (X. Yao, J.H. Kim and T. Furuhashi, eds.), vol. 1285 of Lecture Notes in Artificial Intelligence, (Berlin), pp. 146-156, Springer-Verlag, 1997.

143. X. Yao and Y. Liu. Towards Designing Artificial Neural Networks by Evolution. Applied Mathematics and Computation, 91(1):83-90, 1998.
144. X. Yao and Y. Liu. Making use of population information in evolutionary artificial neural networks. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 3, pp. 417-425, 1998.
145. A. Ribert, E. Stocker, Y. Lecourtier, and A. Ennaji. Optimizing a Neural Network Architecture with an Adaptive Parameter Genetic Algorithm. Lecture Notes in Computer Science, Vol. 1240, pp. 527-535, Springer-Verlag, 1994.
146. I. de Falco, A. Della Cioppa, A. Iazzetta, P. Natale, and E. Tarantino. Optimizing Neural Networks for Time Series Prediction. Third World Conference on Soft Computing (WSC3), June 1998.
147. T. Ragg, S. Gutjahr, and H.M. Sa. Automatic determination of optimal network topologies based on information theory and evolution. In IEEE, Proceedings of the 23rd Euromicro Conference, Track on Computational Intelligence, pp. 549-555, 1995.
148. G. Mani. Learning by gradient descent in function space. In Proc. of IEEE Int'l Conf. on Systems, Man, and Cybernetics, (Los Angeles, CA), pp. 242-247, 1990.
149. D.R. Lovell and A.C. Tsoi. The performance of the neocognitron with various S-cell and C-cell transfer functions. Intelligent Machines Lab., Dept. of Elec. Eng., Univ. of Queensland, 1992.
150. B. DasGupta and G. Schnitger. Efficient approximation with neural networks: a comparison of gate functions. Tech. rep., Dept. of Computer Sci., Pennsylvania State Univ., University Park, PA 16802, 1992.
151. D.G. Stork, S. Walker, M. Burns, and B. Jackson. Preadaptation in neural circuits. In Proc. of Int'l Joint Conf. on Neural Networks, Vol. I, (Washington, DC), pp. 202-205, Lawrence Erlbaum Associates, Hillsdale, NJ, 1990.
152. D.B. Fogel, L.J. Fogel, and J.W. Atmar. Meta-evolutionary programming. In R.R. Chen, editor, Proceedings of 25th Asilomar Conference on Signals, Systems and Computers, pp. 540-545, Pacific Grove, California, 1991.
153. S. Nolfi, J.L. Elman, and D. Parisi. Learning and evolution in neural networks. Tech. Rep. CRT-9019, Center for Research in Language, University of California, San Diego, La Jolla, CA 92093-0126, USA, 1990.
154. H. Mühlenbein. Adaptation in open systems: learning and evolution. In Workshop Konnektionismus (J. Kindermann and C. Lischka, eds.), pp. 122-130, GMD, Postfach 1240, D-5205 St. Augustin, Germany, 1988.
155. H. Mühlenbein and J. Kindermann. The dynamics of evolution and learning towards genetic neural networks. In Connectionism in Perspective (R. Pfeifer et al., eds.), pp. 173-198, Elsevier Science Publishers B.V., Amsterdam, 1989.
156. J. Paredis. The evolution of behavior: some experiments. In Proc. of the First Int'l Conf. on Simulation of Adaptive Behavior: From Animals to Animats (J. Meyer and S.W. Wilson, eds.), MIT Press, Cambridge, MA, 1991.
157. D.J. Chalmers. The evolution of learning: an experiment in genetic connectionism. In Proceedings of the 1990 Connectionist Models Summer School (D.S. Touretzky, J.L. Elman, and G.E. Hinton, eds.), pp. 81-90, Morgan Kaufmann, San Mateo, CA, 1990.
158. Y. Bengio and S. Bengio. Learning a synaptic learning rule. Tech. Rep. 751, Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Canada, 1990.
159. S. Bengio, Y. Bengio, J. Cloutier, and J. Gecsei. On the optimization of a synaptic learning rule. In Preprints of the Conference on Optimality in Artificial and Biological Neural Networks, (Univ. of Texas, Dallas), 1992.

160. J.F. Fontanari and R. Meir. Evolving a learning algorithm for the binary perceptron. Network, vol. 2, pp. 353-359, 1991.
161. D.H. Ackley and M.S. Littman. Interactions between learning and evolution. In Artificial Life II, SFI Studies in the Sciences of Complexity, vol. X (C.G. Langton, C. Taylor, J.D. Farmer, and S. Rasmussen, eds.), (Reading, MA), pp. 487-509, Addison-Wesley, 1991.
162. J. Baxter. The evolution of learning algorithms for artificial neural networks. In Complex Systems (D. Green and T. Bossomaier, eds.), pp. 313-326, IOS Press, Amsterdam, 1992.
163. D. Crosher. The artificial evolution of a generalized class of adaptive processes. In Preprints of AI'93 Workshop on Evolutionary Computation (X. Yao, ed.), pp. 18-36, 1993.
164. P. Turney, D. Whitley, and R. Anderson. Special issue on the Baldwin effect. Evolutionary Computation, vol. 4, no. 3, pp. 213-329, 1996.
165. H.B. Kim, S.H. Jung, T.G. Kim, and K.H. Park. Fast learning method for back-propagation neural network by evolutionary adaptation of learning rates. Neurocomputing, vol. 11, no. 1, pp. 101-106, 1996.
166. D. Patel. Using genetic algorithms to construct a network for financial prediction. In Proceedings of SPIE: Applications of Artificial Neural Networks in Image Processing, (Bellingham, WA, USA), pp. 204-213, Society of Photo-Optical Instrumentation Engineers, 1996.
167. A. Abraham and B. Nath. ALEC - An adaptive learning framework for optimizing artificial neural networks. Computational Science, Springer-Verlag Germany, Vassil N. Alexandrov et al. (editors), San Francisco, USA, pp. 171-180, 2001.
168. A. Abraham. Optimization of evolutionary neural networks using hybrid learning algorithms. IEEE International Joint Conference on Neural Networks (IJCNN'02), IEEE World Congress on Computational Intelligence, Hawaii, Vol. 3, pp. 2792-2802, 2002.
169. G. Beliakov and A. Abraham. Global optimization of neural networks using deterministic hybrid approach. Hybrid Information Systems, Advances in Soft Computing, Physica Verlag, Germany, ISBN 3-7908-1480-6, pp. 79-92, Australia, 2002.
170. T. Kohonen. Clustering, taxonomy, and topological maps of patterns. In Proc. of the 6th Int. Conf. on Pattern Recognition, IEEE Computer Society Press, 1982.
171. T. Kohonen. Self-organizing formation of topologically correct feature maps. Biological Cybernetics, vol. 43, pp. 59-69, 1982.
172. T. Kohonen. The Self-Organizing Map. Proc. IEEE, vol. 78, no. 9, pp. 1464-1480, 1990.
173. T. Kohonen. Self-organizing maps. Second Edition, Springer, 1997.
174. T.F. Cox and M.A.A. Cox. Multidimensional scaling. London: Chapman and Hall, 1994.
175. Z. Guo and R.E. Uhrig. Using genetic algorithms to select inputs for neural networks. In Proc. of the Int'l Workshop on Combinations of Genetic Algorithms and Neural Networks (COGANN-92) (D. Whitley and J.D. Schaffer, eds.), pp. 223-234, IEEE Computer Society Press, Los Alamitos, CA, 1992.
176. F.Z. Brill, D.E. Brown, and W.N. Martin. Fast genetic selection of features for neural network classifiers. IEEE Transactions on Neural Networks, vol. 3, pp. 324-328, 1992.
177. L.S. Hsu and Z.B. Wu. Input pattern encoding through generalised adaptive search. In Proc. of the Int'l Workshop on Combinations of Genetic Algorithms and Neural Networks (COGANN-92) (D. Whitley and J.D. Schaffer, eds.), pp. 235-247, IEEE Computer Society Press, Los Alamitos, CA, 1992.
178. J. Hornyak and L. Monostori. Feature extraction technique for ANN-based financial forecasting. Neural Network World, vol. 7, no. 4-5, pp. 543-552, 1997.
179. F. Dellaert and J. Vandewalle. Automatic design of cellular neural networks by means of genetic algorithms: Finding a feature detector. In Proceedings of the IEEE International Workshop on Cellular Neural Networks and their Applications, (Piscataway, NJ, USA), pp. 189-194, IEEE Press, 1994.
180. P.R. Weller, R. Summers, and A.C. Thompson. Using a genetic algorithm to evolve an optimum input set for a predictive neural network. In Proceedings of the 1st IEE/IEEE International Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications (GALESIA'95), (Stevenage, England), pp. 256-258, IEE Conference Publication 414, 1995.
181. U. Utrecht and K. Trint. Mutation operators for structure evolution of neural networks. In Y. Davidor, H.P. Schwefel, R. Maenner, editors, Parallel Problem Solving from Nature, Workshop Proceedings, pp. 492-501, Springer, 1994.
182. P.J. Angeline, G.M. Saunders, and J.B. Pollack. An evolutionary algorithm that constructs recurrent neural networks. IEEE Transactions on Neural Networks, 5(1):54-65, 1994.
183. W. Schiffmann, M. Joost, and R. Werner. Performance evaluation of evolutionarily created neural network topologies. In H.P. Schwefel and R. Manner, editors, Parallel Problem Solving from Nature, pp. 274-283, Springer, 1990.
184. D.E. Rumelhart and J.L. McClelland. Parallel Distributed Processing: Explorations in the Microstructures of Cognition. Cambridge, MA: MIT Press, 1986.
185. P.J. Angeline. Evolving basis functions with dynamic receptive fields. In Proceedings of the 1997 IEEE International Conference on Systems, Man, and Cybernetics, Part 5 (of 5), (Piscataway, NJ, USA), pp. 4109-4114, IEEE Press, 1997.
186. N. Karunanithi, R. Das, and D. Whitley. Genetic cascade learning for neural networks. In Schaffer and Whitley, editors, Proceedings of the International Workshop on Combinations of Genetic Algorithms and Neural Networks, pp. 134-144, 1992.
187. E. van Wanrooij. Evolving sequential neural networks for time series forecasting. Master's thesis, Department of Computer Science, University of Utrecht, Netherlands, 1994.
188. V.W. Porto and D.B. Fogel. Neural network techniques for navigation of AUVs. In Proceedings of the IEEE Symposium on Autonomous Underwater Vehicle Technology, pp. 137-141, Washington, DC: IEEE, 1990.
189. S. Bornholdt and D. Graudenz. General asymmetric neural networks and structure design by genetic algorithms. Neural Networks, 5:327-334, 1992.
190. H. Braun and P. Zagorski. ENZO-II - a powerful design tool to evolve multilayer feed forward networks. In Proceedings of the First IEEE Conference on Evolutionary Computation, vol. 2, pp. 278-283, 1994.
191. N.J. Radcliffe. Equivalence class analysis of genetic algorithms. Complex Systems 5, no. 2:183-205, 1991.
192. P.G. Korning. Training of neural networks by means of genetic algorithm working on very long chromosomes. Tech. Report, Computer Science Department, Aarhus C, Denmark, 1994.
193. I. Bellido and G. Fernandez. Backpropagation Growing Networks: Towards Local Minima Elimination. Lecture Notes in Computer Science, Vol. 540, pp. 130-135, Springer-Verlag, 1991.

194. G. Bebis, M. Georgiopoulos, and T. Kasparis. Coupling weight elimination with genetic algorithms to reduce network size and preserve generalization. Neurocomputing 17:167-194, 1997.
195. T. Jasic and H. Poh. Analysis of Pruning in Backpropagation Networks for Artificial and Real World Mapping Problems. Lecture Notes in Computer Science, Vol. 930, pp. 239-245, Springer-Verlag, 1995.
196. M. Pelillo and A. Fanelli. A Method of Pruning Layered Feed-Forward Neural Networks. Lecture Notes in Computer Science, Vol. 686, pp. 278-283, Springer-Verlag, 1993.
197. H. Braun and J. Weisbrod. Evolving neural feedforward networks. In Proceedings of the Conference on Artificial Neural Nets and Genetic Algorithms, pp. 25-32, Springer-Verlag, 1993.
198. J.B. Lamarck. Philosophie zoologique. 1809.
199. C.H. Waddington. Canalization of development and the inheritance of acquired characteristics. Nature, 3811, pp. 563-565, 1942.
200. M. Gorges-Schleuter. Asparagos96 and the traveling salesman problem. In Proceedings of 1997 IEEE International Conference on Evolutionary Computation, pp. 171-174, IEEE, 1997.
201. P. Merz and B. Freisleben. Genetic local search for the TSP: New results. In Proceedings of 1997 IEEE International Conference on Evolutionary Computation, pp. 159-163, IEEE, 1997.
202. B.J. Ross. A Lamarckian evolution strategy for genetic algorithms. In Lance D. Chambers, editor, Practical Handbook of Genetic Algorithms: Complex Coding Systems, volume III, pp. 1-16, Boca Raton, FL: CRC Press, 1999.
203. D.H. Ackley and M. Littman. Interactions between learning and evolution. In C.G. Langton, C. Taylor, J.D. Farmer, and S. Rasmussen (editors), Artificial Life II, pp. 487-507, Addison-Wesley, Reading, MA, 1992.
204. E.J.W. Boers, M.V. Borst, and I.G. Sprinkhuizen-Kuyper. Evolving Artificial Neural Networks using the Baldwin Effect. In D.W. Pearson, N.C. Steele and R.F. Albrecht (eds.), Artificial Neural Nets and Genetic Algorithms, Proceedings of the International Conference in Ales, France, pp. 333-336, Springer Verlag, Wien New York, 1995.
205. M. Huesken, J.E. Gayko, and B. Sendhoff. Optimization for Problem Classes - Neural Networks that Learn to Learn. Proceedings of the First IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks (ECNN 2000), IEEE Press, 2000.
206. F. Gruau and D. Whitley. Adding learning to the cellular development of neural networks: Evolution and the Baldwin effect. Evolutionary Computation, Volume I, No. 3, pp. 213-233, 1993.
207. D. Whitley, V.S. Gordon, and K. Mathias. Lamarckian Evolution, The Baldwin Effect and Function Optimization. Parallel Problem Solving from Nature - PPSN III, Y. Davidor, H.P. Schwefel and R. Manner, eds., pp. 6-15, Springer-Verlag, 1994.
208. K.W.C. Ku and M.W. Mak. Exploring the effects of Lamarckian and Baldwinian learning in evolving recurrent neural networks. In Proceedings of 1997 IEEE International Conference on Evolutionary Computation, pp. 159-163, IEEE, 1997.
209. C. Houck, J.A. Joines, M.G. Kay, and J.R. Wilson. Empirical investigation of the benefits of partial Lamarckianism. Evolutionary Computation, vol. 5, no. 1, pp. 31-60, 1997.

210. B.A. Julstrom. Comparing Darwinian, Baldwinian and Lamarckian Search in a Genetic Algorithm for the 4-Cycle Problem. In Genetic and Evolutionary Computation Conference, Late Breaking Papers, pp. 134-138, Orlando, USA, 1999.
211. M. Oliveira, J. Barreiros, E. Costa, and F. Pereira. LamBaDa: An Artificial Environment to Study the Interaction between Evolution and Learning. In Congress on Evolutionary Computation, Volume I, pp. 145-152, Washington D.C., USA, 1999.
212. J.G. Castellano, P.A. Castillo, and J.J. Merelo. Scrapping or recycling: the role of chromosome length-altering operators in Genetic Algorithms. Technical Report, GeNeura Group, Department of Architecture and Computer Technology, University of Granada, 2001.
213. I. Harvey and A. Thompson. Through the labyrinth evolution finds a way: A silicon ridge. In Proc. of the First International Conference on Evolvable Systems: From Biology to Hardware (ICES'96), Springer-Verlag, 1996.
214. I. Rojas, J.J. Merelo, H. Pomares, and A. Prieto. Genetic algorithms for optimum designing of fuzzy controllers. In Proceedings of the 2nd Int. ICSC Symposium on Fuzzy Logic and Applications (ISFL'97), pp. 165-170, International Computer Science Conventions, ICSC Academic Press, 1997.
215. A.S. Wu and R.K. Lindsay. Empirical studies of the genetic algorithm with non-coding segments. Evolutionary Computation, 3(2), 1995.
216. P. Nordin and W. Banzhaf. Complexity compression and evolution. In Procs. of the 6th International Conference on Genetic Algorithms, ICGA'95, pp. 310-317, Morgan Kaufmann, 1995.
217. S. Nolfi and D. Parisi. Growing neural networks. Technical Report PCIA-91-15, Institute of Psychology, CNR, Rome, 1991. Also in Proceedings of ALIFE III, 1992.
218. T. Kohonen. The self-organizing map. Proc. IEEE, 78:1464-1480, 1990.
219. D.E. Rumelhart. Feature discovery by competitive learning. Cognitive Science (9), 1985.
220. D.E. Goldberg. Zen and the art of genetic algorithms. In Procs. of the 6th International Conference on Genetic Algorithms, ICGA'95, pp. 80-85, 1995.
221. NewWave Intelligent Business Systems (NIBS Inc.). GENETICA. Available from http://web.singnet.com.sg/ midaz/Nfga611.htm, 2000.
222. StatSoft Inc. Statistica: Neural Networks. Available from http://www.statsoftinc.com/stat nn.html, 2000.
223. A. Hunter. Trajan 4.0 neural network simulator. Available from http://www.trajan-software.demon.co.uk/commerce.htm, 2000.
224. M. Gronroos. INANNA. Available from http://inanna.sourceforge.net, 2000.
225. H. Braun and T. Ragg. ENZO - Evolutionary Network Optimizing System. Available from http://i11www.ira.uka.de/fagg, 2000.
226. M. Schoenauer, M. Keijzer, J.J. Merelo, and G. Romero. EO: Evolving Objects. Available from http://eodev.sourceforge.net, 2000.
227. D. Moriarty and R. Miikkulainen. Evolving Complex Othello Strategies Using Marker-Based Genetic Encoding of Neural Networks. Technical Report AI93-206, 1993.
228. D. Moriarty and R. Miikkulainen. Evolving Complex Othello Strategies Through Evolutionary Neural Networks. Connection Science, vol. 7, no. 3-4, pp. 195-209, 1995.
229. J.M. Alliot, N. Durand, and F. Medioni. Neural nets trained by genetic algorithms for collision avoidance. Applied Artificial Intelligence, Vol. 13, Number 3, 2000.

230. G. Romero, M.G. Arenas, J. Carpio, J.G. Castellano, P.A. Castillo, J.J. Merelo, A. Prieto, and V. Rivas. Evolutionary Computation Visualization: Application to G-Prop. Lecture Notes in Computer Science, ISBN 3-540-41056-2, Vol. 1917, pp. 902-912, September 2000.
231. G. Romero, P.A. Castillo, J.J. Merelo, and A. Prieto. Using SOM for Neural Network Visualization. Accepted in IWANN'2001, 2001.
