Hybrid Soft Computing Systems: A Critical Survey with Engineering Applications

Spyros G. Tzafestas and Konstantinos D. Blekas
National Technical University of Athens
Department of Electrical and Computer Engineering
Zographou 157 73, Athens, Greece
email: [email protected]

Abstract

During the last decade, methods that imitate human behaviour and human information processing have attracted central interest in the scientific community. There is a pressing need for methods that mimic the human learning process and can solve complex engineering problems which are difficult to handle with conventional approaches. Concepts such as the nervous system, fuzziness and evolution come directly from human and natural sources, carry attractive properties and rich theory, and consequently open new scientific horizons. In this direction, soft computing denotes a family of computing techniques that exploit such human-inspired computing resources. Neural networks, fuzzy systems and genetic algorithms are the three basic constituents of this field. Starting from the basic features of each of these partners, this paper examines the possible combined (hybrid) methods among them and presents their main characteristics from a critical viewpoint. Moreover, a variety of engineering applications is presented, demonstrating the breadth of the field covered by soft computing and the importance of hybrid intelligent methods.

Keywords: Neural Networks, Fuzzy Systems, Genetic Algorithms, Hybrid Soft Computing Systems, Neuro-Fuzzy, Fuzzy-Genetic, Neuro-Genetic.

1 Introduction

(Correspondence should be sent to: Professor S. G. Tzafestas, Intelligent Robotics and Automation Laboratory.)

Soft computing is an area of computing that tolerates imprecision, uncertainty and partial truth in order to achieve robustness and low solution cost. Its main characteristic is that it takes the human mind as its model.


Soft computing contains many fields, such as neural networks, fuzzy logic, probabilistic reasoning, genetic algorithms and chaos theory, that may be seen as complementary. Among the fields accommodated under the umbrella of soft computing, neural networks, fuzzy systems and genetic algorithms are at the top of the preferences in scientific research. Each of these three fields has very powerful properties and offers different advantages. Specifically:

- Neural networks allow a system to learn.
- Fuzzy logic allows expert knowledge to be integrated into a system very easily.
- Genetic algorithms enable a system to be self-optimizing.

By combining them we can build advanced hybrid systems to solve complex problems. Hybrid soft computing approaches incorporate the features of the individual fields and, moreover, have the ability to overcome difficulties and limitations that characterize each field. The possible hybrid approaches are: neural-fuzzy, fuzzy-genetic, neural-genetic and neural-fuzzy-genetic systems. The use of intelligent hybrid systems is growing rapidly, with successful applications in many areas including process control, robotics, manufacturing, engineering design, communication systems, financial trading, credit evaluation, medical diagnosis, and cognitive simulation. The wide variety of applications reflects the "scientific greediness" for these subjects.

The objective of this paper is to present hybrid approaches in the field of soft computing concerning neural networks, fuzzy systems and genetic algorithms. We will examine the existing combinations in terms of methodologies, architectures and applications. The guiding principle is to see how each field is influenced by the others under a critical view, as well as to distinguish and separate each hybrid approach into general categories. Neural networks, fuzzy logic and genetic algorithms and their theoretical aspects are briefly discussed in Sections 2, 3 and 4, respectively. Section 5 deals with general issues and basic properties of hybrid methods. Fuzzy-neural approaches are described in Section 6, including applications to control and pattern recognition. Section 7 deals with fuzzy-genetic schemes, while Section 8 presents the design of neural networks using genetic algorithms. Finally, some concluding remarks are provided in Section 9.

2 Neural Networks

Artificial neural systems, or neural networks, can be considered as massively parallel distributed models that have a natural propensity for storing experiential knowledge and making it available for use. They represent mathematical models of brain-like systems in which knowledge is acquired through a learning process. Looking backward, the origins of neural networks can be found in the work of McCulloch and Pitts (1943) [74], where a simple model of the neuron as a binary threshold unit was proposed.

The next step was Hebb's book (1949), The Organization of Behaviour [35], in which he introduced a significant learning rule based on psychological behaviour. Also, a new approach to the pattern recognition problem was introduced by Rosenblatt (1958) in his work on the perceptron [85], while Widrow and Hoff (1960) introduced the least mean-square (LMS) algorithm [119].

Figure 1: Basic neuron model

The basic processing unit of a neural network is the neuron. As illustrated in Fig. 1, a neuron i consists of a set of N connecting links, the synapses, that are characterized by weights w_ij. Each input signal x_j applied to synapse j is multiplied by its corresponding weight w_ij and transmitted to neuron i. All these synaptic products are accumulated by an adder, as expressed by the equation:

    u_i = \sum_{j=1}^{N} w_{ij} x_j        (1)

The above rule, Eq. 1, constitutes a linear combiner of the input signals. Finally, an activation function g(·) provides the output y_i of the unit as:

    y_i = g(u_i - \theta_i)        (2)

where θ_i denotes the threshold, an external parameter of neuron i that applies an affine transformation to the net input before the output is produced. The activation function, also known as a squashing function, defines the output of the neuron. The most common type of activation function is the sigmoid. The logistic function

    g(x) = \frac{1}{1 + \exp(-x)}

is an example of a sigmoid. Another frequently used sigmoid is the hyperbolic tangent function, defined as

    g(x) = \tanh\left(\frac{x}{2}\right) = \frac{1 - \exp(-x)}{1 + \exp(-x)}

A neural network is characterized by its architecture and by the learning algorithm used to train it.
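
As a small illustration of the neuron model of Eqs. 1 and 2, the following Python sketch computes the output of a single unit with a logistic activation; the weight, threshold and input values are arbitrary illustrative numbers, not taken from the paper.

    import math

    def logistic(x):
        """Logistic sigmoid g(x) = 1 / (1 + exp(-x))."""
        return 1.0 / (1.0 + math.exp(-x))

    def neuron_output(x, w, theta):
        """Compute y_i = g(u_i - theta_i) with u_i = sum_j w_ij * x_j (Eqs. 1-2)."""
        u = sum(w_j * x_j for w_j, x_j in zip(w, x))  # linear combiner (Eq. 1)
        return logistic(u - theta)                    # squashed output (Eq. 2)

    # Illustrative values only.
    x = [0.5, -1.0, 2.0]
    w = [0.8, 0.3, -0.5]
    print(neuron_output(x, w, theta=0.1))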


Figure 2: A fully connected feedforward neural network with one hidden layer

2.1 Network architectures

There are two general types of neural network architectures: feedforward and recurrent. A feedforward network is organized in the form of layers. It consists of the input layer, the hidden layers and the output layer. The input pattern is transmitted from the input layer to the next layer, the first hidden layer. After that, the output signal of each layer is used as the input signal to the next layer, until the output signal of the output (last) layer is computed. The values of the output nodes constitute the overall response of the network to the applied input pattern. The number of hidden layers may be greater than or equal to zero. A network with no hidden layer is called a single-layer feedforward network. A feedforward network can be fully or partially connected, depending on whether all the connections between the nodes of each layer and the forward adjacent layer exist or not. An example of a fully connected feedforward neural network with one hidden layer is shown in Fig. 2. An example of a feedforward network is the multilayer perceptron (MLP), where the input signal propagates through the network in a feedforward direction, in a layer-by-layer fashion.

The other general class of network architectures is the recurrent one. A recurrent network differs from the feedforward type in that it has one or more feedback loops, in the sense that each neuron feeds its output signal back to the inputs of all the other neurons. The most commonly used recurrent neural networks are the Hopfield network and the Boltzmann machine. The Hopfield network [43] is a recurrent network operating as a nonlinear associative memory, where a pattern stored in memory is retrieved given an incomplete or noisy form of that pattern. The Hopfield neural network may be discrete (discrete input-discrete output) or analog (analog input-analog output). The discrete Hopfield network performs local search in the discrete space {0, 1}^n. The energy function that corresponds to a discrete Hopfield neural network with n units, connection weights w_ij (with w_ii = 0) and threshold values θ_i has the form:

    E(\vec{y}) = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} y_i y_j - \sum_{i=1}^{n} y_i \theta_i        (3)

where \vec{y} = (y_1, \ldots, y_n) is the state of the network and y_i ∈ {0, 1}. The network operates sequentially, that is, at each time instant one unit is selected randomly and the difference in the network's energy that would result if the selected unit i changed state is computed. Assuming symmetrical weights (w_ij = w_ji), this energy difference can be written:

    \Delta E_i(\vec{y}) = (2 y_i - 1) \left( \sum_{j=1}^{n} w_{ij} y_j + \theta_i \right)        (4)

If ΔE_i(\vec{y}) < 0, then the change is accepted, otherwise it is rejected. For symmetrical weights it is ensured that the network will settle into a state corresponding to a local minimum of the energy function [43], where ΔE_i(\vec{y}) > 0 for all i = 1, \ldots, n.

The analog Hopfield neural network is a fully connected, continuous-time network that employs units with memory and analog output. It performs local search in the continuous space inside the hypercube [0, 1]^n. Again we consider a network with n units, connection weights w_ij, where w_ii = 0 and w_ij = w_ji, and threshold values θ_i (i, j = 1, \ldots, n). By u_i and y_i we denote the input and the output of unit i, respectively. The evolution of the behaviour of each unit is described by the following equations:

    \frac{du_i}{dt} = -u_i + \sum_j w_{ij} y_j + \theta_i        (5)

    y_i = g(u_i)        (6)

where g(·) is a differentiable, monotonically increasing function with values in (0, 1) (or in (-1, 1)). The energy function:

    E(\vec{y}) = -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} y_i y_j - \sum_{i=1}^{n} \theta_i y_i + \sum_{i=1}^{n} \int_{0}^{y_i} g^{-1}(x) \, dx        (7)

constitutes a Liapunov function for the system. This function decreases during the operation of the network. As a result, the analog Hopfield network converges to an equilibrium state that corresponds to a local minimum of the energy function. The main characteristic of the analog Hopfield neural network is that it can be easily implemented with resistors and operational amplifiers [44] and constitutes an analog machine capable of providing solutions to optimization problems.

The Boltzmann machine [2, 39] represents a generalization of the discrete Hopfield network, where the neurons are stochastic and operate using the Boltzmann distribution. During the activation phase of this network a neuron is randomly chosen and flips its state according to the probability:

    Prob(y_i \to -y_i) = \frac{1}{1 + \exp(-\Delta E_i / T)}        (8)

where ΔE_i describes the energy change resulting from such a flip.
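
The sequential operation of the discrete Hopfield network (Eqs. 3 and 4) can be sketched as follows; the 3-unit weight matrix and thresholds below are purely illustrative, assuming symmetric weights with zero diagonal.

    import random

    def energy_delta(y, W, theta, i):
        """Delta E_i of Eq. 4 for flipping unit i in state y (y_i in {0, 1})."""
        return (2 * y[i] - 1) * (sum(W[i][j] * y[j] for j in range(len(y))) + theta[i])

    def hopfield_step(y, W, theta):
        """Pick a random unit and flip it only if the energy decreases (Eq. 4)."""
        i = random.randrange(len(y))
        if energy_delta(y, W, theta, i) < 0:
            y[i] = 1 - y[i]
        return y

    # Illustrative 3-unit network (symmetric weights, zero diagonal).
    W = [[0.0, 1.0, -2.0],
         [1.0, 0.0, 0.5],
         [-2.0, 0.5, 0.0]]
    theta = [0.0, 0.1, -0.1]
    y = [1, 0, 1]
    for _ in range(20):
        y = hopfield_step(y, W, theta)
    print(y)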

2.2 Learning

Following a general framework, the learning process in a neural network implies the adjustment of the weights of the network in an attempt to minimize an error function suitable for the type of network used. Considering that w_ij(t) denotes the synaptic weight at time t and that an adjustment Δw_ij(t) is applied to this weight at the same time, the updated value w_ij(t+1) indicates the new weight at the next processing time. The above statement can be formally written as:

    w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)        (9)

There are three basic classes of learning algorithms: supervised learning, unsupervised learning and reinforcement learning algorithms. During supervised learning an external teacher is provided to the network that holds the knowledge of the correct output for each input pattern presented to the network. The network's actual response, y_i(t), and the corresponding desired output value d_i(t) allow the computation of an error e_i(t) = d_i(t) - y_i(t). This error value is used to adjust the weights of the network. The convergence criterion of a supervised learning algorithm is the minimization of some error function over the inputs used for the network training. The back-propagation learning algorithm [116] is an example of a supervised algorithm, implementing gradient descent in weight space. In this case, the adjustment Δw_ij(t) is defined by the delta rule:

    \Delta w_{ij}(t) = -\eta \frac{dE(t)}{dw_{ij}(t)}        (10)

where the quantity η is the learning parameter of the back-propagation algorithm. The error function E(t) characterizes the learning performance of the network at time t in terms of the sum of squared errors over all the training patterns:

    E(t) = \frac{1}{2} \sum_i e_i^2(t) = \frac{1}{2} \sum_i \| d_i(t) - y_i(t) \|^2        (11)

Differentiating the error function with respect to the weights, the following equations are obtained:

    \frac{dE(t)}{dw_{ij}(t)} = \frac{dE(t)}{dy_i(t)} \frac{dy_i(t)}{du_i(t)} \frac{du_i(t)}{dw_{ij}(t)}        (12)

    \frac{dE(t)}{dw_{ij}(t)} = -e_i(t) \frac{dy_i(t)}{du_i(t)} x_j(t)        (13)

where u_i determines the net internal activity produced at the input of the nonlinearity associated with neuron i (Eq. 1). Thus, Eqs. 10 and 13 give the updating rule for the weights.

In unsupervised learning no knowledge from a teacher is available and the network must perform a kind of self-organized learning. As the network is not aware of the desired output values, it examines the input patterns according to some local measurements of similarity or degrees of quality, achieving a division of the input set into a number of self-tuned groups. The error function describes such measurements (for example distance measures) over all inputs of the groups, and the objective is to minimize it in order to generate groups of patterns having similar properties. Under this constraint the free parameters of the network are optimized. The created groups may further be seen as new classes performing an input-output mapping. The competitive learning rule is a kind of unsupervised learning that performs a competition among the output neurons of a network with the result that only one output neuron is activated (fires) at each time. A special and increasingly popular class of neural networks based on competitive learning is the class of self-organizing feature maps [58], which are characterized by a topographic map of the input patterns such that the coordinates of the neurons upon a lattice correspond to features of the input patterns. It must be noted that competitive learning can also be used for supervised learning.

Reinforcement learning [95, 120] proposes an alternative learning process that differs from the previous ones. The basic feature of reinforcement learning is that the performance of a learning system is evaluated through the use of a scalar reinforcement signal (taking values in the range [0, 1]) provided by the environment. During the learning process the network parameters are adjusted in an attempt to allow the appropriate action selection for each input signal as evaluated by the reinforcement signal, i.e. a reward indicates a positive reinforcement signal, while, in the case of penalty, learning will cause the action not to be selected again at the next input trial. There are two classes of reinforcement learning: immediate reinforcement tasks and delayed reinforcement tasks. In the first type of task the reinforcement signal is provoked by the most recent input-output pair, while in delayed reinforcement tasks the signal is obtained after a number of operation steps.
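
To make the supervised delta rule of Eqs. 9-13 concrete, the sketch below performs one gradient-descent update for a single sigmoid output neuron; the training pair and learning rate are illustrative assumptions, and the full back-propagation of errors through hidden layers is not shown.

    import math

    def logistic(x):
        return 1.0 / (1.0 + math.exp(-x))

    def delta_rule_update(w, theta, x, d, eta):
        """One weight update w <- w + delta_w with delta_w = -eta dE/dw (Eqs. 9-13)."""
        u = sum(wj * xj for wj, xj in zip(w, x)) - theta
        y = logistic(u)
        e = d - y                      # error e = d - y
        dy_du = y * (1.0 - y)          # derivative of the logistic function
        # dE/dw_j = -e * dy/du * x_j (Eq. 13), so delta_w_j = eta * e * dy/du * x_j
        return [wj + eta * e * dy_du * xj for wj, xj in zip(w, x)]

    # Illustrative training step (threshold kept fixed).
    w = [0.2, -0.4]
    w = delta_rule_update(w, theta=0.0, x=[1.0, 0.5], d=1.0, eta=0.1)
    print(w)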

3 Fuzzy Systems

Fuzzy-set theory was first introduced by Zadeh in 1965 [125], presenting the notion of a fuzzy set. A fuzzy set is a set without a crisp boundary, that is, the decision whether an element x belongs or not to a set A is gradual and not crisp (binary). If X denotes a collection of objects denoted by x, then a fuzzy set A in X is defined by a set of ordered pairs:

    A = \{ (x, \mu_A(x)) \mid x \in X \}        (14)

The function μ_A(x) is called the membership function of the object x in A. The membership function represents a "degree of belongingness" of each object to a fuzzy set, and provides a mapping of objects to a continuous membership value in the interval [0, 1]. When the membership value is close to 1 (μ_A(x) → 1) the input x belongs to the set A with a high degree, while small membership values (μ_A(x) → 0) indicate that the set A does not suit the input x very well. Obviously, if the value of the membership function is restricted to binary values (0 or 1), then the fuzzy set A is reduced to a classical set, with the function μ_A(x) playing the role of the characteristic function.


Like classical sets, fuzzy sets support the operators of intersection, union and complement. These three operators are the most important and widely used, and are analogous to the operators AND, OR and NOT of classical logic. The intersection of two fuzzy sets A and B is a fuzzy set C, written as C = A ∩ B or C = A AND B, whose membership function is defined as:

    \mu_C(x) = \min(\mu_A(x), \mu_B(x)) = \mu_A(x) \wedge \mu_B(x)        (15)

The union of two fuzzy sets A and B is a fuzzy set C, written as C = A ∪ B or C = A OR B, having the following membership function:

    \mu_C(x) = \max(\mu_A(x), \mu_B(x)) = \mu_A(x) \vee \mu_B(x)        (16)

Finally, the complement operator NOT is defined as

    \mu_{\bar{A}}(x) = 1.0 - \mu_A(x)        (17)

Several types of membership function can be used. According to Zadeh, membership is the quantification of the human perception about the situation at hand. The form of the membership function is dependent on the structure of the corresponding fuzzy set. Some known forms are the triangular, the trapezoidal, the Gaussian and the bell form. Examples can be viewed in Fig. 3.
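
As a minimal illustration of the membership forms of Fig. 3 and the operators of Eqs. 15-17, the Python sketch below evaluates a triangular membership function and the min/max/complement operators; the set parameters are arbitrary.

    def triangular(x, a, b, c):
        """Triangular membership with feet a, c and peak b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    def fuzzy_and(mu_a, mu_b):   # intersection, Eq. 15
        return min(mu_a, mu_b)

    def fuzzy_or(mu_a, mu_b):    # union, Eq. 16
        return max(mu_a, mu_b)

    def fuzzy_not(mu_a):         # complement, Eq. 17
        return 1.0 - mu_a

    # Illustrative evaluation at x = 4 for two triangular sets.
    mu_a = triangular(4.0, a=0.0, b=3.0, c=6.0)
    mu_b = triangular(4.0, a=2.0, b=5.0, c=8.0)
    print(fuzzy_and(mu_a, mu_b), fuzzy_or(mu_a, mu_b), fuzzy_not(mu_a))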

3.1 Fuzzy rules

In its traditional rule-based formulation, an expert system is represented by a sequence of rules that describe the behaviour of a natural system. The expert rules are based on classical logic and set theory, following the scheme:

    IF condition A AND condition B THEN action C

The sets A and B are classical sets that define input states of the system, while the set C denotes the output state or signal. Expert rules use the classical logical operators AND, OR and NOT to connect the linguistic variables. A set of such rules constitutes an expert rule bank that acts as the expert knowledge used to describe the operations and the control procedures of the system [104]. In many applications, systems are considerably more complex than a simple list of rules and the decision making process is not a simple application of these rules.

The idea of using fuzzy logic in a rule-based system was initially proposed by Mamdani [73, 72]. Considering the sets in the above rule as fuzzy sets, we can incorporate fuzzy set theory into an expert system. In a fuzzy if-then rule the if-part is called the antecedent or premise, while the then-part is the consequence. A fuzzy rule set consists of a set of fuzzy if-then rules that are used to describe a system. To derive conclusions from a fuzzy rule set, an inference procedure must be determined, which is called fuzzy reasoning or approximate reasoning.

Figure 3: Examples of membership functions (triangular, trapezoidal, bell, Gaussian)

3.2 Fuzzy Models

The fuzzy inference system is a computing framework based on the concepts of fuzzy set theory, fuzzy if-then rules and fuzzy reasoning. It has been successfully applied to several fields such as automatic control, data classification, decision analysis and expert systems. A fuzzy inference system is also referred to as a "fuzzy rule-based system", "fuzzy expert system" [52, 103], "fuzzy model" [93, 98], "fuzzy logic controller" [66, 67, 73], or simply "fuzzy system". The main advantage of fuzzy reasoning is the ability to handle imprecision in the available knowledge, thus making the decision making procedure more flexible.

The general scheme of fuzzy expert systems can be seen in Fig. 4. It consists of four parts: the fuzzification module, the defuzzification module, the inference engine and the knowledge base. The fuzzification module is associated with the transfer of the input signal from the crisp to the fuzzy representation world, while the defuzzification procedure converts the distributed fuzzy logic values into a single-point solution value, which constitutes the output of the fuzzy expert system. The knowledge base comprises the information given by the process operator in the form of linguistic control rules, and finally the inference engine performs inference by means of reasoning methods.

Defuzzification is a procedure of crucial importance for fuzzy systems because of its direct effect upon the system's performance. There are two most often used defuzzification methods.

Figure 4: General fuzzy expert scheme (fuzzification module, knowledge base of fuzzy rules, fuzzy inference engine, defuzzification module, controlled system)

The first is the Center of Area (COA) method, which defines the defuzzified value of a fuzzy set A (derived from the fuzzy operators) as its fuzzy centroid:

    y_{COA} = \frac{\int_Y \mu_A(y) \, y \, dy}{\int_Y \mu_A(y) \, dy}        (18)

This calculation can be simplified if we consider a discrete membership function:

    y_{COA} = \frac{\sum_{j=1}^{n} \mu_A(y_j) \, y_j}{\sum_{j=1}^{n} \mu_A(y_j)}        (19)

Another defuzzification method is the Mean of Maxima (MOM), which determines the value as the mean of all values of the universe of discourse having maximal membership grades:

    y_{MOM} = \frac{1}{q} \sum_{j \in J} y_j        (20)

where J is the set of elements of the universe Y which attain the maximum membership value and q is the cardinality of the set J. Other more flexible defuzzification methods can be found in [121, 122], and more recently in [51, 86].

The most commonly used fuzzy inference systems are the Mamdani model and the Sugeno model. The Mamdani fuzzy model [73] was proposed as the first effort towards fuzzy control systems. Each fuzzy rule i, which can be written as:

    if x is A_i then y is B_i

expresses a fuzzy relation C_i which is represented as a fuzzy intersection of the fuzzy sets A_i and B_i (C_i = A_i ∩ B_i). The relation C_i has membership function:

    \mu_{C_i}(x, y) = \mu_{A_i}(x) \wedge \mu_{B_i}(y)        (21)


Figure 5: Fuzzy relation C_i and its joint possibility distribution

From the above expression it follows that the fuzzy relation C_i corresponds to a rectangular region in the Cartesian product space X × Y (Fig. 5) with a joint possibility distribution given by μ_{C_i}(x, y). The Mamdani model suggests that the combination of a set of m fuzzy rules results in a fuzzy relation C obtained as the union of the individual fuzzy relations C_i (C = ∪_{i=1}^{m} C_i). The membership function of this relation is:

    \mu_C(x, y) = \bigvee_{i=1}^{m} \mu_{C_i}(x, y) = \bigvee_{i=1}^{m} \left( \mu_{A_i}(x) \wedge \mu_{B_i}(y) \right)        (22)

By considering the fuzzy operators OR and AND as the max and min functions, respectively, the above membership function can be seen as a max-min composition of the fuzzy input-output sets. Thus, using the COA defuzzification method, the resultant fuzzy relation C is converted to a crisp output value.

The Sugeno fuzzy model (also known as the TSK, Takagi-Sugeno-Kang, model) was proposed by Takagi, Sugeno and Kang [93, 98]. In its original formulation this model develops a systematic approach to generating fuzzy rules from a given data set. Typically, a fuzzy rule in this model has the following form:

    if x_1 is A_1 and ... and x_N is A_N then y = f(x_1, \ldots, x_N)

where A_k, k = 1, \ldots, N represent the fuzzy antecedent labels (fuzzy input sets). The big difference in this model is the functional type of the consequence, instead of the fuzzy consequence used in the Mamdani model. Usually the function f is a linear function of the input variables x_k, i.e. f(x_1, \ldots, x_N) = p_1 x_1 + \ldots + p_N x_N + p_0. Considering a set of m fuzzy rules, the inference model computes a crisp output y which is the weighted average of the individual crisp outputs y_i (i = 1, \ldots, m):

    y = \frac{\sum_{i=1}^{m} w_i y_i}{\sum_{j=1}^{m} w_j} = \frac{\sum_{i=1}^{m} w_i (p_{i1} x_1 + \ldots + p_{iN} x_N + p_{i0})}{\sum_{j=1}^{m} w_j}        (23)

where w_i denotes the strength of the i-th rule (w_i = μ_{A_{i1}}(x_1) ∧ \ldots ∧ μ_{A_{iN}}(x_N)).

From a geometrical point of view, the set of rules in the Sugeno model gives an approximation of the mapping X_1 × \ldots × X_N → Y by a piecewise linear function. (In the general case these functions may be nonlinear as well.) The Sugeno fuzzy model offers a great advantage for describing complex control systems. It allows the decomposition of a system into simpler subsystems, and moreover the partitioning of the input space [114, 115].
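
The weighted-average inference of Eq. 23 can be sketched as follows for two hypothetical rules with Gaussian antecedent labels and linear consequents; all rule parameters are invented for the example.

    import math

    def gaussian(x, c, sigma):
        """Gaussian membership function."""
        return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

    def sugeno_output(x1, x2, rules):
        """Weighted average of rule consequents (Eq. 23); strength w_i = min of antecedent grades."""
        num, den = 0.0, 0.0
        for (c1, s1), (c2, s2), (p1, p2, p0) in rules:
            w = min(gaussian(x1, c1, s1), gaussian(x2, c2, s2))  # firing strength
            y = p1 * x1 + p2 * x2 + p0                           # linear consequent
            num += w * y
            den += w
        return num / den if den > 0 else 0.0

    # Two illustrative rules: (A_i1 parameters, A_i2 parameters, (p_i1, p_i2, p_i0)).
    rules = [((0.0, 1.0), (0.0, 1.0), (1.0, 0.5, 0.0)),
             ((2.0, 1.0), (2.0, 1.0), (-0.5, 1.0, 1.0))]
    print(sugeno_output(1.0, 1.5, rules))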

4 Genetic Algorithms

Genetic algorithms in their simple form constitute the first population-based optimization method, as proposed by Holland (1975) [40]. They behave as a computational analog of adaptive systems, representing a general-purpose search algorithm that uses principles from natural population genetics to evolve solutions to problems. There are many variations of the basic approach [28, 31, 76]. In their traditional formulation, a random population of binary strings is assumed that is evolved through genetic steps of natural mechanisms. Each genetic structure of the population, which is called a chromosome, represents an individual solution to the problem at hand. More recent implementations consider chromosomes encoding floating point numbers and arrays of integers. The fitting capabilities of the chromosomes are evaluated by an appropriate fitness function, determining their matching power during the competition process. At each generation step new members of the population are created by applying genetic operators, such as crossover and mutation, to appropriately selected strings.

Genetic algorithms belong to the general area of Evolutionary Computation. The other categories included are: evolution strategies (ES), proposed by Rechenberg and Schwefel [3], and genetic programming, introduced by Koza [62]. Genetic algorithms are suited to the needs of a large family of problems. They can use different data structures for representing individuals, problem-specific genetic operators for evolving individuals, and methods for creating the initial population [108]. The theoretical foundations of genetic algorithms rely on the notion of schemata [31, 40], which are templates enabling the exploration of similarities among chromosomes.

4.1 Genetic operators

During the genetic procedure some operators must be specified that are capable of recombining and creating chromosome populations. In general, there is no rule prescribing how and when operators must be employed, and genetic algorithms appear very robust with respect to the selection of operators. As a consequence they allow the application of new genetic operators that suit a given problem precisely. The features of new operators are determined based on the application format and the problem representation design. Nevertheless, the general scheme of genetic algorithms concerns three basic operators that are most commonly used and represent the origins of natural adaptation of adaptive systems. These are selection, crossover and mutation.


Figure 6: Single-point crossover

The most commonly used string selection scheme follows the principle of survival of the fittest, i.e., considering maximization problems, strings are selected for reproduction with probability proportional to their corresponding fitness (function value). This procedure maps the whole population onto a roulette wheel, where each individual corresponds to a fitness interval. Alternatively, a random selection scheme can be applied in which individuals are randomly selected for reproduction.

The crossover operator is responsible for the recombination of the selected strings. Usually two parents are considered for recombination, although even the whole population can participate in the generation of one string [1]. The two parents can be combined in a variety of ways (sometimes depending on the characteristics of the function to be optimized), the single-point crossover and the double-point crossover being the most commonly used. In single-point crossover a position on the chromosome (crossover point) is randomly selected and the two parents interchange their values starting from the selected point. On the other hand, during double-point crossover a second point is selected and the exchange takes place between the two crossover points. Fig. 6 illustrates an example of single-point crossover. In addition, the basic genetic algorithm employs a mutation operator which introduces randomness in the search process. Mutation randomly flips some bit values in the population strings according to a given mutation probability.

4.2 Genetic search

Viewed as a population of point-based optimizers, the traditional genetic algorithm performs a kind of parallel recombinative random search, in the sense that each population member can be considered as a point-based random optimizer that performs pure random search using the mutation operator. Of course the main search task is accomplished through recombination of these naive optimizers using the crossover operator, but the simplicity of the mutation operator reduces the effectiveness of the algorithm in performing local search. The basic steps of the simple genetic algorithm are the following (a minimal code sketch of this loop is given at the end of this section):

Initialization Phase (t = 0): Create a random population P(0) of parent strings. Evaluate P(0).

Iteration:
- Select two parent strings from P(t).
- Combine these strings by applying the crossover and mutation operators.
- Evaluate the produced children by computing their fitness values.
- Set t = t + 1.

Continue until a terminating condition is met.

In order to apply a genetic algorithm for solving a specific problem a number of considerations are in place. The first one concerns the genetic representation scheme of the solution, which determines the chromosome structure. This can be a binary or real number string. The representation scheme is a very critical issue and plays a significant role in the searching ability of genetic algorithms. Another very important issue is the form of the fitness function, which evaluates the performance of each individual. In general, the fitness must embody the information necessary for the procedure to perform the appropriate genetic search leading to optimal solutions, and must reflect the quality of individuals in terms of solving the considered problem. Finally, genetic algorithms involve a number of control parameters, such as the population size, the probabilities of crossover and mutation, the number of crossover points, etc., the values of which play a significant role in the genetic evolutionary mechanism. Additionally, the form of the genetic operators (e.g. crossover and mutation), as well as the possible creation of new ones (as suggested by the representation scheme), must be considered in order to allow the creation of good individuals.

The main problem with simple genetic algorithms is that they exhibit a fast convergence behaviour, mainly due to the effects of the selection scheme, which is biased towards strings having high function values, and of the crossover operator, which cannot reintroduce diversity. The crossover between nearly identical strings produces strings similar to their parents. Population diversity can be introduced only through mutation, but its effectiveness is rather limited. Therefore, this property of gradual decrease of diversity in the population limits the sustained exploration capabilities of simple genetic algorithms, since it inhibits the continuing search which is necessary in solving difficult problems.
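
A minimal sketch of the loop outlined above, applied to the toy problem of maximizing the number of ones in a binary string; the population size, probabilities and generational replacement scheme are illustrative choices rather than recommendations.

    import random

    def fitness(s):
        """Toy objective: count of ones in the binary string."""
        return sum(s)

    def roulette_select(pop):
        """Fitness-proportional (roulette wheel) selection."""
        total = sum(fitness(s) for s in pop)
        r = random.uniform(0, total)
        acc = 0.0
        for s in pop:
            acc += fitness(s)
            if acc >= r:
                return s
        return pop[-1]

    def crossover(p1, p2):
        """Single-point crossover (Fig. 6)."""
        point = random.randrange(1, len(p1))
        return p1[:point] + p2[point:]

    def mutate(s, pm=0.01):
        """Flip each bit with probability pm."""
        return [1 - b if random.random() < pm else b for b in s]

    # Simple generational GA on 20-bit strings.
    pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
    for generation in range(50):
        pop = [mutate(crossover(roulette_select(pop), roulette_select(pop)))
               for _ in range(len(pop))]
    print(max(fitness(s) for s in pop))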

5 Hybrid Soft Computing Approaches

In the previous sections we briefly examined the three major fields of computational intelligence. Each of them has particular computational properties that make it suitable for solving a large family of problems. Nevertheless, they also have some restrictions and drawbacks which do not allow their individual application in some cases.


Figure 7: Hybrid approaches (NN: neural networks, FS: fuzzy systems, GA: genetic algorithms, NF: neuro-fuzzy, FG: fuzzy-genetic, NG: neuro-genetic)

Neural networks propose a very attractive method of pattern recognition, but they are not good at explaining how they reach their decisions. They can be seen as a black box where the extraction of knowledge from the resulting trained network is a difficult task. In addition, the incorporation of external knowledge about a specific problem into a neural network is very difficult. Another drawback in the case of neural networks is that, in general, we do not know the exact form of the network architecture, and consequently the network structure is determined by an ad-hoc design procedure or an experimental process. On the other hand, fuzzy logic can explain the operating behaviour of a system in terms of rules, with the advantage of not requiring the availability of precise information. When expert knowledge is not available, the applicability of fuzzy systems may be restricted. Also, there are some difficult issues, including the exact division of the input and output space into fuzzy sets, the values of the parameters of the membership functions and the precise number of fuzzy rules, that reduce the "fuzzy logic glorious power". Genetic algorithms, through their parallel Darwinian nature, offer an attractive and powerful tool. What makes them appropriate for many problems is their ease of use as well as their free manner of representation.

All these reasons have led to the development of methods combining the three main techniques in an attempt to cope with the above drawbacks. Neural networks (NN), fuzzy systems (FS) and genetic algorithms (GA) may be seen as equivalent methods in terms of their ability to be applied to many situations. Hybrid soft computing systems contain an aggregation of any two of these approaches, sometimes of all of them, where each one contributes effectively to the design and performance capability of the hybrid approach.

Moreover, hybrid systems illustrate new theoretical aspects concerning each of the three individual fields, and manage to open new roads and dimensions of applications, as well as to discover their limitations and potentials. Therefore, by combining neural networks and fuzzy systems a new family of hybrid techniques is introduced, neurofuzzy systems (NF), which can be applied to solve FS problems. On the other hand, genetic algorithms can be used both in fuzzy and neural environments, establishing two new approaches: the fuzzy-genetic (FG) and the neuro-genetic (NG). All these approaches will be described in the next sections. A block diagram of hybrid systems can be seen in Fig. 7 with single-layer connections. Later, we will extend it to a multilayer structure.

6 Neurofuzzy Systems

Existing fuzzy reasoning techniques suffer from the following deficiencies [97]:

- The lack of a definite method to determine membership functions.
- The lack of a learning capability or adaptability, which can be overcome by neural network driven fuzzy reasoning.

Neural networks are used to tune the membership functions of fuzzy systems that are employed for controlling equipment. Although fuzzy logic has the ability to convert expert knowledge directly into fuzzy rules, it usually takes a lot of time to design and adjust the linguistic labels (fuzzy sets) of the problem. In addition, the tuning of membership functions is a tricky procedure, as it sometimes involves a number of free parameters that must be assigned by an expert. Neural network techniques can automate this design procedure, improving the performance and reducing the computational time (the time wasted on trial and error) [83, 106, 109].

In what follows, we examine the synergistic behaviour of neural networks and fuzzy systems through three general implementations. The first one concerns the direct employment of a fuzzy system in a feedforward neural network to construct a fuzzy logic controller. This technique can be seen as a method of automatically constructing fuzzy rules from numerical data. The second approach represents a general scheme of a neural model of a fuzzy controller that is based on reinforcement learning for tuning the membership functions that correspond to a set of fuzzy rules. Finally, we describe general aspects of neuro-fuzzy classifiers for pattern recognition problems.

6.1 ANFIS: Adaptive Neuro-Fuzzy Inference Systems

ANFIS is a class of adaptive networks that act as a fundamental framework for adaptive fuzzy inference systems [47, 48, 49]. The term ANFIS stands for Adaptive Network-based Fuzzy Inference System or, equivalently, Adaptive Neuro-Fuzzy Inference System.


Figure 8: The ANFIS architecture

Consider a problem with N inputs (x = (x_1, \ldots, x_N)) and an output y. Assuming the Sugeno fuzzy model, a typical rule set consists of K fuzzy rules, each of which is described by the following linguistic expression:

    R_j: If x_1 is A_{j1} and \ldots and x_N is A_{jN} then y = f_j = p_{j1} x_1 + \ldots + p_{jN} x_N + r_j        (j = 1, \ldots, K)

Mapping the operation of this fuzzy model onto a neural network design, we obtain a general-purpose neural architecture. This network consists of five layers (in addition to the input layer), each of them representing an appropriate fuzzy procedure. Fig. 8 illustrates the ANFIS architecture.

Every node of the first layer corresponds to an available fuzzy set A_{ji} of the inputs. Its output represents the membership value μ_{A_{ji}}(x_i) of the input x_i to the fuzzy set A_{ji}. For example, if we adopt the bell function, the membership function is expressed as:

    \mu_{A_{ji}}(x_i) = \frac{1}{1 + \left[ \left( \frac{x_i - c_{ji}}{a_{ji}} \right)^2 \right]^{b_{ji}}}        (24)

where the triple {a_{ji}, b_{ji}, c_{ji}} is the parameter set of each fuzzy set A_{ji}.

The number of nodes in the second layer is equal to the number of rules in the rule bank. Each node j multiplies the incoming signals received from the first layer and produces a quantity w_j that represents the firing strength of the corresponding rule j. The operation that takes place in this layer is the fuzzy AND, and the layer is labeled Π:

    w_j = \prod_{i=1}^{N} \mu_{A_{ji}}(x_i)        (25)

The third layer performs a normalization of the firing strengths of the previous layer, and thus contains as many nodes as the second layer. Each node j calculates the ratio of the j-th rule's strength over the sum of the strengths of all rules, as defined by the next equation:

    \bar{w}_j = \frac{w_j}{\sum_{l=1}^{K} w_l}        (26)

According to the formula of the fuzzy inference mechanism, the next step after computing the normalized rule strength is to multiply it by the function of the consequent of the rule. This is what happens at the fourth layer, where the output of each node is described by the next equation:

    \bar{w}_j f_j = \bar{w}_j (p_{j1} x_1 + \ldots + p_{jN} x_N + r_j)        (27)

Finally, the defuzzification procedure takes place at the fifth layer, which computes the overall output as the summation of all incoming signals from the fourth layer. This layer consists of a single node that defines the final output of the inference system:

    y = \sum_{j=1}^{K} \bar{w}_j f_j = \frac{\sum_{j=1}^{K} w_j f_j}{\sum_{j=1}^{K} w_j}        (28)

During the learning procedure of the ANFIS structure the network adjusts its parameters, i.e. those that determine the shape of the membership functions of the premises, as well as the consequent parameters. According to Eq. 28 the output y can be rewritten as:

    y = \sum_{j=1}^{K} \bar{w}_j f_j = \sum_{j=1}^{K} \bar{w}_j \left( \sum_{l=1}^{N} p_{jl} x_l + r_j \right)        (29)

which is linear in the consequent parameters p_{jl}, r_j, for all j, l (j = 1, \ldots, K, l = 1, \ldots, N). Therefore the learning algorithm can identify the consequent parameters by the least-squares method, while it can use the gradient descent method (propagating the error signals backward) to update the premise parameters of the membership functions.

The above structure represents an adaptive neural network that operates in a way similar to the Sugeno fuzzy model. It must be noted that this architecture can be reduced to a four-layer network by appropriately combining the third and fourth layers. Apart from the Sugeno model, ANFIS architectures can also be obtained using the Mamdani fuzzy model [49, 50], and thus ANFIS can be seen as a general-purpose learning controller.
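
To make the five-layer computation of Eqs. 24-28 concrete, the following sketch evaluates an ANFIS-style forward pass for two hypothetical rules; the bell-function and consequent parameters are illustrative, and the hybrid learning step (least squares for the consequents, gradient descent for the premises) is omitted.

    def bell(x, a, b, c):
        """Generalized bell membership function of Eq. 24."""
        return 1.0 / (1.0 + (((x - c) / a) ** 2) ** b)

    def anfis_forward(x, premise, consequent):
        """Layers 1-5: memberships, rule strengths, normalization, consequents, sum (Eqs. 24-28)."""
        # Layer 2: firing strength of each rule (product of its memberships, Eq. 25).
        w = []
        for rule_sets in premise:
            strength = 1.0
            for xi, (a, b, c) in zip(x, rule_sets):
                strength *= bell(xi, a, b, c)
            w.append(strength)
        total = sum(w)
        # Layers 3-5: normalize, weight the linear consequents, and sum (Eqs. 26-28).
        y = 0.0
        for wj, params in zip(w, consequent):
            *p, r = params
            fj = sum(pl * xl for pl, xl in zip(p, x)) + r
            y += (wj / total) * fj
        return y

    # Two inputs, two illustrative rules.
    premise = [[(1.0, 2.0, 0.0), (1.0, 2.0, 0.0)],   # rule 1: A_11, A_12
               [(1.0, 2.0, 2.0), (1.0, 2.0, 2.0)]]   # rule 2: A_21, A_22
    consequent = [(1.0, 0.5, 0.0),                   # y = p_11 x1 + p_12 x2 + r_1
                  (-0.5, 1.0, 1.0)]
    print(anfis_forward([1.0, 1.5], premise, consequent))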

6.2 GARIC: Generalized Approximate Reasoning-based Intelligent Control

The Generalized Approximate Reasoning-based Intelligent Control (GARIC) architecture [6] results from the combination of neural networks, fuzzy systems and reinforcement learning. It constitutes a new method of learning and adjusting the parameters of fuzzy systems using reinforcement values received from the environment. The general diagram of the GARIC architecture is given in Fig. 9. The system is composed of two networks: the Action Selection Network (ASN) and the Action Evaluation Network (AEN).


Figure 9: The GARIC architecture

The first network (ASN) maps an input state vector into an action F, while the second (AEN) provides an evaluation of the current state. There is also a Stochastic Action Modifier (SAM) which receives the recommended action F and an internal reinforcement signal r' and produces a final action F' that is actually applied to the physical system.

The AEN is the adaptive critic element of the system. It is a typical two-layer feedforward neural network that receives the state of the system x and produces a prediction e(x) of future reinforcement for this state. The internal reinforcement r', which evaluates the action recommended by the ASN, is computed from the reinforcement signal r and the prediction e(x):

    r' = r - e(x)        (30)

This internal reinforcement value plays the role of the error which is backpropagated through the network in order for the weights to be updated. Moreover, it is also used in the learning phase of the ASN for evaluating the suggested action.

The architecture of the ASN is suitable for mapping the rules of a fuzzy expert system. As shown in Fig. 10, five layers are needed to implement the fuzzy inference process, each of them performing one stage of this process. The nodes of the input layer correspond to the linguistic variables of interest. The first hidden layer stores the antecedent conditions of the fuzzy rules. The number of its nodes is equal to the number of possible values of the linguistic variables. The operation that is performed is fuzzification, i.e., the computation of the membership function values. Triangular shapes for the computation of membership values are preferable, as they are simpler and more efficient.



Figure 10: The architecture of the Action Selection Network

The fuzzy rule bank is stored in the second hidden layer; thus, the number of nodes is equal to the number of fuzzy rules. The softmin operation [6] takes place at each node, providing the degree of applicability of each rule. The nodes in the third hidden layer correspond to the consequent parts of the fuzzy rules. The process of defuzzification is performed using the LMOM (local mean-of-maximum) method [6]. Finally, the output layer contains as many nodes as the number of control variables. Each output node is connected to the nodes of the second and the third hidden layers and computes a continuous output value which corresponds to the action selected by the ASN network. During the learning phase the fuzzy control parameters of the network are updated in a reward/punishment fashion.

Finally, the SAM uses the internal reinforcement (provided by the AEN) and the recommended action F (suggested by the ASN) to stochastically generate an action F', which is a Gaussian random variable with mean F. This stochastic perturbation results in better exploration of the state space and increased generalization ability.

6.2.1 The extended (discrete) GARIC approach

In its previously described original formulation, the GARIC architecture assumes that the outputs of the ASN network take continuous values. In many cases there is the limitation that the control action assumes values from a discrete set of possible actions. In order to deal with the requirement of a discrete output space, a modification of the original GARIC architecture was developed [59] that mainly concerns the manner in which the defuzzification process is performed. The new architecture considers an action selection network (ASN) having a number of discrete output units, instead of one continuous output unit as in the original GARIC formulation. At each step only one of the output units can be in the 'on' state, and the corresponding action is applied to the system. Furthermore, there is no need for a stochastic action modifier, since at each step the strength of each fuzzy rule contributes to the probability of selecting the corresponding action.


Figure 11: The modified Action Selection Network

A. The modified ASN

The ASN implements an inference scheme based on fuzzy control rules by providing at each step the control action that corresponds to the state of the system. It consists of five layers, as in the original GARIC formulation, but some of them operate in a different way. Fig. 11 displays the proposed architecture for the ASN. The first layer is the input layer, consisting of the real-valued input vector that constitutes the state of the system. Each second-layer unit corresponds to a possible linguistic value of an input variable and computes the triangular-shaped membership function value μ according to the following equation:

    \mu(x) = \begin{cases} 1 - \frac{|x - c|}{s_R} & \text{if } x \in [c, c + s_R] \\ 1 - \frac{|x - c|}{s_L} & \text{if } x \in [c - s_L, c) \\ 0 & \text{otherwise} \end{cases}        (31)

where c, s_L and s_R denote the centers, left spreads and right spreads of the linguistic variables. Each node in the third layer corresponds to a rule of the fuzzy rule bank and receives the membership degrees of the values of the linguistic variables appearing in the if part of the rule. The output w_r of a node r provides the strength of the corresponding rule and is computed through the softmin operation:

    w_r = \frac{\sum_j \mu_j e^{-k \mu_j}}{\sum_j e^{-k \mu_j}}        (32)

where the index j ranges over all the nodes of the second layer that are connected to r.

The nodes in the fourth layer correspond to the possible output actions, with inputs coming from all the rules which suggest the particular action. The output of each node i of this layer is computed as follows:

    m_i = \frac{2}{1 + e^{-(\sum_r w_r - 0.5)}} - 1        (33)

The above operation provides a way of computing the potential of action i based on the contributions w_r of the rules suggesting that output. Obviously, the potential m_i takes values in the range [-1, 1].

The last (fifth) layer has as many units as the fourth layer (equal to the number of output actions). Each node i in this layer receives the potential m_i from the previous layer and, using the Boltzmann distribution, computes the normalized probability that the corresponding action is selected:

    p_i = \frac{e^{m_i / T}}{\sum_l e^{m_l / T}}        (34)

where T is the temperature of the system. Random selection using this probability vector provides the control action that is applied to the system. Initially, the value of the parameter T is large, so that all the output actions have almost the same selection probability (when T → ∞, e^{m_i/T} → 1 independently of the action i). Thus, in the first steps the stochasticity of the network is high, allowing every possible action to be explored. As learning proceeds, the temperature value gradually decreases, decreasing at the same time the stochasticity and biasing the probabilities towards selecting the action with the greatest potential.

B. Training of the ASN

The objective of learning in the ASN is the adjustment of the network parameters, which are the centroid, left spread and right spread corresponding to each linguistic value. This can be achieved through rewarding a good selected action and blaming a bad one. In this sense, the ASN backpropagates the internal reinforcement r' and the corresponding parameters are appropriately updated. This evaluation is used to update the centers and the spreads of the linguistic values which are included in the fuzzy rules suggesting the selected output action α. Considering only the selected output and traversing the network backwards, the learning equation for a parameter ν (where ν may be any of c, s_R, s_L) can be written:

    \Delta \nu = \eta \, r' \, \frac{\partial m_\alpha}{\partial \nu} = \eta \, r' \sum_r \frac{\partial m_\alpha}{\partial w_r} \frac{\partial w_r}{\partial \nu} = \eta \, r' \sum_r \frac{\partial m_\alpha}{\partial w_r} \sum_j \frac{\partial w_r}{\partial \mu_j} \frac{\partial \mu_j}{\partial \nu}        (35)

Table 1: The derivatives of the membership function with respect to its parameters

    x                   ∂μ/∂c      ∂μ/∂s_L          ∂μ/∂s_R
    [c, c + s_R]        1/s_R      0                (x - c)/s_R^2
    [c - s_L, c]        -1/s_L     (c - x)/s_L^2    0
    otherwise           0          0                0

with η being the learning rate of the ASN. Clearly, all the above derivatives can be computed locally at each node, using the corresponding equations, during the forward pass through the network. From Eq. 33, it can be found that the derivative ∂m_α/∂w_r is computed as:

    \frac{\partial m_\alpha}{\partial w_r} = \frac{1}{2} (1 + m_\alpha)(1 - m_\alpha)        (36)

where the index r concerns every rule suggesting output α. It is clear from the above equation that the derivative of m_α does not depend on w. Moreover, using Eq. 32, it can be shown that the partial derivative of w_r with respect to μ_j is the following:

    \frac{\partial w_r}{\partial \mu_j} = \frac{e^{-k \mu_j} (1 + k (w_r - \mu_j))}{\sum_i e^{-k \mu_i}}        (37)

Finally, Table 1 displays the derivatives of the membership values μ with respect to the centers and the left and right spreads of the values of the linguistic variables used in the fuzzy rules. Using the derivatives provided by Eqs. 36 and 37 and Table 1, Eq. 35 can be used at each step to compute the necessary updates of the network parameters.
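
The forward computations of the modified ASN can be sketched as follows, assuming the triangular membership of Eq. 31, the softmin of Eq. 32, the squashing form reconstructed in Eq. 33 and the Boltzmann selection of Eq. 34; the rule bank, the softmin constant k and the temperature are illustrative, and the backward updates of Eqs. 35-37 are not shown.

    import math

    def triangular_mu(x, c, s_left, s_right):
        """Triangular membership of Eq. 31 with center c and left/right spreads."""
        if c <= x <= c + s_right:
            return 1.0 - abs(x - c) / s_right
        if c - s_left <= x < c:
            return 1.0 - abs(x - c) / s_left
        return 0.0

    def softmin(mus, k=10.0):
        """Rule strength w_r as the softmin of its antecedent memberships (Eq. 32)."""
        weights = [math.exp(-k * mu) for mu in mus]
        return sum(mu * w for mu, w in zip(mus, weights)) / sum(weights)

    def action_potential(rule_strengths):
        """Potential m_i of an action from the strengths of the rules suggesting it (Eq. 33)."""
        return 2.0 / (1.0 + math.exp(-(sum(rule_strengths) - 0.5))) - 1.0

    def action_probabilities(potentials, T=1.0):
        """Boltzmann selection probabilities over the discrete actions (Eq. 34)."""
        exps = [math.exp(m / T) for m in potentials]
        total = sum(exps)
        return [e / total for e in exps]

    # Illustrative state with two antecedent memberships and two actions.
    mu1 = triangular_mu(0.3, c=0.0, s_left=1.0, s_right=1.0)
    mu2 = triangular_mu(0.7, c=1.0, s_left=1.0, s_right=1.0)
    w_rule = softmin([mu1, mu2])
    potentials = [action_potential([w_rule]), action_potential([0.1])]
    print(action_probabilities(potentials, T=0.5))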

6.3 Neuro-fuzzy classifiers

Traditional set theory describes crisp events that either do or do not occur, and uses probability theory to measure the chance with which a given event is expected to occur. In contrast, fuzzy set theory measures the degree to which an event occurs. Pattern classification divides the domain of the space into categories (sets), and then assigns each pattern to one of these categories. Regarding each category as a fuzzy set and identifying the assignment operation with the notion of a membership function, a direct relationship between fuzzy sets and pattern recognition becomes visible. In this sense, pattern classes are considered as fuzzy sets, where a pattern belongs to each one of these classes with a certain membership value. Zadeh [125] first stated the relation between pattern classes and fuzzy sets. Other similar work [5] had also described fuzzy pattern classification. Later, another paper [55] mentioned the idea of replacing the crisp decision boundaries of the perceptron neural network by fuzzy hyperplane decision boundaries.

Figure 12: Fuzzy hyperboxes (min and max points of hyperboxes for Class 1 and Class 2)

Several models combining fuzzy systems and neural networks have been developed that build efficient pattern classifiers, exploiting the particular advantages offered by each technique in a synergistic manner [8, 12, 24, 69, 91, 101]. Most of these methods use the training set to produce geometrical hyperboxes and then compute suitable membership functions in order to specify the decision boundaries of the pattern classes (Fig. 12). In the next sections four such structures are described that may be viewed as representative approaches. In general, every fuzzy clustering/classification algorithm (or at least the majority of them) may be incorporated into a neural network design scheme. The fuzzy-neural networks considered are: the fuzzy min-max, the fuzzy ART/ARTMAP and two approaches based on proximity characteristics.

6.3.1 The fuzzy min-max neural network

The fuzzy min-max neural network [71, 91, 92] is an on-line supervised learning classifier whose operation and training are based on the concept of hyperbox fuzzy sets. Consider a classification problem with n continuous attributes that have been rescaled to the interval [0, 1]; hence the pattern space is I^n ([0, 1]^n). Moreover, consider that there exist p classes and B hyperboxes with corresponding minimum and maximum values v_{ji} and w_{ji} respectively (j = 1, \ldots, B, i = 1, \ldots, n). Let also c_k denote the class label associated with hyperbox B_k. When the h-th input pattern A_h = (a_{h1}, \ldots, a_{hn}) is presented to the network, the corresponding membership function for hyperbox B_j is [91]:

    b_j(A_h) = \frac{1}{n} \sum_{i=1}^{n} \left[ 1 - f(a_{hi} - w_{ji}, \gamma) - f(v_{ji} - a_{hi}, \gamma) \right]        (38)

ai

vji wji Bj

ujk

ck

Input Nodes Hyperbox Nodes Class Nodes

Figure 13: Neural network formulation of the fuzzy min-max classifier (input nodes, hyperbox nodes, class nodes)

In a neural network formulation, each hyperbox B_j can be considered as a hidden unit of a feedforward neural network that receives the input pattern and computes the corresponding membership value. The values v_{ji} and w_{ji} can be considered as the weights from the input to the hidden layer. The output layer contains as many output nodes as the number of classes. The weights u_{jk} (j = 1, \ldots, B, k = 1, \ldots, p) from the hidden to the output layer express the class corresponding to each hyperbox: u_{jk} = 1 if B_j is a hyperbox for class c_k, otherwise it is zero. Fig. 13 represents the architecture of the fuzzy min-max classification neural network.

During learning, each training pattern A_h is presented once to the network and the following process takes place. First we find the hyperbox B_j with the maximum membership value among those that correspond to the same class as pattern A_h and meet the expansion criterion:

    n \theta \ge \sum_{i=1}^{n} \left( \max(w_{ji}, a_{hi}) - \min(v_{ji}, a_{hi}) \right)        (39)

The parameter θ (0 ≤ θ ≤ 1) is a user-defined value that imposes a bound on the size of a hyperbox, and its value significantly affects the effectiveness of the training algorithm. In the case where an expandable hyperbox (of the same class) cannot be found, a new hyperbox B_k is spawned and we set w_{ki} = v_{ki} = a_{hi} for each i. Otherwise, the hyperbox B_j with the maximum membership value is expanded in order to incorporate the new pattern A_h, i.e., for each i = 1, \ldots, n:

    v_{ji}^{new} = \min(v_{ji}^{old}, a_{hi})        (40)

    w_{ji}^{new} = \max(w_{ji}^{old}, a_{hi})        (41)

Following the expansion of a hyperbox, an overlap test takes place to determine whether any overlap exists between hyperboxes of different classes. In case such an overlap exists, it is eliminated by a contraction process during which the size of each of the overlapping hyperboxes is minimally adjusted [91].

A basic assumption concerning the application of the fuzzy min-max classification network to a pattern recognition problem is that all attributes take continuous values.

Hence, it is possible to define the pattern space (union of hyperboxes) corresponding to each class by providing the minimum and maximum attribute values along each dimension. In the case of pattern recognition problems that are based on both analog and discrete attributes, it is necessary for the discrete features to be treated in a different way. This results in a modified approach concerning the training and operation of the network [71].
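
A small sketch of the hyperbox membership of Eq. 38 and the expansion test and update of Eqs. 39-41; the hyperbox, the sensitivity parameter gamma and the bound theta are illustrative, and the overlap test and contraction step are omitted.

    def ramp(x, gamma):
        """The ramp function f(x, gamma) used in Eq. 38."""
        v = x * gamma
        return 0.0 if v < 0 else (1.0 if v > 1 else v)

    def membership(a, v, w, gamma=4.0):
        """Membership of pattern a in the hyperbox with min point v and max point w (Eq. 38)."""
        n = len(a)
        return sum(1.0 - ramp(a[i] - w[i], gamma) - ramp(v[i] - a[i], gamma) for i in range(n)) / n

    def try_expand(a, v, w, theta=0.3):
        """Expand the hyperbox to include a if the size bound of Eq. 39 is met (Eqs. 40-41)."""
        size = sum(max(w[i], a[i]) - min(v[i], a[i]) for i in range(len(a)))
        if size <= len(a) * theta:
            v = [min(vi, ai) for vi, ai in zip(v, a)]
            w = [max(wi, ai) for wi, ai in zip(w, a)]
            return True, v, w
        return False, v, w

    # Illustrative 2-D hyperbox and pattern.
    v, w = [0.2, 0.2], [0.4, 0.4]
    print(membership([0.3, 0.5], v, w))
    expanded, v, w = try_expand([0.3, 0.5], v, w)
    print(expanded, v, w)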

6.3.2 Fuzzy ART - ARTMAP

The fuzzy ART [25] is the fuzzy version of the basic adaptive reasonance theory (ART) model [23], which is a biologically motivated mechanism of perception. It is an unsupervised learning algorithm. A fuzzy ART system consists of a eld F embodying the input nodes and a eld F that contains output nodes. Let us denote by I the normalized input vector and by wj the weight vector of output node j that further represents the active code or category. For each input I and F node j , the choice function, Tj , is de ned by Tj (I ) = jI+^jwwj jj (42) j where is the choice parameter ( > 0), while the operator ^ denotes the fuzzy AND function (min). The category choice J of the system is the one with the maximum Tj value. Furthermore, the winner node J checks to nd out whether it meets the vigilance criterion or not, that is: jI ^ wJ j   (43) jI j 1

where ρ is the vigilance parameter (ρ ∈ [0, 1]). In the negative case a new category J′ is chosen (Eq. 42) while resetting the node J (TJ = 0), in order to prevent the selection of the same category, until the chosen J′ satisfies Eq. 43. During the learning procedure the weight vector wJ is updated according to the equation:

$$ w_J^{new} = \beta (I \wedge w_J^{old}) + (1 - \beta)\, w_J^{old} \qquad (44) $$

where β is the learning parameter (β ∈ [0, 1]). The above learning algorithm can be mapped on a neural architecture [25]. Extending the fuzzy ART model, the fuzzy ARTMAP was obtained, which is of the supervised type [24]. The fuzzy ARTMAP is a fuzzy neural classification network that consists of two fuzzy ART modules (ARTa and ARTb) that are linked together. In addition there is a map field (ARTab) which consists of interconnections between the output nodes of the two ARTs. During the training procedure the fuzzy ARTMAP network is forced to learn associations between cases a of ARTa and classes b of ARTb.
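The search-and-reset dynamics just described can be condensed into a few lines. The sketch below presents one input to a fuzzy ART category layer according to Eqs. (42)-(44); complement coding and other details of [25] are left out, and the parameter defaults (α, ρ, β) are illustrative choices, not values from the original model.

```python
import numpy as np

def fuzzy_art_step(I, weights, alpha=0.01, rho=0.75, beta=1.0):
    """One presentation of input I to a fuzzy ART layer.
    weights: list of category weight vectors w_j; returns (chosen category index, weights)."""
    def fuzzy_and(x, y):                       # the AND operator of Eq. (42)
        return np.minimum(x, y)

    # categories ordered by decreasing choice value T_j (Eq. 42)
    order = sorted(range(len(weights)),
                   key=lambda j: -np.sum(fuzzy_and(I, weights[j])) /
                                  (alpha + np.sum(weights[j])))
    for J in order:                            # nodes failing vigilance are implicitly reset
        match = np.sum(fuzzy_and(I, weights[J])) / np.sum(I)          # Eq. (43)
        if match >= rho:
            weights[J] = beta * fuzzy_and(I, weights[J]) + (1 - beta) * weights[J]  # Eq. (44)
            return J, weights
    weights.append(I.copy())                   # no committed node satisfied vigilance
    return len(weights) - 1, weights
```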

6.3.3 Neuro-fuzzy approaches based on proximity characteristics of patterns

One of the most popular techniques in the subject of pattern recognition is the nearest neighbor rule. Many classification algorithms rely directly or indirectly on this rule. Its

popularity mostly arises from the fact that it suggests a straightforward scheme, which simply states that the unclassified sample is assigned to the class of its nearest neighbors among a set of design samples. A well-known geometrical approach to the partitioning of the input space given a set of points is based on the construction of Voronoi diagrams (VoD) or Dirichlet tessellations. What makes the use of Voronoi diagrams attractive in pattern recognition approaches is that they produce a topological division of the pattern space based on the nearest neighbor property. Two neuro-fuzzy approaches will be described next that are based on proximity characteristics expressed in terms of Voronoi diagrams.

A. The approximate VoD approach

The first approach is based on incremental construction of convex regions used as fuzzy sets in a manner analogous to VoD construction [15]. The perpendicular bisectors (hyperplanes) or generators of the segments joining pairs of input points play an important role in the construction. Specifically, each region is characterized by a point (site) and can be expressed as the intersection of a finite number of closed half-spaces defined by hyperplanes that separate regions of different classes. Regions corresponding to the same class can be overlapping. Following the principle of Dirichlet tessellations, the points of a region are closer to the site of the region than to all other sites belonging to different classes. Learning in the fuzzy classification network consists of creating and adjusting regions and associating a class label to each of them [15]. The incremental construction scheme follows proximity properties and the regions may be seen as approximate Voronoi regions. When an input pattern a = (a1, ..., an) is presented to the classifier during operation, a particular membership function for each region is computed. The membership function μi(a) for the ith region must measure the degree to which the given pattern falls inside or outside the region. This can be considered as a measure of how far the pattern is situated from all the hyperplanes which define the region. Consider the function signh(a), which describes on which side of hyperplane h the pattern a lies (signh(a) = ±1). Also consider the quantities vih, which take the values 1 or -1 depending on whether the site of region i is situated in the positive or negative half-space defined by hyperplane h, respectively. The membership function, taking values in [0, 1], can be computed as follows:

$$ \mu_i(a) = \frac{1}{2|H_i|} \sum_{h \in H_i} v_{ih}\, m_h(a) + \frac{1}{2} \qquad (45) $$


where Hi is the set of hyperplanes defining region i (having cardinality |Hi|) and mh has the following form:

$$ m_h(a) = \begin{cases} 1 & \text{if } x_h > l_h \text{ and } \mathrm{sign}_h(a) = 1 \\ -1 & \text{if } x_h > l_h \text{ and } \mathrm{sign}_h(a) = -1 \\ \mathrm{sign}_h(a)\, x_h / l_h & \text{otherwise} \end{cases} \qquad (46) $$

where xh (lh) represents the vertical distance of the pattern (of the site of the region corresponding to hyperplane h) from hyperplane h.
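The two functions below sketch how the membership of Eqs. (45)-(46) could be evaluated for a single region. The representation of a hyperplane as a pair (w, b) with w·a + b = 0, and the way the distances x_h and l_h are supplied, are our assumptions made only to keep the example self-contained.

```python
import numpy as np

def m_h(a, hyperplane, l_h):
    """Per-hyperplane score of Eq. (46). hyperplane = (w, b) defines w.a + b = 0 (assumed form)."""
    w, b = hyperplane
    signed = np.dot(w, a) + b
    sign_h = 1.0 if signed >= 0 else -1.0          # side of the hyperplane the pattern lies on
    x_h = abs(signed) / np.linalg.norm(w)          # distance of the pattern from the hyperplane
    if x_h > l_h:
        return sign_h                              # saturates at +1 or -1
    return sign_h * x_h / l_h

def region_membership(a, hyperplanes, sides, distances):
    """Membership of pattern a in one region (Eq. 45).
    hyperplanes: list of (w, b); sides: the v_ih values in {+1, -1};
    distances: l_h, distance of the region's site from each hyperplane."""
    s = sum(v * m_h(a, hp, l) for hp, v, l in zip(hyperplanes, sides, distances))
    return s / (2.0 * len(hyperplanes)) + 0.5
```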

Figure 14: Neural network formulation of the approximate VoD approach

The fuzzy classifier can be represented as a neural network that exploits the fuzzy set structure and allows for efficient implementation. Fig. 14 illustrates the neural network that implements this approach. It consists of four layers such that connections exist between successive layers. The first layer represents the input layer containing a number d of nodes (pattern dimension). The number of nodes in the second layer is equal to the number of hyperplanes that define regions. Each second-layer node computes the value of the function mh for every input pattern. The third layer contains as many nodes as the number of regions. The output of each node of this layer represents the membership value of the pattern for the corresponding region as computed in Eq. 45. The connections between nodes of the second and third layer associate regions with their supporting hyperplanes and assume the values vih defined above. The last layer embodies nodes which correspond to the set of p classes. The connections uji between the third and fourth layer take binary values, such that uji = 1 if i is a region of class j and uji = 0 otherwise. Each node of the fourth layer computes the degree to which the input pattern fits within class j. The function that performs this computation is the fuzzy union of the appropriate region fuzzy set values. Thus, the region with the maximum membership value is selected and the class associated with the winning region is considered as the decision of the network.

B. The reduced VoD approach

The second fuzzy neural approach [16] creates fuzzy sets from the exact Voronoi diagram of the training patterns by assembling neighboring Voronoi regions whose generators belong to the same pattern class. In this way the constructed class regions specify the

boundaries between classes in terms of a set of hyperplanes. This formulation leads to a reduced Voronoi diagram where the new broader regions contain more than one adjoining Voronoi region having the same class label. The resulting aggregate regions are no longer convex and may be considered as fuzzy sets by defining membership functions indicating the degree of belongingness of points of the input space to each region. Each fuzzy set is characterized by a set of hyperplanes (separating the corresponding region from other regions) and a class label. A proper membership function of class region i can be computed as follows:

$$ \mu_i(a) = \frac{1}{2|H_i|} \sum_{h \in H_i} m_{hi}(a) + \frac{1}{2} \qquad (47) $$

where Hi is the set of hyperplanes defining class region i (having cardinality |Hi|) and mhi has the following form:

$$ m_{hi}(a) = \begin{cases} u_{ih}(a)\, \exp\!\left( \dfrac{-|x_h(a) - l_h|}{a_1} \right) & \text{if } x_h(a) \le l_h \\[4pt] u_{ih}(a)\, \exp\!\left( \dfrac{-|x_h(a) - l_h|}{a_2} \right) & \text{if } x_h(a) > l_h \text{ and } u_{ih}(a) = 1 \\[4pt] -1 & \text{otherwise} \end{cases} \qquad (48) $$

The quantities uih(a) take the values 1 or -1 depending on whether or not the generator corresponding to hyperplane h and belonging to region i is situated in the same half-space (defined by h) as the pattern a. After constructing the fuzzy sets, decision probabilities are computed based on the density of membership values for each region and the respective performance in the selection of the correct region. Through discretization of the membership axis, a probabilistic function is created that establishes a correspondence between membership values in a specific region and the probability of correct classification. More specifically, considering class region i, the interval [0, 1] of membership values is divided into a number Li of equal-size cells. To each cell v (v = 1, ..., Li) we assign a probability value pvi computed as the percentage of the training patterns belonging to region i that have their membership value in cell v. In order to use the method for the classification of a new pattern, first the membership values μi of the pattern to each region i are computed. Then the corresponding probabilities pvi are determined (where v represents the cell containing the membership value of the pattern) and the region i with maximum pvi is selected. The class of this region is considered as the final classification decision. The above decision approach can be implemented by means of a neural network architecture consisting of five layers, as illustrated in Fig. 15. The first three layers are similar to the previous neural structure (Fig. 14). The fourth layer implements the membership histogram. Each region i of the third layer is connected to Li nodes of the fourth layer corresponding to the cells of the histogram. Each such node v (v = 1, ..., Li) fires only in the case where the μi value falls inside the corresponding cell, and provides the respective probability pvi; otherwise the output of the node is zero. The fifth layer embodies one node for each of the pattern classes. If region i has class label k then the set of Li nodes of the fourth layer (representing the histogram of region i) is connected to node k of the fifth layer.

Figure 15: Neural network formulation of the reduced VoD approach

In other words, the connections between nodes of the fourth and fifth layer take binary values 1 or 0 to associate class regions (histogram cells) with class labels. The output of each node k of the last layer is taken equal to the maximum of the outputs (probabilities pvi) of the cell nodes connected to that node. Finally, the class k with the maximum output is the decision of the fuzzy neural classification network.
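A compact sketch of the histogram-based decision stage described above is given next. The dictionary-based data structures and the use of the same number of cells for every region are simplifying assumptions of this illustration; the formulation above allows a different Li for each region.

```python
import numpy as np

def build_histograms(train_memberships, n_cells=10):
    """For each class region i, estimate p_vi: the fraction of that region's training
    patterns whose membership value falls in cell v of the interval [0, 1]."""
    probs = {}
    for i, values in train_memberships.items():       # values: memberships of region-i patterns
        counts, _ = np.histogram(values, bins=n_cells, range=(0.0, 1.0))
        probs[i] = counts / max(len(values), 1)
    return probs

def classify(memberships, probs, region_class, n_cells=10):
    """memberships: {region i: mu_i(a)}; region_class: {region i: class label}.
    Returns the class of the region whose cell probability p_vi is largest."""
    best_region, best_p = None, -1.0
    for i, mu in memberships.items():
        v = min(int(mu * n_cells), n_cells - 1)        # cell index containing mu_i(a)
        if probs[i][v] > best_p:
            best_region, best_p = i, probs[i][v]
    return region_class[best_region]
```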

6.4 Applications

Applications of fuzzy neural networks to consumer products have appeared very recently. Some examples include air conditioners, electric carpets, electric fans, refrigerators, vacuum cleaners, washing machines, word processors, etc. As has been shown, the ANFIS architecture provides an important example of tuning fuzzy system parameters from input/output pairs of data. It is capable of tuning both antecedent and consequent parameters of fuzzy rules. ANFIS has been applied to a wide range of applications, among them nonlinear function modeling [47, 49], time series prediction [49] and fuzzy controller design. Besides, it has been implemented in terms of a fuzzy rule set extracted from data points (rule extraction) [48]. The original GARIC architecture has been applied to the well-known cart-pole balancing (inverted pendulum control) problem, where the objective is to keep a pole vertically balanced and to keep the cart within the rail track boundaries [6]. In this application the action network consisted of 13 rules. In addition, the modified GARIC architecture has been applied to an interesting control problem concerning the collision-free autonomous navigation of a vehicle in various unknown grounds [59]. The experimental study of the problem has been performed through simulation using an appropriate graphical interface. The vehicle perceived its environment

through the use of a number of sensors, and it was able to perform one of five possible actions. To control the motion of the vehicle, 10 rules were used. In comparison with other pure reinforcement neural network techniques without fuzzy rules and assuming no a priori knowledge about the task [60, 61], the GARIC approach succeeded in reaching almost perfect (collision-free) behavior in a small number of cycles. Other neuro-fuzzy techniques for control and robot applications include [26, 83, 102, 107, 112]. The fuzzy neural classification networks have been used in the field of medical analysis and diagnosis. In [19] a diagnostic system was presented that employs morphometry combined with the fuzzy min-max neural network for the discrimination of benign from malignant gastric lesions. The input to the system consisted of images of routinely processed gastric smears. The analysis of the images provided a data set of cell features, and the fuzzy min-max neural classifier was used to classify benign and malignant cells based on the extracted morphometric and textural features. The experimental results indicated that the use of fuzzy neural techniques along with image morphometry may offer very useful information about the potential malignancy of gastric cells, providing a useful medical expert tool that can be very helpful to cytopathologists. Also, a similar medical application was presented in [64], concerning image texture analysis on ultrasonic images of diffused liver diseases, using a neuro-fuzzy approach based on proximity characteristics.

7 Fuzzy Genetic Algorithms

According to the main framework, fuzzy logic offers a powerful tool for knowledge representation, as it has the ability to handle the amount of information necessary to describe and model a decision support system. On the other hand, genetic algorithms propose a general-purpose search mechanism adopted from a natural paradigm, which has proved robust and efficient in many applications. The idea behind the combination of fuzzy logic and genetic algorithms arises from the fact that a fuzzy expert system is sometimes too confusing and complex to be designed by an expert, as there is a large set of parameters that must be discovered. Genetic algorithms propose a general-purpose optimization scheme that seems convenient for properly handling fuzzy system information and eventually managing it. The advantage of genetic-based techniques is reinforced by the possibility of obtaining efficient parallel implementations in a straightforward manner, thus ensuring fast and effective solutions to hard problems. Moreover, during their random search genetic algorithms require only a small amount of information, the fitness function, which must be optimized in an attempt to discover the appropriate fuzzy input values and to build an efficient fuzzy reasoning machine. Hybrid systems based on the synergy of fuzzy logic and genetic algorithms can be developed in two ways: by using genetic algorithms in a fuzzy logic environment, or by using fuzzy logic in a genetic or evolutionary algorithm environment. The latter approach deals with techniques used to improve the behaviour of genetic algorithms, such as fuzzy operators for the design of genetic operators with different properties [7, 38, 88], or

other fuzzy criteria that can be used in genetic procedures [36, 68, 75]. Next we will examine only the first approach, the application of genetic algorithms to problems of managing fuzzy information, since this has been the more interesting field of research. Specifically, several genetic architectures will be described concerning control and pattern recognition applications.

7.1 Fuzzy genetic approaches in control environment

The main problem in the design of a fuzzy logic controller is to establish the structure of the controller and then to set numerical values for its parameters. Genetic algorithms have been successfully applied to solving these problems by searching for an efficient controller structure, as well as by tuning the controller parameters. As has already been mentioned, a fuzzy controller contains a number of parameters that can be used to modify the controller performance. These are:
• the scaling factors for each variable,
• the number and the structure of fuzzy sets,
• the effective number of fuzzy rules and their linguistic representation.
Considering the set of controller parameters that must be tuned as a set of training data, we can use genetic algorithms to search their domain space in order to discover suitable values. In their general formulation, genetic algorithms try to optimize the membership functions of the fuzzy rules by minimizing an appropriate error function related to the performance of the controller. For example, they may be used to modify fuzzy set definitions, to define the shapes of fuzzy sets, to determine the defuzzification strategy used, etc. What is attractive in the use of genetic algorithms in controller design is that they do not demand rich knowledge of the system behaviour. All they need is an efficient representation scheme for the chromosome structure, as well as a capable fitness function that will lead the genetic search to optimum solutions. After that, it is possible to design the controller. Genetic algorithms can be used to find fuzzy rules [100], to find high-performance membership functions for a controller [53], or both [42, 96]. Apart from these approaches, the genetic mechanism has also been used to improve the performance of a specific decision-making system built by fuzzy logic [81]. A number of heuristic parameters concerning decision making in a fuzzy logic environment is determined, and a genetic algorithm is used to discover the values that minimize a kind of error between the obtained decision and a desired decision given by a teacher. An interesting novel approach for designing fuzzy logic controllers has been proposed recently [56], where a hierarchical distributed genetic algorithm implementing a multiresolution search paradigm has been developed. This architecture consists of multiple clusters, each of which contains a fuzzy logic controller and a genetic algorithm optimizer. Higher-level clusters investigate wider search spaces than lower-level clusters do. In this sense the

solutions of higher-level clusters are refined by the lower-level clusters (creating searching subspaces), and thus further investigation is performed. The genetic algorithm in each cluster optimizes the parameter set of the controller, and an individual of its population provides the appropriate values.

7.1.1 Representation schemes

The encoding procedure in genetic algorithms is critical to their performance. Different coding methods may be suggested, all aiming to describe a genetic structure that will incorporate all the necessary fuzzy information. The main advantage of using genetic algorithms is that they don't need to carry out all the stages of the fuzzy design process. An efficient coding scheme will enable the genetic operations to search for the optimal fuzzy rule-based representation and simultaneously to adapt the fuzzy set borders. This will cause the genetic mechanism to succeed in properly designing a simulation of the system and in achieving the best performance behaviour. Below we discuss some of these genetic representation schemes. The simplest representation of a fuzzy system provides all the fuzzy sets fixed in the input space (designed in advance), and genetic algorithms are used to learn the associations between input sets and outputs [53]. The chromosome is described by a string of integer values, where each position corresponds to a particular input fuzzy set and its value to the associated output set. For example, a 7 in the third position associates the 7th output set with the 3rd input set. This kind of scheme suffers from the enormous size of the chromosome as the input space grows. Besides, it is not very informative, since the set positions and their shapes must be decided by a human expert. Alternatively, a fuzzy rule can be represented as a vector of real values (indicating the membership values of each fuzzy rule set). In the case where the fuzzy sets are designed in advance, genetic algorithms can be used to search for the associations between input sets and outputs. Considering that each chromosome position corresponds to an input set, a possible chromosome structure can be formed as a string of integers, each of which describes the corresponding output set that is related to the input set (chromosome position) [53]. A similar but more complicated method is to use a new special label '-' upon some slots of the chromosome string [100]. This label indicates that the position with this value does not have a fuzzy set entry or, in other words, is not needed in the fuzzy rule. With this representation the GA determines the number of necessary rules, since the rules having the value "-" in the action condition can be ignored. Another encoding scheme that has proved to be more convenient is to represent the chromosome simply as a chain of a fixed number of fuzzy rules. The rules can be encoded as a vector of real values, or as binary digits approximating them. An example of this scheme is illustrated in Fig. 16, where a rule base has five fuzzy rules, each of them described by the input values (the centers of the sets along two universes of discourse, c1, c2, together with a shape parameter p) and the output value o. A fuzzy rule between the state variables x1, ..., xn and the control variables y1, ..., ym is represented by the following linguistic form:

RULE 1 RULE 2 RULE 3 RULE 4 RULE 5

c1 c2 p o c1 c2 p o c1 c2 p o c1 c2 p o c1 c2 p o

Figure 16: Coding scheme

Ri: IF x1 is Ai1 AND ... AND xn is Ain THEN y1 is Bi1 AND ... AND ym is Bim

Considering that every fuzzy set follows the normalized trapezoidal membership function, the 4-tuples of real values (aij1, aij2, aij3, aij4), (bik1, bik2, bik3, bik4), j = 1, ..., n, k = 1, ..., m, define the membership functions of the fuzzy sets Aij, Bik. Thus a chromosome structure can be represented as a vector of floating-point values that can be optimized by a real-coded genetic algorithm using appropriate genetic operators [37]. Other membership functions (e.g. triangular) can also be considered, and genetic algorithms may be used to optimize their parameters (the 3-tuples (aij1, aij2, aij3), (bik1, bik2, bik3)) in the same manner.
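As an illustration of this real-coded representation, the following sketch flattens a rule base with trapezoidal sets into a single chromosome vector and recovers it again for fitness evaluation. The dictionary layout of a rule and the helper names are hypothetical; they simply mirror the 4-tuples (aij1, ..., aij4) and (bik1, ..., bik4) described above.

```python
import numpy as np

def encode_rule_base(rules):
    """Flatten a rule base into one real-valued chromosome.
    Each rule is a dict {'inputs': [(a1,a2,a3,a4), ...], 'outputs': [(b1,b2,b3,b4), ...]}
    holding the trapezoid corner points of the sets A_ij and B_ik."""
    genes = []
    for rule in rules:
        for trap in rule['inputs'] + rule['outputs']:
            genes.extend(trap)
    return np.array(genes, dtype=float)

def decode_chromosome(chromosome, n_rules, n_inputs, n_outputs):
    """Inverse mapping, used when an individual must be evaluated as a fuzzy rule base."""
    chromosome = np.asarray(chromosome, dtype=float)
    per_rule = 4 * (n_inputs + n_outputs)
    rules = []
    for r in range(n_rules):
        block = chromosome[r * per_rule:(r + 1) * per_rule].reshape(-1, 4)
        rules.append({'inputs':  [tuple(t) for t in block[:n_inputs]],
                      'outputs': [tuple(t) for t in block[n_inputs:]]})
    return rules
```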

7.2 Fuzzy genetic approaches for pattern recognition

A broad spectrum of clustering/classification algorithms attempt to generate a partition of the sample data through the minimization of an objective function based on a clustering criterion. The kinds of partitions generated and the geometric structure of the clusters are closely related to the distance measure chosen and the objective function being optimized. The partitions are either hard, in which case each sample point is unequivocally assigned to a cluster and is considered to bear no similarity to members of other clusters, or fuzzy, in which case a membership function expresses the degree of similarity between the sample and each cluster [9, 12]. In the case of clustering approaches, the most common application of genetic algorithms in fuzzy environments concerns the fuzzy c-means algorithm. The fuzzy c-means (FCM) clustering approach [9] belongs to the general class of c-means partitioning models and has been extensively used in various types of pattern and image processing/analysis applications. Consider a number of clusters c and a given set of unlabeled data points X = x1, ..., xn. The clustering criterion used by the FCM algorithm is associated with the generalized least-squared error functional

$$ J_m(U, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} (u_{ij})^m D_{ij} \qquad (49) $$

where m > 1 is a weighting exponent (degree of fuzzification), V = [v1, ..., vc] (vi ∈ Rp) is the vector of geometric centers (cluster prototypes), and Dij is some similarity (distance) metric between xj and vi. The value uij represents the membership of the j-th data point to the i-th cluster, expressed as:

$$ u_{ij} = \left( \sum_{k=1}^{c} \left( \frac{D_{ij}}{D_{kj}} \right)^{1/(m-1)} \right)^{-1}, \qquad 1 \le i \le c, \; 1 \le j \le n \qquad (50) $$

Genetic algorithms seem to offer a promising alternative in the direction of optimizing the FCM functional Jm. Representation and exploration of the problem parameter space can be based on encoding and evolving either both U and V, or only one of the variables U and V. In the latter case, one variable is eliminated and optimization of Jm is performed over the other. In various related approaches binary coding has generally been employed for representing the data partitioning [10, 11, 22, 65]. Apart from the traditional binary coding, which sometimes may have serious drawbacks when applied to multidimensional problems of high numerical precision, an alternative representation is based on real-coded genetic algorithms [76]. In this direction genes are represented directly as floating-point numbers (problem variables) and chromosomes (strings of genes) as vectors of real numbers, thus enabling the exploration of large domains without sacrificing precision or memory [18]. The latter approach has shown very good performance in terms of the rate of correct classifications during testing. In fuzzy classification applications, real-coded genetic optimization schemes may also offer an alternative solution with good results [17]. In this approach, X = x1, ..., xn is a set of labeled data and the aim is to partition the input space into a number of clusters, the number and class labels of which are assumed to be known in advance. The optimization criterion is associated with a generalized least-squared error functional that includes constraints about the clusters and also their class labels. Extending the clustering criterion described in Eq. 49 we obtain the following formula:

$$ H_m = \sum_{i=1}^{c} \sum_{j=1}^{n} \gamma_{ij}\, (u_{ij})^m D_{ij} \qquad (51) $$

The new quantity γij describes the correctness of the classification and is equal to unity if the pattern xj and the centre vi are of the same class; otherwise it is taken equal to some positive constant much larger than unity. Finally, another approach to pattern classification that has been proposed is based on fuzzy IF-THEN rules [46]. This method generates initial fuzzy classification rules for a specific set of training patterns and then formulates the rule selection as a combinatorial optimization problem. The objective of the method is on the one hand to maximize the correct classification rate, and on the other hand to minimize the number of fuzzy rules. A genetic algorithm was used for this problem with a representation scheme of rule-set strings, indicating for each rule whether it belongs to the rule set or not, or whether it is a dummy rule.
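A possible fitness computation for such genetic schemes, when only the prototypes V are encoded in the chromosome, is sketched below: the memberships are eliminated through Eq. (50) and the functional of Eq. (49) is evaluated directly. The squared Euclidean distance and the suggested fitness transform 1/(1 + Jm) are our assumptions, not prescriptions of the cited works.

```python
import numpy as np

def fcm_objective(V, X, m=2.0, eps=1e-9):
    """J_m of Eq. (49) for prototypes V (c x p) and data X (n x p), with the
    memberships u_ij computed from Eq. (50).  A real-coded GA could decode each
    chromosome into V and use e.g. fitness = 1 / (1 + fcm_objective(V, X))."""
    D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) ** 2 + eps   # D_ij, shape (c, n)
    ratios = (D[:, None, :] / D[None, :, :]) ** (1.0 / (m - 1.0))          # (D_ij / D_kj)
    U = 1.0 / ratios.sum(axis=1)                                           # u_ij of Eq. (50)
    return np.sum((U ** m) * D)                                            # J_m of Eq. (49)
```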

7.3 Fuzzy Learning Classifier Systems

A learning classifier system (LCS) is a massively parallel, message-passing, rule-based system that is capable of environmental interaction and reinforcement learning through credit assignment and rule discovery. Such a production system consists of a set of rules representing a population which evolves through a GA-based learning process. The foundations of LCS were laid by Holland [41]. Representing the classifier list of an LCS as a set of

fuzzy if-then rules, we obtain a type of genetic-based machine learning system called a fuzzy learning classifier system (FLCS). The FLCS learns by creating fuzzy rules related to the input variables. It performs the same tasks as the LCS, working in a fuzzy environment. The first FLCS approach was developed by Valenzuela-Rendon [113]. In this system each classifier corresponds to a binary string which encodes the membership functions of the fuzzy sets that describe the problem variables, where the number of bits is equal to the total number of fuzzy sets. When a chromosome slot is 'on' (value 1), this means that the corresponding fuzzy set participates in the fuzzy rule. The genetic procedure creates new classifiers, allowing the evolution of the fuzzy rule bank. An alternative FLCS approach was proposed by Parodi and Bonelli [80], where each classifier contains the actual description of the membership functions that correspond to each input and output variable. Thus, the classifier list represents real-coded strings of the parameters associated with the fuzzy sets, allowing learning of the membership functions.

7.4 Applications

We have already seen, through the previous analysis, in which way genetic algorithms operate within a fuzzy logic environment. The main task concerns the syntactical representation scheme that must be discovered in order to allow the efficient and generative creation of individuals. Genetic fuzzy algorithms have been applied to solving many control problems. For example, in [42, 100] the problem of moving a cart of given mass from a given initial position and velocity to zero position and velocity in minimum time is examined. Another interesting problem that has been tested is the truck backing problem, consisting of a truck located on a grid [42]. The objective is to navigate the truck from an initial position to the location of the "loading dock". In [89] a fuzzy controller optimized by a genetic algorithm is presented that is appropriate for spacecraft attitude control. Other control applications that have been examined are the well-known inverted-pendulum problem [53, 56], collision-free movement of a robot in a simulated corridor environment [70], robust motion control of a mobile robot [82], etc. Another study of the use of genetic algorithms in the design and implementation of fuzzy logic controllers is presented in [54], where a GA is proposed to generate membership functions for a pH control process that is present in a number of mineral and chemical industries. Genetic algorithms have also been used to optimize the Sugeno fuzzy model [93, 98] that was described previously. For this purpose the chromosome structure must represent the parameters of the input membership functions as well as the constants of the output function (consequent parameters) for each fuzzy rule considered. Unfortunately, in this scheme the created chromosomes are of huge length, since a problem of an m-input-one-output system with n fuzzy sets for each input requires 3(mn + n^m) parameters (considering triangular membership functions) [96]. Finally, fuzzy logic classifier systems have been applied to several control problems [30, 103, 111, 104, 110, 115]. One such method concerned a type of Valenzuela's FLCS that was

trained to steer a simulated ship, showing its effectiveness in fulfilling a control task and acquiring complex control rules. In the case of pattern recognition, genetic algorithms have been applied to a variety of pattern classification problems and have shown very good performance in terms of the rate of correct classifications during testing [17, 18]. Comparison with other established classification approaches has shown that an appropriate formulation of the clustering criterion combined with the genetic optimization schemes provides a promising alternative.

8 Genetic Neural Networks

Neural networks were born in an attempt to imitate the human nervous system. The fundamental tasks of neural network development concern their design and training. The problem of designing artificial neural networks for specific applications is an open research topic. In many applications the use of trial-and-error empirical techniques is a common secret. Furthermore, the performance of neural networks is critically dependent on the learning algorithm, the network architecture and the choice of parameters. Even when a suitable setting of parameters (weights) can be found, the ability of the resulting network to generalize to data not seen during learning may be far from optimal [34]. A motivation in this direction has recently been provided through the advances in constructive or generative neural network learning algorithms that directly determine the mapping between input and output spaces and incrementally construct the neural architecture. The problem of finding the appropriate structure of a network may be viewed as an optimization process concerning the learning algorithm as well as the architecture of the network. For these reasons it seems logical and attractive to apply genetic algorithms. Genetic algorithms may provide a useful tool for automating the design of neural networks. Various schemes for combining neural networks and genetic algorithms have been proposed (for an extended analysis of this subject see [4, 21, 123]). In most cases, genetic algorithms have been used either to train a network or to find a suitable topology. The study of the synergy of neural networks and genetic algorithms can be summarized in three general aspects: genetically training neural networks, genetic optimization of the network topology, and control parameter optimization. In these approaches, the first step is to select an encoding scheme in order to create the genotype (usually a binary representation string). After that, we decode to obtain the phenotype that gives the fitness value of the chromosome.

8.1 Genetic algorithms for training neural networks

As already seen, the backpropagation (BP) algorithm is a gradient descent algorithm which tries to minimize an error function, the total mean squared error between the network output and the target output. This error is used to adjust the synaptic weights of the network. BP has some drawbacks due to its inability to escape from local minima and find the global minimum [94].

Considering the training procedure as an optimization process, we may apply genetic algorithms to the network, where the objective is the evolutionary search for the appropriate set of weights that minimizes the training error. GAs, working with a population of different solution points, have the ability to overcome the problem of local minima by searching many regions of the domain space simultaneously. Another advantage is that GAs are robust with respect to the network architecture, since they do not propagate the error signal backward. Given a neural network topology, we may use a genetic algorithm to find the optimum connection weights by searching the weight space. Thus, the training process may be seen as an evolution of the weights towards an optimal set, and the training task as the environment in which the evolution occurs. In the general formulation of this approach, the genetic mechanism is applied to a population of individuals that represent all the synaptic weights of the neural network. The strings may be binary or real-valued. Representation schemes in the former (binary) case consider that each connection weight is described by a fixed number of binary digits. A simple method is the straightforward representation of all weights [45, 118]. Nevertheless, the order of the weights on the chromosome plays an important role and, thus, there are some variations where the input weights of each node are placed together [124], or the input and the output weights of a node are next to each other [99]. The use of binary coding has the disadvantage of limited numerical precision, as well as the sometimes large string structures. For this reason a real-coded representation may be used alternatively [29]. After the representation scheme has been decided, the genetic algorithm can evaluate each chromosome of the population. The most commonly used fitness function is derived by means of the squared error over the training patterns. (Actually, the fitness is obtained via appropriate normalization of the mean squared error, yielding a form suitable for maximization.) Another way of formulating the fitness function is based on the classification rate, i.e. the percentage of correct classification responses of the neural network. In most cases of applying a genetic algorithm for training a neural network, attention must be paid to the effect of the genetic operators. From one point of view, the weights corresponding to a node (incoming weights) should not be disrupted by crossover but should be seen as a separate entity within the string. In this way, we may regard the chromosome as a string of genes, where each gene describes the collection of a node's incoming weights, and crossover will interchange these collections [79, 99]. In general, the way of applying genetic operators depends on the representation scheme adopted.
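The following sketch shows one way such a fitness function could look for a real-coded chromosome holding the weights of a one-hidden-layer feedforward network; the normalization 1/(1 + MSE) turns the error into a quantity suitable for maximization, as mentioned above. The layer sizes, activation function and helper names are illustrative assumptions.

```python
import numpy as np

def unpack(chromosome, n_in, n_hidden, n_out):
    """Map a flat real-valued chromosome onto the weight matrices of a 1-hidden-layer MLP
    (with bias weights appended to each layer)."""
    k1 = (n_in + 1) * n_hidden
    W1 = chromosome[:k1].reshape(n_in + 1, n_hidden)
    W2 = chromosome[k1:].reshape(n_hidden + 1, n_out)
    return W1, W2

def fitness(chromosome, X, T, n_hidden):
    """Normalized fitness in (0, 1]: 1 / (1 + MSE) of the network encoded by the chromosome."""
    n_in, n_out = X.shape[1], T.shape[1]
    W1, W2 = unpack(np.asarray(chromosome, dtype=float), n_in, n_hidden, n_out)
    H = np.tanh(np.hstack([X, np.ones((len(X), 1))]) @ W1)   # hidden-layer activations
    Y = np.hstack([H, np.ones((len(H), 1))]) @ W2            # network outputs
    mse = np.mean((Y - T) ** 2)
    return 1.0 / (1.0 + mse)
```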

8.2 Genetic algorithms for optimizing neural network topology

In the previous section we saw how genetic algorithms can be used for training a neural network of a given topology. The opposite problem is also very interesting, that is, to perform a genetic search to find the best topology for a self-trainable neural network. The puzzle of network topology is very important because of its great influence on network performance capability. What the best neural structure for a problem is remains questionable in most cases, and there is no general law for finding it. It depends on the characteristics and the size of the data. If the topology is very small the network may not be able to efficiently learn the training pairs of patterns, while a large topology restricts the generalization capability of the network.

Figure 17: Direct encoding scheme. A six-node network (with, e.g., c14 = 1, c25 = 0, c46 = 1), its 6 x 6 connectivity matrix C, and the resulting chromosome 000100 000100 000011 000001 000001 000000.
A safe method for obtaining the appropriate topology is based on trial-and-error. In this sense, genetic algorithms may provide an attractive approach to the problem of searching for the optimum topology. As in the previous case of applying genetic search to neural network design, the representation formulation provides the key element for the success of the genetic mechanism. There are two general encoding schemes for designing the architecture of a neural network: direct or low-level encoding and indirect or high-level encoding. According to direct encoding [20, 77, 90], all the available information about the architecture is directly represented in the binary string. In most cases the chromosomes represent the connections between all layers. An N x N matrix C = (cij) can describe the presence of connections: if cij = 1 then there is a link between nodes i and j, while the value cij = 0 denotes the absence of the corresponding connection. Fig. 17 illustrates an example of this encoding. The extracted chromosome in this kind of encoding indicates in a row-by-row form (as depicted by matrix C) the connectivity structure of the network. Obviously, this scheme is suitable only for small neural architectures, since a large one would require a population of very large chromosomes that would reduce the speed of genetic search. Nevertheless, if we consider a fully connected neural structure, genetic algorithms can be used to optimize the number of hidden layers as well as the number of nodes in each layer [14, 84]. The indirect encoding scheme suggests only the most important topological features, such as the number of nodes and the number of connections [33, 78]. This approach is more attractive and advantageous since it requires more 'clever' knowledge and, as a result, a smaller search space for the neural architecture, rather than the precise (and enormous) topological knowledge of the direct scheme. An interesting indirect encoding scheme that has been

proposed is based on a context-free deterministic graph grammar design with which the neural topology can be generated [57]. As can be observed, the strings created by both encoding schemes are binary. The structure of chromosomes represents a neural architecture that is subsequently trained (usually by the backpropagation learning algorithm) in order to be evaluated. The most common form of fitness function is again the mean squared error. Finally, the chromosomes are recombined and evolved by using appropriate genetic operators (crossover and mutation).
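A minimal sketch of the direct (low-level) encoding is given below: the rows of the connectivity matrix C are concatenated into a binary chromosome and decoded back again. The example links are chosen so that the resulting string reproduces the chromosome shown in Fig. 17; they are illustrative, not taken from a specific reference.

```python
import numpy as np

def matrix_to_chromosome(C):
    """Direct encoding: concatenate the rows of the N x N connectivity matrix C
    (c_ij = 1 means a link between nodes i and j)."""
    return ''.join(str(int(bit)) for bit in np.asarray(C).flatten())

def chromosome_to_matrix(chromosome, n_nodes):
    """Decode a binary string back into the connectivity matrix."""
    bits = np.array([int(b) for b in chromosome], dtype=int)
    return bits.reshape(n_nodes, n_nodes)

# a six-node example in the spirit of Fig. 17 (illustrative links only)
C = np.zeros((6, 6), dtype=int)
C[0, 3] = C[1, 3] = C[2, 4] = C[2, 5] = C[3, 5] = C[4, 5] = 1
print(matrix_to_chromosome(C))   # -> '000100000100000011000001000001000000'
```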

8.3 Other genetic neural approaches

In the previous subsections we discussed two general schemes combining genetic algorithms and neural networks. The first one considers genetic algorithms as a weight-training algorithm, while the second approach determines the architecture of the network for efficient training. In addition, genetic algorithms may optimize the learning parameters of the selected training algorithm, for example the learning rate and the momentum parameter of the backpropagation algorithm, which normally must be specified by the user. This can be done by incorporating these parameters into the chromosome structure, sometimes with special care in order to avoid any interaction with the rest of the string, which has a different interpretation. By combining these two general approaches, genetic algorithms can also be used for determining the synaptic weights and the topology simultaneously [20, 27, 87]. In such cases, due to the obviously large domain space that must be investigated, the weights of the network are restricted to the binary values ±1. Accordingly, the encoding scheme can also be applied in a direct or indirect fashion.

8.4 Applications

In all the above approaches we considered a feedforward type of neural network trained with the backpropagation learning algorithm. Genetic algorithms have also been used for configuring radial basis function (RBF) neural networks [13, 63, 117]. Specifically, they have been applied to find the optimal (Gaussian) parameters used (centers, widths), as well as the structure (number of hidden layer nodes) of the RBF network. Genetic algorithms have also been applied to cellular neural networks to optimize the synaptic connections that influence the dynamic trajectory of a neuron's activation level [126]. In this case, a real-coded genetic algorithm was used, involving a variety of genetic operators, for greyscale image processing tasks. Another neural model that has been investigated through the use of genetic algorithms is the self-organizing map (SOM), with respect to the problem of determining its topology [32].


9 Conclusions

We have presented a general survey of hybrid systems related to three important fields of soft computing: neural networks, fuzzy logic and genetic algorithms. We have focused on the analysis of the developed methodologies together with their applications, in an attempt to classify them. First we described neural-fuzzy (NF) hybrid architectures and how they can be applied in a synergistic manner. These techniques draw benefits from the adaptive construction of fuzzy inference systems by learning their free parameters. Three general architectures were presented that illustrate important aspects of this subject: the ANFIS architecture, which is capable of implementing fuzzy logic controllers; the GARIC architecture, which proposes a general scheme for adjusting fuzzy parameters using reinforcement learning; and finally representative architectures of neuro-fuzzy classifiers (NFCL) suitable for pattern recognition applications. In the case of fuzzy-genetic (FG) hybrid schemes, genetic algorithms are used to manage the information of fuzzy systems. The methodologies presented were divided into three groups: fuzzy genetic approaches for control (FGC) and for pattern recognition (FGPR) applications, as well as fuzzy learning classifier systems (FLCS). Finally, the neural-genetic (NG) approaches considered were techniques for neural network training (NGTR) and for optimizing the network topology (NGTOP). The variety of applications and the rapidly growing literature on these systems prove the enormous scientific interest in hybrid soft computing approaches. Following the general schema of hybrid systems illustrated in Fig. 7, a more informative diagram may be presented in the form of a multilayer structure (Fig. 18), which shows all the above general classes of soft computing hybrid systems. Several aspects remain open subjects, since there is still a kind of fuzziness about what is more efficient to do. For example, NF and FG approaches are both used to learn fuzzy logic parameters, but which one is the best has not yet been examined. Another confusing point may arise from the twofold role of genetic algorithms in a neural environment. Genetic algorithms can serve either as the trainer or as the topology optimizer of a neural network. It is reasonable to raise the question of which use is of superior benefit. To answer this question, we may say that the optimization of the neural network topology seems to be more attractive and reliable, since there is no natural method providing the best architecture of a neural network. What is probably the crisp answer that defuzzifies all answers is that it depends on the problem. In cases where there is neither external knowledge available nor is the problem very complex, it is preferable to use genetic hybrid systems, which require only a small amount of information and can find even the appropriate number of fuzzy rules, or both the weights and the topology of a neural network. In addition, genetic approaches can be used for solving the confusing problem of setting parameter values. On the other hand, when there is enough expert knowledge (in the sense of applying a set of imprecise fuzzy rules), neural-fuzzy hybrid approaches are more reliable, since they may adjust the fuzzy parameters through a learning process. What may also be interesting is

Figure 18: Hybrid approaches in a multilayer structure. Legend: ANFIS: Adaptive Neuro-Fuzzy Inference System; GARIC: Generalized Approximate Reasoning-based Intelligent Control; NFCL: Neuro-Fuzzy Classifier; NN: Neural Networks; FS: Fuzzy Systems; GA: Genetic Algorithms; NF: Neuro-Fuzzy; FG: Fuzzy-Genetic; NG: Neuro-Genetic; FGC: Fuzzy Genetic for Control; FGPR: Fuzzy Genetic for Pattern Recognition; FLCS: Fuzzy Learning Classifier System; NGTR: Neural Genetic for Training; NGTOP: Neural Genetic for Topology.


the use of genetic algorithms as an initialization process playing an auxiliary role in finding initial values of the weights for neural networks or NF approaches. We close by remarking that soft computing techniques can cooperate with conventional Knowledge-Based (expert) systems for improved or enhanced results (see e.g. [105]).

References [1] D.H. Ackley. Connectionist Machine for Genetic Hillclimbing. Boston: Kluwer Academic Publishers, 1987. [2] D.H. Ackley, G.E. Hinton, and T.J. Sejnowski. A Learning Algorithm for Boltzmann Machines. Cognitive Science, 9:147{169, 1985. [3] T. Back and H.P. Schwefel. An Overview of Evolutionary Algorithms for Parameter Optimization. Evolutionary Computation, vol. 1. Cambridge, MA: MIT Press, 1993. [4] K. Balakrishnan and V. Honovar. Evolutionary Design of Neural Architectures A Preliminary Taxonomy and Guide to Literature. Technical Report: ISU CS-TR 95-01, 1995. [5] R. Bellman, R. Kalaba, and L.A. Zadeh. Abstraction and Pattern Classi cation. J. Math. Anal. Appl., 13:1{7, 1966. [6] H.R. Berenji and P. Khedkar. Learning and Tuning Fuzzy Logic Controllers Through Reinforcements. IEEE Trans. on Neural Networks, 3(5):724{740, 1992. [7] A. Bergman, W. Burgard, and A. Hemker. Adjusting Parameters of Genetic Algorithms by Fuzzy Control Rules. In K. H. Becks and D. P. Gallix, editors, New Computer Techniques in Physics Research III. Singapore: World Scienti c Press, 1994. [8] J. Bezdek, editor. Special Issue on Fuzzy Logic and Neural networks. IEEE Trans. on Neural Networks. Vol. 3, 1992. [9] J.C. Bezdek. FCM: The Fuzzy c-Means Clustering Algorithm. Computers and Geosciences, 10:191{203, 1984. [10] J.C. Bezdek, S. Boggavarapu, L.O. Hall, and A. Bensaid. Genetic Algorithm Guided Clustering. Proc. First IEEE Conf. on Evolutionary Computation, Vol. I, Orlando, Florida, 1994. [11] J.C. Bezdek and R.J. Hathaway. Optimization of Fuzzy Clustering Criteria Using Genetic Algorithms. Proc. First IEEE Conf. on Evolutionary Computation, Vol. II, Orlando, Florida, 1994. 43

[12] J.C. Bezdek and S.K. Pal. Fuzzy Models for Pattern Recognition. New York: IEEE Press, 1992. [13] S.A. Billings and G.L. Zheng. Radial Basis Function Network Con guration Using Genetic Algorithms. Neural Networks, 8(6):877{890, 1995. [14] K. Blekas. Optimization Using Genetic Algorithms. Diploma thesis, Department of Electrical and Computer Engineering, National Technical University of Athens, Greece, October 1993. [15] K. Blekas, A. Likas, and A. Stafylopatis. A Fuzzy Neural Network Approach Based on Dirichlet Tesselations for Nearest Neighbor Classi cation of Patterns. Proc. of IEEE Workshop on Neural Networks for Signal Processing (NNSP'95), pp. 153{161, Boston, USA, 1995. [16] K. Blekas, A. Likas, and A. Stafylopatis. A Fuzzy Neural Network Approach to Pattern Classi cation Based on Proximity Characteristics. Proc. of 9th IEEE Inter. Conf. on Tools with Arti cial Intelligence (ICTAI'97), Californa, USA, 1997. [17] K. Blekas, G. Papageorgiou, and A. Stafylopatis. Continuous Optimization Schemes for Fuzzy Classi cation. Int. Conference on Digital Signal Processing (DSP'97), pp. 265{268, Santorini, Greece, 1997. [18] K. Blekas and A. Stafylopatis. Real-coded Genetic Optimization of Fuzzy Clustering. Proc. European Congress on Fuzzy and Intelligent Technology (EUFIT'96), Vol. I, pp. 461{465, Aachen, Germany, 1996. [19] K. Blekas, A. Stafylopatis, D. Kontoravdis, A. Likas, and P. Karakitsos. Cytological Diagnosis Based on Fuzzy Neural Networks. Journal of Intelligent Systems, 8(1/2), 1998. [20] S. Bornholdt and D. Graudenz. General Asymetric Neural Netowrks and Structure Design by Genetic Algorithms. Neural Networks, 5:327{334, 1992. [21] J. Branke. Evolutionary Algorithms for Neural Network Design and Topology. Proc. of the 1st Nordic Workshop on Genetic Algorithms and its Applications, Vassa, Finland, 1995. [22] B.P. Buckles, F.E. Petry, D. Prabhu, R. George, and R. Srikanth. Fuzzy Clustering with Genetic Search. Proc. First IEEE Conf. on Evolutionary Computation, Vol. I, Orlando, Florida, 1994. [23] G.A. Carpenter and S. Grossberg. A Massively Parallel Architecture for a SelfOrganizing Neural Pattern Recognition Machine. Computer Vision, Graphics, and Image Processing, 37:54{115, 1987. 44

[24] G.A. Carpenter, S. Grossberg, N. Markuzon, J.H. Reynolds, and D.B. Rosen. Fuzzy ARTMAP: A Neural Network Architecture for Incremental Supervised Learning of Analog Multidimensional Maps. IEEE Trans. on Neural Networks, 3(5):698{713, 1992. [25] G.A. Carpenter, S. Grossberg, and D.B. Rosen. Fuzzy ART: Fast Stable Learning and Categorization of Analog Patterns by an Adative Reasonance System. Neural Networks, 4:759{771, 1991. [26] P. Dalianis, Y. Kitsios, and S.G. Tzafestas. Graph Coloring Using Fuzzy Controlled Neural Networks. Intell. Automation and Soft Computing, 4(4):273{288, 1996. [27] D. Dasgupta and D.R. McGregor. Designing Application-speci c Neural Networks Using the Structured Genetic Algorithms. Proc. of the International Workshop on Combinations of Genetic Algorithms and Neural Networks, pp. 87{96, 1992. [28] L. Davis. Handbook of Genetic Algorithms. New York: Van Nostrand Reinhold, 1991. [29] D.B. Fogel, L.J. Fogel, and V.W. Porto. Evolving Neural Networks. Biological Cybernetics, 63:487{493, 1990. [30] T. Furuhashi, K. Nakaoka, K. Morikawa, and Y. Uchikawa. Controlling Execessive Fuzziness in a Fuzzy Classi er System. Proc. of the Fifth International Conference on Genetic Algorithms, 1993. [31] D.E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Reading, Mass.: Addison-Wesley Publishing Co., 1989. [32] A. Hamalainen. Using Genetic Algorithms in Self-Organizing Map Design. Proc. of the International Conference on Arti cial Neural Networks and Genetic Algorithms, 1995. [33] S.A. Harp, T. Samad, and A. Guha. Towards the Genetic Synthesis of Neural Networks. In D. Touretzky, editor, Advances in Neural Information Processing Systems II, pp. 447{454. Morgan Kaufmann, San Mateo, CA, 1989. [34] S. Haykin. Neural Networks. New York: Macmillan, 1994. [35] D.O. Hebb. The Organization of Behaviour. New York: Wiley, 1949. [36] F. Herrera, E. Herrera, M. Lozano, and J.L. Verdegay. Fuzzy Tools to Improve Genetic Algorithms. Proc. European Congress on Fuzzy and Intelligent Technology (EUFIT'94), pp. 1532{1539, Aachen, Germany, 1994. [37] F. Herrera, M. Lozano, and J.L. Verdegay. Tuning Fuzzy Logic Controllers by Genetic Algorithms. International Journal of Approximate Reasoning, 12(3):293{315, 1995. 45

[38] F. Herrera, M. Lozano, and J.L. Verdegay. Fuzzy Connective Based Crossover Operators to Model Genetic Algorithms Population Diversity. Fuzzy Sets & Systems, 92-1:21{30, 1997. [39] G.E. Hinton and T.J. Sejnowski. Learning and Relearning in Boltzmann Machines. In D.E. Rumelhart and J.L. McClelland, editors, Parallel Distributed Processing: Explorations in Microstructure of Cognition. Cambridge, MA: MIT Press, 1986. [40] J.H. Holland. Adaptation in Natural and Arti cial Systems. Ann Arbor: University of Michigan Press, 1975. [41] J.H. Holland, K.J. Holyoak, R.E. Nisbett, and P.R. Thagard. Induction: Processes of Inference, Learning, and Discovery. Cambridge, MA: MIT Press, 1986. [42] A. Homaifar and Ed McCormick. Simultaneous Design of Membership Functions and Rule Sets for Fuzzy Controllers Using Genetic Algorithms. IEEE Trans. on Fuzzy Systems, 3(2):129{139, 1995. [43] J.J. Hop eld. Neural Networks and Physical Systems with Emergent Collective Computational Abilities. Proc. of the National Academy of Sciences of the U.S.A., vol. 79, pp. 2554{2558, 1982. [44] J.J. Hop eld and D.W. Tank. Neural Computation of Decisions in Optimization Problems. Biological Cybernetics, 52:141{152, 1985. [45] T. Ichimura, T. Takano, and E.Tazaki. Learning of Neural Networks Using Hybrid Genetic Algorithm. Proc. European Congress on Fuzzy and Intelligent Technology (EUFIT'96), Vol. I, pp. 526{530, Aachen, Germany, 1996. [46] H. Ishibuchi, K. Nozaki, N. Yamamoto, and H. Tanaka. Selecting Fuzzy If-Then Rules for Classi cation Problems Using Genetic Algorithms. IEEE Trans. on Fuzzy Systems, 3(3):260{270, 1995. [47] J.S.R. Jang. Fuzzy Modeling Using Generalized Neural Networks and Kalman Filter Algorithm. Proc. 9th Nat. Conf. on Arti cial Intelligence (AAAI-91), pp. 762{767, 1991. [48] J.S.R. Jang. Rule Extraction Using Generalized Neural Networks. Proc. 4th IFSA World Congress, pp. 82{86, 1991. [49] J.S.R. Jang. ANFIS: Adaptive-Network-based Fuzzy Inference Systems. IEEE Trans. Systems, Man, and Cybernetics, 3:665{685, 1993. [50] J.S.R. Jang and C.-T. Sun. Neuro-Fuzzy Modeling and Control. Proceedings of the IEEE, 83(3):450{465, 1995. 46

[51] T. Jang and Y. Li. Generalized De uzi cation Strategies and Their Parameter Learning Procedures. IEEE Trans. on Fuzzy Systems, 4(1):64{71, 1996. [52] A. Kandel. Fuzzy Expert Systems. Boca Raton, FL: CRC Press, 1992. [53] C.L. Karr. Design of a Cart-Pole Balancing Fuzzy Logic Controller using Genetic Algorithm. SPIE Conf. on Applications of Arti cial Intelligence, Bellingham, WA, 1991. [54] C.L. Karr and E.J. Gentry. Fuzzy Control of pH Using Genetic Algorithms. IEEE Trans. on Fuzzy Systems, 1(1):46{53, 1993. [55] J. Keller and D. Hunt. Incorporating Fuzzy Membership Functions into the Perceptron Algorithm. IEEE Trans. on Pattern Analysis and Machine Intelligence, 7:693{699, 1985. [56] J. Kim. Designing Fuzzy Logic Controllers Using a Multiresolutional Search Approach. IEEE Trans. on Fuzzy Systems, 4(3):213{226, 1996. [57] H. Kitano. Designing Neural Networks Using Genetic Algorithms with Graph Generation System. Complex Systems, 4:461{476, 1990. [58] T. Kohonen. The Self-Organization Map. Proceedings of the IEEE, 78:1464{1480, 1990. [59] D. Kontoravdis, A. Likas, K. Blekas, and A. Stafylopatis. A Fuzzy Neural Approach to Autonomous Vehicle Navigation. Proc. EURISCON '94, Malaga, Spain, 1994. [60] D. Kontoravdis, A. Likas, and A. Stafylopatis. Collision-Free Movement of an Autonomous Vehicle Using Reinforcement Learning. Proc. European Conference on Arti cial Intelligence (ECAI 92), pp. 666{670, Vienna, 1992. [61] D. Kontoravdis and A. Stafylopatis. Reinforcement Learning Techniques for Autonomous Vehicle Control. Neural Network World, 3-4:329{346, 1992. [62] J.R. Koza. Genetic Programming: On the Programming of Computer by Means of Natural Selection. Cambridge, MA: MIT Press, 1992. [63] L.E. Kuo and S.S. Melsheimer. Using Genetic Algorithms to estimate the otpimum width parameter in Radial Basis Function Networks. Proc. of the 1994 American Control Conference, Bultimore, MD, 1994. [64] E. Kyriacou, S. Pavlopoulos, D. Koutsouris, K. Blekas, A. Stafylopatis, and P. Zoumpoulis. Fuzzy Neural Network Based Characterization of Di used Liver Diseases using Image Texture Analysis Techniques on Ultrasonic Images. submitted for publication to: IEEE Transactions on Information Technology in Biomedicine. 47

[65] T. Van Le. Evolutionary Fuzzy Clustering. Proc. IEEE Int. Conf. on Evolutionary Computation, Vol. 2, Perth, Western Australia, 1995.
[66] C.C. Lee. Fuzzy Logic in Control Systems: Fuzzy Logic Controller-Part 1. IEEE Trans. Systems, Man, and Cybernetics, 20:404-418, 1990.
[67] C.C. Lee. Fuzzy Logic in Control Systems: Fuzzy Logic Controller-Part 2. IEEE Trans. Systems, Man, and Cybernetics, 20:419-435, 1990.
[68] M.A. Lee and H. Takagi. Dynamic Control of Genetic Algorithms Using Fuzzy Logic Techniques. Proc. Fifth International Conference on Genetic Algorithms (ICGA'93), pp. 76-83, San Mateo, 1993.
[69] S. Lee and E. Lee. Fuzzy Neural Networks. Math. Biosciences, 23:151-177, 1975.
[70] D. Leitch and P. Probert. Genetic Algorithms for the Development of Fuzzy Controllers for Autonomous Guided Vehicles. Proc. 2nd European Conf. on Intelligent Techniques and Soft Computing, Aachen, Germany, 1994.
[71] A. Likas, K. Blekas, and A. Stafylopatis. Application of the Fuzzy Min-Max Neural Network Classifier to Problems with Continuous and Discrete Attributes. Proc. of IEEE Workshop on Neural Networks for Signal Processing (NNSP '94), pp. 163-170, Ermioni, Greece, 1994.
[72] E.H. Mamdani. Applications of Fuzzy Logic to Approximate Reasoning Using Linguistic Synthesis. IEEE Trans. Computers, 26(12), 1977.
[73] E.H. Mamdani and S. Assilian. An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller. Int. J. Man-Machine Studies, 7(1):431-441, 1975.
[74] W.S. McCulloch and W. Pitts. A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, 5:115-133, 1943.
[75] L. Meyer and X. Feng. A Fuzzy Stop Criterion for Genetic Algorithms Using Performance Estimation. Proc. Third IEEE International Conference on Fuzzy Systems, pp. 1990-1995, Orlando, 1993.
[76] Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Berlin: Springer-Verlag, 1994.
[77] G.F. Miller, P.M. Todd, and S.U. Hedge. Designing Neural Networks Using Genetic Algorithms. Proc. of the 3rd International Conference on Genetic Algorithms, pp. 379-384, Arlington, 1989.
[78] E. Mjolsness, D.H. Sharp, and B.K. Alpert. Scaling, Machine Learning and Genetic Neural Nets. Advances in Applied Mathematics, 10:137-163, 1989.
[79] D.J. Montana and L. Davis. Training Feedforward Neural Networks Using Genetic Algorithms. Proc. International Joint Conference on Artificial Intelligence, pp. 762-767, 1989.
[80] A. Parodi and P. Bonelli. A New Approach to Fuzzy Classifier Systems. Proc. Fifth International Conference on Genetic Algorithms, pp. 223-230, San Mateo, 1993.
[81] C. Perneel, J.-M. Themlin, J.-M. Renders, and M. Acheroy. Optimization of Fuzzy Expert Systems Using Genetic Algorithms and Neural Networks. IEEE Trans. on Fuzzy Systems, 3(3):300-312, 1995.
[82] D.K. Pratihar, K. Deb, and A. Ghosh. A Genetic-Fuzzy Approach for Mobile Robot Navigation Among Moving Obstacles. International Journal of Approximate Reasoning, 20(2):145-172, 1999.
[83] S.N. Raptis and S.G. Tzafestas. Agent-Like Neurofuzzy Architectures for Mobile Robot Path Planning. Studies in Informatics and Control, 6(4):303-317, 1997.
[84] P. Robbins, A. Soper, and K. Rennolls. Use of Genetic Algorithms for Optimal Topology Determination in Back Propagation Neural Networks. Proc. of the International Conference on Artificial Neural Networks and Genetic Algorithms, pp. 726-730, Springer-Verlag, 1993.
[85] F. Rosenblatt. The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, 65:386-408, 1958.
[86] T.A. Runkler. Selection of Appropriate Defuzzification Methods Using Application Specific Properties. IEEE Trans. on Fuzzy Systems, 5(1):72-79, 1997.
[87] S. Saha and J.P. Christensen. Genetic Design of Sparse Feedforward Neural Networks. Information Sciences, 79:191-200, 1994.
[88] E. Sanchez. Fuzzy Genetic Algorithms in Soft Computing Environment. Proc. Fifth IFSA World Congress, Seoul, Invited Plenary Lecture, 1993.
[89] A. Satyadas and K. Krishna Kumar. GA-Optimized Fuzzy Controller for Spacecraft Control. Proc. of the 3rd IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1979-1984, Orlando, FL, 1994.
[90] J.D. Schaffer, R.A. Caruana, and L.J. Eshelman. Using Genetic Search to Exploit the Emergent Behaviour of Neural Networks. Physica D, 42:244-248, 1990.
[91] P.K. Simpson. Fuzzy Min-Max Neural Networks-Part 1: Classification. IEEE Trans. on Neural Networks, 3(5):776-786, 1992.
[92] P.K. Simpson. Fuzzy Min-Max Neural Networks-Part 2: Clustering. IEEE Trans. on Fuzzy Systems, 1:32-45, 1993.
[93] M. Sugeno and G.T. Kang. Structure Identification of Fuzzy Model. Fuzzy Sets and Systems, 28:15-33, 1988.
[94] R.S. Sutton. Two Problems with Backpropagation and Other Steepest-Descent Learning Procedures for Networks. Proc. of 8th Annual Conf. of the Cognitive Science Society, pp. 823-831, Hillsdale, NJ: Lawrence Erlbaum Associates, 1986.
[95] R.S. Sutton. Reinforcement Learning. Boston: Kluwer Academic Publishers, 1992.
[96] H. Takagi and M. Lee. Neural Networks and Genetic Algorithm Approaches to Auto-Design of Fuzzy Systems. Proc. of Fuzzy Logic in Artificial Intelligence (FLAI'93), pp. 68-79, Linz, Austria: Springer-Verlag, 1993.
[97] H. Takagi and I. Hayashi. NN-Driven Fuzzy Reasoning. International Journal of Approximate Reasoning, 5:191-212, 1991.
[98] T. Takagi and M. Sugeno. Fuzzy Identification of Systems and its Applications to Modeling and Control. IEEE Trans. on Systems, Man, and Cybernetics, 15:116-132, 1985.
[99] D. Thierens, J. Suykens, J. Vandewalle, and B. De Moor. Genetic Weight Optimization of a Feedforward Neural Network Controller. Proc. of the Conference on Artificial Neural Nets and Genetic Algorithms, pp. 658-663, Springer-Verlag, 1993.
[100] P. Thrift. Fuzzy Logic Synthesis with Genetic Algorithms. Proc. of the Fourth Int. Conf. on Genetic Algorithms, pp. 509-513, 1991.
[101] L.H. Tsoukalas and R.E. Uhrig. Fuzzy and Neural Approaches in Engineering. New York, NY: John Wiley & Sons, 1996.
[102] C.S. Tzafestas and S.G. Tzafestas. Fuzzy and Neurofuzzy Approaches to Mobile Robot Path and Motion Planning Under Uncertainty. In S.G. Tzafestas, editor, Soft Computing in Systems and Control Technology, pp. 193-220. Singapore: World Scientific, 1999.
[103] S.G. Tzafestas. Fuzzy Systems and Fuzzy Expert Control: An Overview. The Knowledge Engineering Review, 9(3):229-268, 1994.
[104] S.G. Tzafestas, F.V. Hatzivassiliou, and S.K. Kaltsounis. Fuzzy Logic Design of a Nondestructive Robotic Fruit Collector. In S.G. Tzafestas and A.N. Venetsanopoulos, editors, Fuzzy Reasoning in Information, Design and Control Systems, pp. 553-562. Dordrecht/Boston: Kluwer, 1994.
[105] S.G. Tzafestas and N. Mekras. Industrial Forecasting Using Knowledge-Based Techniques and Artificial Neural Networks. In S.G. Tzafestas, editor, Advances in Manufacturing: Decision, Control and Information Technology, pp. 171-180. Berlin/London: Springer, 1999.
[106] S.G. Tzafestas, S.N. Raptis, and G.B. Stamou. A Flexible Neurofuzzy Cell Structure for General Fuzzy Inference. Math. Comp. Simul., 41(3-4):219-233, 1996.
[107] S.G. Tzafestas and G.G. Rigatos. Neural and Neurofuzzy FELA Adaptive Control Using Feedforward and Counterpropagation Networks. J. Intell. and Robotic Systems, 23(2-4):291-330, 1998.
[108] S.G. Tzafestas, M.P. Saltouros, and M. Markaki. A Tutorial Overview of Genetic Algorithms and Their Applications. In S.G. Tzafestas, editor, Soft Computing in Systems and Control Technology, pp. 223-300. Singapore: World Scientific, 1999.
[109] S.G. Tzafestas and G.B. Stamou. An Improved Neural Network for Fuzzy Reasoning Implementation. Math. Comp. Simul., 40(5-6):565-576, 1996.
[110] S.G. Tzafestas and G.B. Stamou. Concerning Automated Assembly: Knowledge-Based Issues and a Fuzzy System for Assembly Under Uncertainty. Computer Integrated Manufacturing Systems, 10(3):183-192, 1997.
[111] S.G. Tzafestas and C.S. Tzafestas. Fuzzy and Neural Intelligent Control: Basic Principles and Architectures. In S.G. Tzafestas, editor, Methods and Applications of Intelligent Control, pp. 25-67. Dordrecht/Boston: Kluwer, 1997.
[112] S.G. Tzafestas and K.C. Zikidis. An On-Line Self-Constructing Fuzzy Modelling Architecture Based on Neural and Fuzzy Concepts and Techniques. In S.G. Tzafestas, editor, Soft Computing in Systems and Control Technology, pp. 119-168. Singapore: World Scientific, 1999.
[113] M. Valenzuela-Rendon. The Fuzzy Classifier System: Motivations and First Results. Parallel Problem Solving from Nature II, pp. 330-334, Berlin: Springer-Verlag, 1991.
[114] K. Watanabe, K. Hara, S. Koga, and S.G. Tzafestas. Fuzzy-Neural Controllers Using Mean-Value Functional Reasoning. Neurocomputing, 9:39-61, 1995.
[115] K. Watanabe, S.-H. Jin, and S.G. Tzafestas. Learning Multiple Fuzzy Control of Robot Manipulators. Journal of Artificial Neural Networks, 2(1-2):119-136, 1995.
[116] P.J. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard University, Cambridge, MA, 1974.
[117] B.A. Whitehead and T.D. Choate. Cooperative-Competitive Genetic Evolution of Radial Basis Function Centers and Widths for Time Series Prediction. IEEE Trans. on Neural Networks, 7(4):869-881, 1996.
[118] D. Whitley, T. Starkweather, and C. Bogart. Genetic Algorithms and Neural Networks: Optimizing Connections and Connectivity. Parallel Computing, 14:347-361, 1990.
[119] B. Widrow and M.E. Hoff. Adaptive Switching Circuits. IRE WESCON Convention Record, pp. 96-104, 1960.
[120] R.J. Williams. Toward a Theory of Reinforcement Learning Connectionist Systems. Technical Report NU-CCS-88-3, Northeastern University, 1988.
[121] R.R. Yager and D.P. Filev. SLIDE: A Simple Adaptive Defuzzification Method. IEEE Trans. on Fuzzy Systems, 1(1):69-78, 1993.
[122] R.R. Yager and D.P. Filev. Essentials of Fuzzy Modeling and Control. New York, NY: John Wiley & Sons, 1994.
[123] X. Yao. A Review of Evolutionary Artificial Neural Networks. International Journal of Intelligent Systems, 8(4):539-567, 1993.
[124] B. Yoon, D.J. Holmes, G. Langholz, and A. Kandel. Efficient Genetic Algorithms for Training Layered Feedforward Neural Networks. Information Sciences, 76:67-85, 1994.
[125] L.A. Zadeh. Fuzzy Sets. Information and Control, 8:338-353, 1965.
[126] M. Zamparelli. Genetically Trained Cellular Neural Networks. Neural Networks, 10(6):1143-1152, 1997.
