Evolving Flexible Neural Networks Using Ant Programming and PSO Algorithm Yuehui Chen, Bo Yang, and Jiwen Dong School of Information Science and Engineering Jinan University, Jinan 250022, P.R.China
[email protected]
Abstract. A flexible neural network (FNN) is a multilayer feedforward neural network with the following characteristics: (1) over-layer connections; (2) variable activation functions for different nodes; and (3) sparse connections between the nodes. A new approach for designing the FNN based on a neural tree encoding is proposed in this paper. The approach employs ant programming (AP) to evolve the architecture of the FNN and particle swarm optimization (PSO) to optimize the parameters encoded in the neural tree. The performance and effectiveness of the proposed method are evaluated on time series prediction problems and compared with related methods.
1 Introduction
The design of a neural network for a given task usually suffers from the following difficulties: (1) choosing an appropriate architecture; (2) determining how the nodes are connected; (3) selecting the activation functions; and (4) finding a fast convergent training algorithm. Many efforts have been made to construct neural networks for given tasks in different application areas; a recent survey can be found in [8]. These approaches can be summarized as follows:
◦ optimization of weight parameters, e.g., gradient descent methods, evolutionary algorithms, tabu search, and random search;
◦ optimization of architectures, e.g., constructive and pruning algorithms, genetic algorithms, and genetic programming;
◦ simultaneous optimization of architecture and parameters, e.g., EPNet [8] and neuroevolution [5];
◦ optimization of learning rules.
Evolving the architecture and parameters of a higher-order Sigma-Pi neural network based on a sparse neural tree encoding was proposed in [9], where only Σ and Π neurons are used and no activation functions are attached to the neurons. A recent approach for evolving the neural tree model based on probabilistic incremental program evolution (PIPE) and a random search algorithm was proposed in our previous work [3].
In this paper, a new approach for evolving a flexible neural network based on a neural tree encoding is proposed. The approach employs ant programming [1] to evolve the architecture of the FNN and particle swarm optimization to optimize the parameters encoded in the neural tree. The paper is organized as follows. Section 2 gives the encoding method and the evaluation of the FNN. A hybrid learning algorithm for evolving the FNN is given in Section 3. Section 4 presents simulation results for time series forecasting problems. Finally, Section 5 presents some concluding remarks.
2 Encoding and Evaluation
Encoding. A tree-structure-based encoding method with a specific instruction set is selected for representing a FNN in this research. The reason for choosing this representation is that the tree can be created and evolved using existing or modified tree-structure-based approaches, e.g., Genetic Programming (GP), the PIPE algorithm, and Ant Programming (AP). The instruction set used for generating the FNN tree is

I = \{+_2, +_3, \ldots, +_N, x_1, x_2, \ldots, x_n\},    (1)
where +_i (i = 2, 3, \ldots, N) denote non-leaf node instructions taking i arguments, and x_1, x_2, \ldots, x_n are leaf node instructions taking no arguments. The output of each non-leaf node is calculated as a single neuron model; for this reason the non-leaf node +_i is also called an i-input neuron instruction/operator. Fig. 1 (left) shows the tree-structural representation of a FNN. During the creation of a neural tree, if a non-terminal instruction +_i (i = 2, 3, 4, \ldots, N) is selected, i real values are randomly generated to represent the connection strengths between the node +_i and its children. In addition, two parameters a_i and b_i are randomly created as flexible activation function parameters and attached to node +_i. The flexible activation function used in this study is

f(a_i, b_i, x) = e^{-\left(\frac{x - a_i}{b_i}\right)^2}.    (2)
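As a concrete illustration of Eq. (2), the following minimal Python sketch (the function name is chosen here for illustration only) evaluates the flexible Gaussian-shaped activation for given parameters a and b:

```python
import math

def flexible_activation(a, b, x):
    """Flexible activation of Eq. (2): f(a, b, x) = exp(-((x - a) / b)**2)."""
    return math.exp(-((x - a) / b) ** 2)

# Example: a node with randomly created parameters squashes its excitation.
print(flexible_activation(0.3, 0.7, 0.5))
```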
Evaluation. The output of a FNN can be calculated recursively. For any non-terminal node +_i, the total excitation is calculated as

net_i = \sum_{j=1}^{i} w_j y_j,    (3)
where y_j (j = 1, 2, \ldots, i) are the inputs to node +_i. The output of node +_i is then calculated by

out_i = f(a_i, b_i, net_i) = e^{-\left(\frac{net_i - a_i}{b_i}\right)^2}.    (4)
Thus, the overall output of a flexible neural tree can be computed recursively, traversing the tree depth-first from left to right.
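As an illustration of Eqs. (3)–(4), the following minimal Python sketch (the class layout is an assumption made here, not the authors' implementation) evaluates a small flexible neural tree recursively:

```python
import math

class Leaf:
    """Leaf node x_k: returns the k-th input variable."""
    def __init__(self, index):
        self.index = index

    def evaluate(self, inputs):
        return inputs[self.index]

class Neuron:
    """Non-leaf node +_i with i children, i connection weights and parameters (a, b)."""
    def __init__(self, children, weights, a, b):
        self.children, self.weights, self.a, self.b = children, weights, a, b

    def evaluate(self, inputs):
        # Eq. (3): weighted sum of the children's outputs.
        net = sum(w * child.evaluate(inputs)
                  for w, child in zip(self.weights, self.children))
        # Eq. (4): flexible activation of the total excitation.
        return math.exp(-((net - self.a) / self.b) ** 2)

# A tiny tree  +2( x0, +2(x1, x2) )  evaluated on one input vector.
tree = Neuron([Leaf(0), Neuron([Leaf(1), Leaf(2)], [0.4, 0.6], 0.1, 0.8)],
              [0.5, 0.5], 0.2, 1.0)
print(tree.evaluate([0.3, 0.7, 0.1]))
```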
Fig. 1. Left: tree-structural representation of an example FNN with instruction set I = \{+_2, \ldots, +_6, x_0, x_1, x_2\}. Right: pheromone tree; each node holds a pheromone table storing the quantity of pheromone associated with all possible instructions.
Objective function. In this work, the fitness function used for both AP and PSO is the mean square error (MSE):

Fit(i) = \frac{1}{P} \sum_{j=1}^{P} \left(y_1^j - y_2^j\right)^2,    (5)

where P is the total number of training samples, y_1^j and y_2^j are the actual and model outputs for the j-th sample, and Fit(i) denotes the fitness value of the i-th individual.
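A direct transcription of Eq. (5) is straightforward; the short sketch below (function name chosen here for illustration) computes the fitness of a candidate model from paired actual/model outputs:

```python
def mse_fitness(targets, outputs):
    """Eq. (5): mean squared error between actual and model outputs."""
    assert len(targets) == len(outputs)
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

print(mse_fitness([0.1, 0.4, 0.9], [0.12, 0.35, 0.88]))
```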
3 An Approach for Evolving a FNN
Ant Programming for evolving the architecture of the FNN. Ant programming is a recent method that applies the principles of ant systems to automated program synthesis [1]. In the AP algorithm, each ant builds and modifies trees according to the quantity of pheromone at each node. The pheromone is organized as a tree: each node owns a table that stores the quantity of pheromone associated with each possible instruction (Fig. 1, right). First, a population of programs is generated randomly. The pheromone table at each node is initialized to 0.5, which means that initially every terminal and function has an equal probability of being chosen; the higher the quantity of pheromone, the higher the probability of being chosen. Each program (individual) is then evaluated using a predefined objective function. The pheromone tables are updated by two mechanisms:
– 1. Evaporation decreases the pheromone value of every instruction at every node according to the formula

P_g = (1 - \alpha) P_{g-1},    (6)

where P_g denotes the pheromone value at generation g and α is a constant (α = 0.15).
– 2. Reinforcement: for each tree, the components of the tree are reinforced according to the fitness of the tree:

P_{i,s_i} = P_{i,s_i} + \frac{\alpha}{Fit(s)},    (7)

where s is a solution (tree), Fit(s) is its fitness, s_i is the function or terminal selected at node i in this individual, α is a constant (α = 0.1), and P_{i,s_i} is the pheromone value of instruction s_i at node i.
A brief description of the AP algorithm is as follows: (1) every component of the pheromone tree is set to an average value; (2) trees are generated at random based on the pheromone tree; (3) the ants are evaluated using Eqn. (5); (4) the pheromone tables are updated according to Eqn. (6) and Eqn. (7); (5) go to step (2) unless some termination criterion is satisfied.
Parameter optimization with PSO. For the parameter optimization of the FNN, a number of global and local search algorithms, e.g., GA, EP, and gradient-based learning methods, could be employed. The basic PSO algorithm is selected for parameter optimization here due to its fast convergence and ease of implementation. PSO [6] conducts its search using a population of particles, which correspond to individuals in an evolutionary algorithm (EA). A population of particles is generated randomly at the start. Each particle represents a potential solution and has a position represented by a position vector x_i. A swarm of particles moves through the problem space, with the moving velocity of each particle represented by a velocity vector v_i. At each time step, a function f_i (Eqn. (5) in this study) representing a quality measure is calculated using x_i as input. Each particle keeps track of its own best position, i.e., the position associated with the best fitness it has achieved so far, in a vector p_i. Furthermore, the best position found so far among all particles in the population is kept track of as p_g. (In addition to this global version, another version of PSO keeps track of the best position among the topological neighbors of each particle.) At each time step t, using the individual best position p_i(t) and the global best position p_g(t), a new velocity for particle i is computed by

v_i(t + 1) = v_i(t) + c_1 \phi_1 (p_i(t) - x_i(t)) + c_2 \phi_2 (p_g(t) - x_i(t)),    (8)

where c_1 and c_2 are positive constants and \phi_1 and \phi_2 are uniformly distributed random numbers in [0, 1]. The velocity v_i is limited to the range ±v_max; if the velocity violates this limit, it is set to the corresponding bound. Changing the velocity in this way enables particle i to search around its individual best position p_i and the global best position p_g. Based on the updated velocities, each particle changes its position according to

x_i(t + 1) = x_i(t) + v_i(t + 1).    (9)
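Both numerical update rules in this section are short. The sketch below is a minimal illustration, not the authors' implementation: the per-node dictionary layout, the function names, and the constants c1 = c2 = 2.0 and v_max are assumptions made here, while α = 0.15 and α = 0.1 follow the text. It applies the pheromone evaporation and reinforcement of Eqs. (6)–(7) and one velocity/position step of Eqs. (8)–(9):

```python
import random
import numpy as np

def update_pheromone(tables, used_instructions, fitness,
                     alpha_evap=0.15, alpha_reinf=0.1):
    """tables: {node_id: {instruction: pheromone}};
    used_instructions: {node_id: instruction} chosen by the evaluated tree;
    fitness: Fit(s) of that tree."""
    for table in tables.values():
        for instr in table:                           # Eq. (6): evaporation everywhere
            table[instr] *= (1.0 - alpha_evap)
    for node_id, instr in used_instructions.items():  # Eq. (7): reinforce used components
        tables[node_id][instr] += alpha_reinf / fitness

def pso_step(x, v, p_best, g_best, c1=2.0, c2=2.0, v_max=1.0):
    """One PSO iteration for a single particle, following Eqs. (8)-(9)."""
    phi1, phi2 = random.random(), random.random()
    v = v + c1 * phi1 * (p_best - x) + c2 * phi2 * (g_best - x)   # Eq. (8)
    v = np.clip(v, -v_max, v_max)                                 # limit velocity to +/- v_max
    return x + v, v                                               # Eq. (9)

# Example: a particle encoding three FNN parameters moves one step.
x, v = np.array([0.2, 0.5, 0.9]), np.zeros(3)
x, v = pso_step(x, v, p_best=np.array([0.1, 0.6, 0.8]), g_best=np.array([0.0, 0.7, 1.0]))
print(x, v)
```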
The proposed learning algorithm. The general learning procedure for the optimal design of the FNN can be described as follows.
1) Create the initial population randomly (FNNs and their corresponding parameters).
2) Structure optimization by the AP algorithm.
3) If a better structure is found, go to step 4); otherwise go to step 2).
4) Parameter optimization by the PSO algorithm. In this stage, the tree structure is fixed and is the best tree taken from the end of the structure search; all of the parameters encoded in the best tree form a parameter vector to be optimized by PSO.
5) If the maximum number of PSO iterations is reached, or no better parameter vector has been found for a significantly long time (say, 100 steps out of a maximum of 2000 steps), go to step 6); otherwise go to step 4).
6) If a satisfactory solution is found, stop; otherwise go to step 2).

Fig. 2. The structure of the optimized FNN for prediction of the Box-Jenkins data with 2 inputs (Case 1, left) and with 10 inputs (Case 2, right).
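The alternation between structure search and parameter search in steps 1)–6) can be organized as a simple outer loop. The skeleton below is only a control-flow sketch: the functions ap_structure_search and pso_parameter_search are hypothetical wrappers for the AP and PSO procedures described above, and their placeholder bodies just return random values so that the loop runs end to end.

```python
import random

def ap_structure_search(pheromone_state):
    # Placeholder for the AP structure search (Eqs. 6-7); returns (tree, fitness).
    return object(), random.random()

def pso_parameter_search(tree, max_steps=2000, patience=100):
    # Placeholder for the PSO parameter search (Eqs. 8-9); returns (params, fitness).
    return [random.random() for _ in range(5)], random.random()

def evolve_fnn(target_fitness=1e-3, max_rounds=50):
    best_tree, best_params, best_fit = None, None, float("inf")
    pheromone_state = {}                                        # step 1): initialization
    for _ in range(max_rounds):
        tree, struct_fit = ap_structure_search(pheromone_state)  # steps 2)-3)
        params, fit = pso_parameter_search(tree)                  # steps 4)-5)
        if fit < best_fit:
            best_tree, best_params, best_fit = tree, params, fit
        if best_fit <= target_fitness:                             # step 6): stop if satisfied
            break
    return best_tree, best_params, best_fit

print(evolve_fnn()[2])
```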
4 Case Studies
The developed FNN is applied to a time-series prediction problem: the Box-Jenkins time series [2]. The population sizes used for AP and PSO are 150 and 30, respectively.

Case 1. The inputs of the prediction model are u(t − 4) and y(t − 1), and the output is y(t); 200 data samples are used for training and the remaining samples are used for testing the performance of the evolved model. The instruction set used for creating the FNN model is I = \{+_2, \ldots, +_8, x_0, x_1\}, where x_0 and x_1 denote the input variables u(t − 4) and y(t − 1), respectively. The results were obtained by training the FNN in 20 independent runs. The average MSE values for the training and test data sets are 0.000680 and 0.000701, respectively. The structure of the optimized FNN is shown in Fig. 2 (left).

Case 2. In the second simulation, 10 input variables are used for constructing the FNN in order to test the input-selection ability of the algorithm. The instruction set used for creating the FNN is I = \{+_2, \ldots, +_8, x_0, x_1, \ldots, x_9\}, where x_i (i = 0, 1, \ldots, 9) denote u(t − 6), u(t − 5), u(t − 4), u(t − 3), u(t − 2), u(t − 1) and y(t − 1), y(t − 2), y(t − 3), y(t − 4), respectively. The average MSE values for the training and test data sets are 0.000291 and 0.000305, respectively. The structure of the optimized FNN is shown in Fig. 2 (right).
Table 1. Comparative results of different modelling approaches

Model name and reference    Number of inputs    MSE
ARMA [2]                    5                   0.71
FuNN model [7]              2                   0.0051
ANFIS model [4]             2                   0.0073
Case 1                      2                   0.00068
Case 2                      10                  0.00029
From Fig. 2 (right) it can be seen that the FNN model has some capability to select the relevant input variables when constructing the model. A comparison of different methods for forecasting the Box-Jenkins data is given in Table 1.
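For reference, a minimal sketch of how the ten lagged input variables of Case 2 can be assembled from the raw input series u(t) and output series y(t); the variable names and the toy data split are assumptions made here, not taken from the paper:

```python
def build_lag_samples(u, y):
    """Build (x0..x9, target) samples: x0..x5 = u(t-6)..u(t-1), x6..x9 = y(t-1)..y(t-4)."""
    samples = []
    for t in range(6, len(y)):
        x = [u[t - 6], u[t - 5], u[t - 4], u[t - 3], u[t - 2], u[t - 1],
             y[t - 1], y[t - 2], y[t - 3], y[t - 4]]
        samples.append((x, y[t]))
    return samples

# Example with a toy series; the real Box-Jenkins gas-furnace data has 296 (u, y) pairs.
u = [float(i) for i in range(20)]
y = [float(i) * 0.5 for i in range(20)]
train = build_lag_samples(u, y)[:10]    # e.g., first samples used for training
print(len(train), train[0])
```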
5 Conclusion
In this paper, a FNN model and its design method were proposed. A combined approach, using AP to evolve the architecture and PSO to optimize the free parameters encoded in the neural tree, was developed. Simulation results on a time series prediction problem show the feasibility and effectiveness of the proposed method. It should be noted that other tree-structure-based evolutionary algorithms and parameter optimization algorithms could also be employed to accomplish the same tasks.
References

1. Birattari, M., Di Caro, G., Dorigo, M.: Toward the formal foundation of Ant Programming. In: Proc. of the Third International Workshop on Ant Algorithms (ANTS 2002). LNCS 2463 (2002) 188-201
2. Box, G.E.P., Jenkins, G.M.: Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco (1970)
3. Chen, Y., Yang, B., Dong, J.: Nonlinear systems modelling via optimal design of neural trees. International Journal of Neural Systems 14 (2004) 125-138
4. Jang, J.S.: Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice-Hall, Upper Saddle River, NJ (1997)
5. Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting topologies. Evolutionary Computation 10 (2002) 99-127
6. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proc. of the IEEE International Conference on Neural Networks, Vol. IV (1995) 1942-1948
7. Kasabov, N., et al.: FuNN/2 - A fuzzy neural network architecture for adaptive learning and knowledge acquisition. Information Sciences 101 (1997) 155-175
8. Yao, X.: Evolving artificial neural networks. Proceedings of the IEEE 87 (1999) 1423-1447
9. Zhang, B.T., et al.: Evolutionary induction of sparse neural trees. Evolutionary Computation 5 (1997) 213-236