A Hybrid Multi-Objective Evolutionary Algorithm Using an Inverse Neural Network

António Gaspar-Cunha1 and Armando Vieira2

Abstract. We present two methods to accelerate the search of a Multi-Objective Evolutionary Algorithm (MOEA) using Artificial Neural Networks (ANN) to approximate the fitness functions. This approach can substantially reduce the number of fitness evaluations on computationally expensive problems without compromising the good search capabilities of MOEA. In one method the ANN is used to approximate the global fitness functions, and in the second the ANN is applied in a local search strategy to discover better individuals from previous generations. The efficiency of both methods is tested on several benchmark functions.
1 INTRODUCTION

Multi-Objective Evolutionary Algorithms (MOEAs) are efficient methods to evaluate the Pareto-optimal set in difficult multi-objective optimization problems, such as linear programming and combinatorial optimization. Several MOEAs have been suggested that are able to handle a population of points and define an approximation to the Pareto set in a single run [1] - [7]. One of the major difficulties in applying evolutionary algorithms to real problems is the large number of evaluations of the objective functions necessary to obtain an acceptable solution, typically several thousands. Often these are time-consuming evaluations obtained by solving numerical codes with expensive methods like finite differences or finite elements. Reducing the number of evaluations necessary to reach an acceptable solution is thus of major importance [8], [9]. Finding good approximate methods is even harder for multi-objective problems due to the different conflicting criteria and the possible interactions between them. The use of approximate models in Evolutionary Algorithms (EA) has received little attention, particularly for multi-objective optimization. Several approaches can be used to approximate objective functions, such as statistical methods or Artificial Neural Networks (ANN). Recently, Poloni et al. [10] combined several optimization techniques in a multi-objective optimization problem. Nain and Deb [8] suggested a different approach to combine GA and ANN. Jin et al. [9] proposed a method similar to that of the previous authors, but implementing an adaptive strategy to select between the use of exact or approximate function evaluations. This work presents a further integration of ANN in MOEA by proposing two different hybrid approaches to estimate the functions or the solutions used by a Multi-Objective Evolutionary Algorithm, namely the Reduced Pareto Set Genetic Algorithm (RPSGA) [3], [11].
1 IPC - Institute for Polymers and Composites, Dept. of Polymer Engineering, University of Minho, Campus de Azurém, 4800-058 Guimarães, Portugal, email: [email protected]
2 Dept. of Physics, Instituto Superior de Engenharia do Porto, R. S. Tomé, 4200 Porto, Portugal, email: [email protected]
In the first approach, the algorithm dynamically controls the number of exact and approximate function evaluations through the error introduced by the approximation at each generation. In the second approach, tentative individuals are generated and tested by the ANN before being presented to the genetic algorithm.
2 MULTIOBJECTIVE OPTIMISATION ALGORITHM

The capability of Evolutionary Algorithms to explore and discover Pareto-optimal fronts on multi-objective optimization problems is well recognized. It has been shown that MOEAs outperform traditional deterministic methods on this type of problem due to their capacity to explore and combine various solutions to find the Pareto front in a single run. For this purpose, the Multi-Objective Evolutionary Algorithm should provide a homogeneous distribution of the population along the Pareto frontier, together with an improvement of the solutions along successive generations [3], [4], [5]. In this work the Reduced Pareto Set Genetic Algorithm (RPSGA) is adopted [11], where a clustering technique is applied to reduce the number of solutions on the efficient frontier. Initially, the RPSGA sorts the population individuals into a number of pre-defined ranks using a clustering technique, in order to reduce the number of solutions on the efficient frontier while keeping its characteristics intact. Then, the individuals' fitness is calculated through a ranking function. To incorporate this technique, the traditional GA was modified as follows [11], [12]:

1. Random initial population (of size N)
2. Empty external population (of size Ne); Ncount = 0
3. while not Stop-Condition do
   (a) Evaluate internal population
   (b) Calculate the fitness of the individuals using the RPSGA
   (c) Copy the best N' individuals to the external population
   (d) Ncount = Ncount + N'
   (e) if (Ncount > Ne)
       - Apply the RPSGA to this population
       - Copy the best individuals to the internal population
       - Ncount = 0
   (f) end if
   (g) Select the individuals for reproduction
   (h) Crossover
   (i) Mutation
4. end while
The algorithm follows the steps of a traditional GA except for the existence of an external (elitist) population and the specific fitness evaluation. Initially, an internal population of size N is randomly defined and an empty external population is formed. At each generation a fixed number of the best individuals, obtained by reducing the internal population with the clustering algorithm [12], are copied to the external population. This process is repeated until the external population becomes full. Then, the RPSGA is applied to sort the individuals of the external population, and a pre-defined number of the best individuals are incorporated into the internal population by replacing the lowest-fitness individuals. Detailed information about this algorithm can be found elsewhere [11], [12].
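As an illustration only, the following Python sketch reproduces the overall elitist loop described above. The real RPSGA reduces and ranks populations with a clustering technique [11], [12]; here that step is approximated by a simple dominance count, and the genetic operators are generic placeholders, not the authors' implementation.

```python
import random

# Minimal, self-contained sketch of the elitist loop described above.
# The dominance-count ranking stands in for the RPSGA clustering technique,
# and all operator choices and parameter values are illustrative only.

def dominates(a, b):
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def dominance_ranks(criteria):
    # lower is better; stands in for the RPSGA clustering/ranking step (b)
    return [sum(dominates(c2, c1) for c2 in criteria) for c1 in criteria]

def evolve(evaluate, n_var, N=100, Ne=200, N_prime=20, gens=300, pm=0.05):
    internal = [[random.random() for _ in range(n_var)] for _ in range(N)]
    external, ncount = [], 0                                       # external (elitist) population
    for _ in range(gens):
        crit = [evaluate(ind) for ind in internal]                 # (a) exact evaluations
        ranks = dominance_ranks(crit)                              # (b) fitness assignment
        order = sorted(range(N), key=ranks.__getitem__)
        external += [(internal[i], crit[i]) for i in order[:N_prime]]  # (c) copy best N'
        ncount += N_prime                                          # (d)
        if ncount > Ne:                                            # (e) external population full
            ext_ranks = dominance_ranks([c for _, c in external])
            external = [external[i] for i in
                        sorted(range(len(external)), key=ext_ranks.__getitem__)][:Ne]
            for k, (ind, _) in enumerate(external[:N_prime]):      # reinsert elites over the
                internal[order[-(k + 1)]] = list(ind)              # lowest-ranked individuals
            ncount = 0
        def child():                                               # (g)-(i) variation
            a, b = random.sample(range(N), 2)
            p = internal[min(a, b, key=ranks.__getitem__)]         # tournament (placeholder)
            q = internal[random.randrange(N)]
            c = [x if random.random() < 0.5 else y for x, y in zip(p, q)]  # uniform crossover
            return [min(1.0, max(0.0, x + random.gauss(0, 0.1)))   # bounded gaussian mutation
                    if random.random() < pm else x for x in c]
        internal = [child() for _ in range(N)]
    return [ind for ind, _ in external]
```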
3 NEURAL NETWORKS
Feedforward Artificial Neural Networks (ANN), implemented here by a Multilayer Perceptron, are flexible schemes capable of approximating arbitrarily complex functions, provided that enough training data is available [13], [14]. An ANN basically builds a map between a set of inputs and the respective outputs (see Figure 1) and is particularly well suited to non-linear regression analysis of noisy signals and incomplete data. The crucial points in the construction of an ANN are the selection of inputs and outputs, the architecture of the ANN (that is, the number of layers and of nodes in each layer), the minimization criterion and the training algorithm used. Combining ANN with Evolutionary Algorithms is a powerful approach to address the exploitation/exploration dilemma. By constructing a smooth mapping, the ANN is well suited to local search, adequately exploiting specific regions for candidate solutions. On the other hand, Evolutionary Algorithms are efficient in exploring huge search spaces of multivariate functions with many local minima, and are therefore adequate for global search. The goal of this work is to implement an approximation to the fitness functions for multi-objective optimization that is independent of the criteria to be optimized and of the parameters of the Evolutionary Algorithm. In the present case, the training data consists of previous exact function evaluations performed by the evolutionary algorithm.
Figure 1. Scheme of the ANN used: the input nodes represent the parameters to optimise and the output nodes the criteria.
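As an illustration of this surrogate idea, the sketch below trains a multilayer perceptron on previously evaluated (parameters, criteria) pairs and uses it to predict the criteria of new candidates. The use of scikit-learn's MLPRegressor and the layer size, learning rate and momentum values are assumptions made for the example, not the configuration used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Sketch only: fit a surrogate mapping decision parameters -> criteria from
# the exact evaluations archived by the evolutionary algorithm.  The network
# settings below are illustrative values, not the paper's configuration.

def train_surrogate(X_eval, C_eval, hidden=20, lr=0.2, momentum=0.6):
    # X_eval: (n_samples, n_parameters) decision vectors already evaluated exactly
    # C_eval: (n_samples, n_criteria) corresponding (normalised) criteria values
    net = MLPRegressor(hidden_layer_sizes=(hidden,), solver="sgd",
                       learning_rate_init=lr, momentum=momentum,
                       max_iter=2000, random_state=0)
    net.fit(X_eval, C_eval)
    return net

# usage: approximate the criteria of an unevaluated population
# C_approx = train_surrogate(X_eval, C_eval).predict(X_new)
```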
4 ALGORITHMS PROPOSED

The goal of our multi-objective hybrid algorithm is to reduce the number of evaluations of the exact objective functions without losing performance in establishing the Pareto-front. Several methods are presented: two of them use approximate solutions given by properly trained ANN, which we call method A1 and method A2. The third approach (named Method B) uses an ANN trained in a reverse way, i.e., the input layer represents the criteria and the output layer the parameters to be optimized. Our aim is to perform a local search, starting from some of the non-dominated solutions of the previous generation, in order to obtain better solutions for each new generation.

4.1 Global approximation algorithm

Method A1 is similar to the approach proposed by Nain and Deb [8], the difference being the MOEA used, and it was implemented here for comparison purposes. The Evolutionary Algorithm runs over p initial generations to obtain a first set of evaluations necessary to train the neural network. From this point forward, method A1 uses the ANN to evaluate all the solutions during a number of consecutive, and fixed, q generations (Figure 2). This process is repeated until an acceptable solution is found, the last p generations being evaluated exactly. The performance of this method depends on the ANN used and on p and q, i.e., the number of generations in which exact and approximate function values are used. The accuracy of the ANN depends on the number p of generations that contain the individuals used for training. A large p is desirable to obtain a better ANN approximation. Conversely, by increasing q we reduce the number of exact evaluations at the cost of deteriorating the Pareto-front. Although these two parameters can be pre-defined, their determination is strongly problem dependent.

In Method A2 all N individuals of the population are evaluated with the ANN approximation, but a small portion, M, are simultaneously evaluated with the exact function. This fraction is used to monitor the accuracy of the approximation. The error introduced by the approximation ($e_{NN}$) is:

$$e_{NN} = \frac{1}{M}\sum_{j=1}^{M}\sqrt{\frac{\sum_{i=1}^{R}\left(C^{NN}_{i,j}-C_{i,j}\right)^{2}}{R}} \qquad (1)$$

where R is the number of criteria, M the number of solutions exactly evaluated, $C^{NN}_{i,j}$ is the value of criterion i for solution j evaluated by the ANN, and $C_{i,j}$ is the value of criterion i for solution j evaluated by the exact function. As the algorithm evolves it drifts to regions outside the domain covered by the initial training points, where the neural network approximation is no longer valid. Therefore, parameter q can be dynamically adjusted as the number of generations for which the following inequality holds:

$$e_{NN} < e_{0} \qquad (2)$$

where $e_0$ is a value fixed by the user, or adapted over the run, that measures the level of discrepancy desired, or admitted, by the user.
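A small sketch of the Method A2 bookkeeping implied by equations (1) and (2) is given below. The surrogate interface and the way the M control individuals are chosen are our assumptions for illustration, not details taken from the paper.

```python
import numpy as np

# Sketch of the Method A2 accuracy check (equations (1) and (2)).
# `surrogate.predict` is assumed to return the approximated criteria; the
# random choice of the M control individuals is an illustrative assumption.

def approximation_error(C_exact, C_ann):
    # C_exact, C_ann: arrays of shape (M, R) with exact and ANN criteria values
    per_solution = np.sqrt(np.mean((C_ann - C_exact) ** 2, axis=1))  # RMS over the R criteria
    return per_solution.mean()                                       # average over the M controls

def evaluate_population(pop, exact_eval, surrogate, e0=0.05, M=10, rng=np.random):
    C_ann = surrogate.predict(np.asarray(pop))              # approximate all N individuals
    control = rng.choice(len(pop), size=M, replace=False)   # M individuals also run exactly
    C_exact = np.array([exact_eval(pop[i]) for i in control])
    e_nn = approximation_error(C_exact, C_ann[control])     # equation (1)
    trust_surrogate = e_nn < e0                              # equation (2)
    return C_ann, e_nn, trust_surrogate
```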
" ! " # ! $ %& !
'
) " ! " # ! !
) " ! " # ! !
Figure 2.
...
...
c
∆C+
( !
a
b e1
Criteria 2
" ! " # ! $ %& !
1
2 3 4
Schematic structure of the algorithm identi£ed as method A1
4.2 Local approximation algorithm
In Method B an inverse neural network (IANN) is used to perform a local search around a set of non-dominated points from the preceding generation. Two additional steps are included in the algorithm presented in Section 2: i) after step 3(b), the IANN is trained using points from the present generation; ii) after step 3(c), a local search is performed with the IANN. Individuals for subsequent generations are thus obtained by the usual EA operators (crossover and mutation) and by a local search operator as well. Figure 3 illustrates the local search operator for a problem with two criteria. First we select the best individuals of the present generation with the RPSGA. For the extreme points, designated in Figure 3 by "e1", corresponding to the minimum found so far for criterion 1, and "e2", the minimum found for criterion 2, three new individuals are obtained with the IANN. These new points, identified as "a", "b" and "c", are generated as follows:

Point a): $\vec{C}^{new} = (C_1 + \Delta C_1,\; C_2 + \Delta C_2)$
Point b): $\vec{C}^{new} = (C_1,\; C_2 + \Delta C_2)$ ............ (3)
Point c): $\vec{C}^{new} = (C_1 - \Delta C_1,\; C_2)$

where $\Delta C_j$ is the displacement applied to each criterion and $\vec{C}^{new}$ gives the coordinates, in the criteria space, of the new point. Index 1 stands for point "e1" and index 2 for point "e2". For the remaining points 1 to 4 the following equation applies:

$$\vec{C}^{new} = (C_1 + \Delta C_1,\; C_2 + \Delta C_2) \qquad (4)$$

For each new point $\vec{C}^{new}$ in the criteria space, the corresponding individual in the space of the input variables is obtained by the inverse neural network. The perturbation $\Delta C_j$ must be carefully selected; the optimal values were found in the interval [0.3, 0.4] (remember that all criteria were normalized to the interval [0, 1]).

Figure 3. Scheme used for the Inverse ANN local approximation (Method B).
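To make the inverse mapping concrete, here is an illustrative sketch in which an MLP is trained from criteria to decision variables and then queried at the displaced points "a", "b" and "c". The regressor settings, the value of the displacement, and the reading of equations (3)-(4) in terms of the two extreme criterion values are our assumptions for the example, not the authors' setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Illustrative sketch of the Method B local search: train an inverse network
# (criteria -> parameters) on the current generation and query it at points
# displaced from the extremes of the non-dominated set (equations (3)-(4)).
# Regressor settings and the delta value are assumptions for the example.

def inverse_local_search(X, C, delta=0.35):
    # X: (n, n_parameters) decision vectors of the current generation
    # C: (n, 2) corresponding normalised criteria values (two-criteria case)
    iann = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
    iann.fit(C, X)                                   # inverse mapping: criteria -> parameters

    c1 = C[:, 0].min()                               # best value found for criterion 1 ("e1")
    c2 = C[:, 1].min()                               # best value found for criterion 2 ("e2")
    targets = np.array([[c1 + delta, c2 + delta],    # point a, equation (3)
                        [c1,         c2 + delta],    # point b
                        [c1 - delta, c2]])           # point c
    return iann.predict(targets)                     # candidate decision vectors to insert
```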
5 APPLICATION TO BENCHMARK PROBLEMS

In this section we apply all the methods to several benchmark problems.

5.1 The test problems

To test the methods we used the ZDT1, ZDT2, ZDT3, ZDT4 and ZDT6 bi-objective functions (see Table 1 and equations (5) to (9)). These functions cover various types of Pareto-optimal fronts, such as convex (ZDT1), non-convex (ZDT2), discontinuous (ZDT3), multimodal (ZDT4) and non-uniform (ZDT6) [15].

Table 1. Test functions used.

Name | m  | Equation | x_i
ZDT1 | 30 | (5)      | x_i in [0,1]
ZDT2 | 30 | (6)      | x_i in [0,1]
ZDT3 | 30 | (7)      | x_i in [0,1]
ZDT4 | 10 | (8)      | x_1 in [0,1]; x_i in [-5,5], i = 2,...,m
ZDT6 | 10 | (9)      | x_i in [0,1]

ZDT1: $f_1(x_1) = x_1$, $f_2(x_2,\dots,x_m) = g \times \left(1 - \sqrt{f_1/g}\right)$, where $g(x_2,\dots,x_m) = 1 + 9\,\frac{\sum_{i=2}^{m} x_i}{m-1}$. (5)

ZDT2: $f_1(x_1) = x_1$, $f_2(x_2,\dots,x_m) = g \times \left(1 - \left(f_1/g\right)^2\right)$, where $g(x_2,\dots,x_m) = 1 + 9\,\frac{\sum_{i=2}^{m} x_i}{m-1}$. (6)

ZDT3: $f_1(x_1) = x_1$, $f_2(x_2,\dots,x_m) = g \times \left(1 - \sqrt{f_1/g} - \left(f_1/g\right)\sin(10\pi f_1)\right)$, where $g(x_2,\dots,x_m) = 1 + 9\,\frac{\sum_{i=2}^{m} x_i}{m-1}$. (7)

ZDT4: $f_1(x_1) = x_1$, $f_2(x_2,\dots,x_m) = g \times \left(1 - \sqrt{f_1/g}\right)$, where $g(x_2,\dots,x_m) = 1 + 10\,(m-1) + \sum_{i=2}^{m}\left(x_i^2 - 10\cos(4\pi x_i)\right)$. (8)

ZDT6: $f_1(x_1) = 1 - \exp(-4x_1)\sin^6(6\pi x_1)$, $f_2(x_2,\dots,x_m) = g \times \left(1 - \left(f_1/g\right)^2\right)$, where $g(x_2,\dots,x_m) = 1 + 9\left(\frac{\sum_{i=2}^{m} x_i}{m-1}\right)^{0.25}$. (9)
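For reference, a short sketch of two of these test functions (equations (5) and (8)) is given below; it is a straightforward transcription, with the variable bounds of Table 1 left as the caller's responsibility.

```python
import numpy as np

# Direct transcription of equations (5) and (8); bounds from Table 1 are the
# caller's responsibility (ZDT1: x_i in [0,1]; ZDT4: x_1 in [0,1], x_i in [-5,5]).

def zdt1(x):
    x = np.asarray(x, dtype=float)
    f1 = x[0]
    g = 1.0 + 9.0 * x[1:].sum() / (len(x) - 1)
    return f1, g * (1.0 - np.sqrt(f1 / g))            # equation (5)

def zdt4(x):
    x = np.asarray(x, dtype=float)
    f1 = x[0]
    g = 1.0 + 10.0 * (len(x) - 1) + np.sum(x[1:] ** 2 - 10.0 * np.cos(4.0 * np.pi * x[1:]))
    return f1, g * (1.0 - np.sqrt(f1 / g))            # equation (8)
```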
The efficiency of the methods should take into account the balance between the quality of the solutions obtained and the number of exact objective function evaluations. Since the computation time required to train and test the neural network is negligible when computationally demanding problems are to be solved, the number of exact evaluations of the objective functions was used as the significant running parameter. Performance was compared over successive generations. However, measuring the quality of solutions in multi-objective optimization is difficult due to the presence of multiple, often conflicting, goals. Some of these goals are: achieving a good approximation to the Pareto-front, having a uniform distribution of the solutions and maximizing the coverage of the Pareto-front. The S-metric proposed by Zitzler [16] will be adopted, since it is adequate for problems with few objective dimensions [17]. Each run was performed five times to take into account the variation of the random initial population. The RPSGA was run with a population size of 100 over 300 generations, using a roulette wheel selection strategy, a crossover probability of 0.8, a mutation probability of 0.05, 30 ranks and clustering limits of indifference of 0.01 [11].
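For two minimisation objectives the S-metric reduces to the area dominated by the front with respect to a reference point. The sketch below is one common way of computing it; the reference point is chosen by the caller, since the paper does not state which one was used.

```python
import numpy as np

# Bi-objective S-metric (hypervolume) sketch for minimisation problems:
# area dominated by the non-dominated front up to a reference point.
# The reference point is an assumption of the caller, not taken from the paper.

def s_metric_2d(front, ref):
    pts = np.asarray(front, dtype=float)
    pts = pts[np.argsort(pts[:, 0])]                  # sort by the first objective
    nd, best_f2 = [], np.inf                          # keep only non-dominated points
    for f1, f2 in pts:
        if f2 < best_f2:
            nd.append((f1, f2))
            best_f2 = f2
    area, prev_f2 = 0.0, ref[1]
    for f1, f2 in nd:
        area += (ref[0] - f1) * (prev_f2 - f2)        # slab between successive front points
        prev_f2 = f2
    return area

# e.g. s_metric_2d([(0.1, 0.9), (0.5, 0.4), (0.9, 0.1)], ref=(1.1, 1.1))
```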
Figure 4. Comparison of the S-metric, as a function of the number of exact evaluations, with and without the ANN approximation for the ZDT1 function.
5.2 Results for Method A1

The relevance of some parameters to the algorithm performance was initially studied: the number of generations evaluated by the exact function (p = 5, 10 and 15 generations), the number of generations evaluated by the approximate model (q = 5, 10 and 15 generations), the number of hidden neurons of the neural network (Nh = 10, 20 and 30 neurons), the learning rate of the neural network (η = 0.1, 0.2, 0.3 and 0.4) and the momentum term (α = 0.2, 0.4, 0.6 and 0.8). For each generation the algorithm calculates the average of the S-metric over the 5 runs performed for each case, as a function of the number of exact evaluations carried out so far. The best results obtained for each test problem were compared with the ones obtained using the RPSGA alone. Table 2 presents the best parameters found for the test functions. Problems ZDT1 and ZDT6 are the easiest for the ANN to approximate well, since q is large and Nh small. Problem ZDT3 should be the hardest.

Table 2. Best algorithm parameters for method A1.

Test problem | p  | q  | Nh | η   | α
ZDT1         | 15 | 15 | 10 | 0.2 | 0.2
ZDT2         | 15 | 15 | 30 | 0.3 | 0.6
ZDT3         | 15 | 5  | 20 | 0.3 | 0.2
ZDT4         | 15 | 5  | 10 | 0.2 | 0.6
ZDT6         | 15 | 15 | 10 | 0.2 | 0.3

Figure 4 compares the results obtained with the traditional RPSGA and the best results obtained with method A1. Note that, for a given number of exact evaluations, the S-metric is always superior when the approximate method is used. This is particularly visible in the early convergence, where the improvements are larger. After about 10000 exact evaluations both curves merge into a single plateau, asymptotically approaching the final solution. This method can save from 9% to 35% of the exact function evaluations - see also Table 3.

5.3 Results for method A2

In order to achieve a clear comparison of the performance of our method the following criterion is used:

$$S^{*} = \frac{\bar{S}_{NN} - \bar{S}}{\bar{S}_{NN}} \qquad (10)$$

where $\bar{S}_{NN}$ and $\bar{S}$ are the averages of the S-metric obtained with and without the ANN, respectively. In Figure 5 both S* and the difference in exact evaluations are plotted as a function of the number of generations. The difference in exact evaluations can be obtained from Equation (10) by replacing the averages of the S-metric with the averages of the number of exact evaluations for the same runs. For method A1 and S* = 1%, the number of exact evaluations was reduced by 35% (Figure 5). The number of exact evaluations needed to reach the same S-metric is also reduced, by about 28%, for method A2 (Figure 5). However, method A2 has the advantage that no parameter optimization is needed and therefore results are obtained in a single run. Similar results were achieved for different levels of allowed error e0. Table 3 presents a summary of the results obtained on the remaining test problems. Since ZDT2 is a non-convex problem it was harder to solve. Identical results were obtained for the ZDT3 problem, which is remarkable given the discontinuous nature of this problem. The multimodal characteristic of the ZDT4 problem presents an additional difficulty, due to the fact that different initial populations, i.e., different seed values, produce different Pareto fronts. As expected, the results obtained for the ZDT6 problem are weak, due to the non-uniform nature of this problem.

Figure 5. Evolution of the S-metric and of the difference in the number of exact evaluations for the ZDT1 test problem, using methods A1 and A2.

Table 3. Reduction in the number of evaluations using methods A1 and A2.

Test problem | Method A1 | Method A2
ZDT1         | 35%       | 28%
ZDT2         | 25%       | 25%
ZDT3         | 23%       | 23%
ZDT4         | 25%       | 10%
ZDT6         | 9%        | 9%
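As a quick illustration of equation (10), the same ratio is applied both to the averaged S-metric and to the averaged numbers of exact evaluations; the numbers in the usage comment are invented values, not results from the paper.

```python
# Equation (10) applied to the averaged S-metric; the same ratio, applied to the
# averaged numbers of exact evaluations, gives the savings reported in Table 3.
def relative_gain(mean_with_ann, mean_without_ann):
    return (mean_with_ann - mean_without_ann) / mean_with_ann

# e.g. relative_gain(mean_with_ann=0.82, mean_without_ann=0.79) ~ 0.037 (3.7%)
```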
5.4 Results for method B

Figures 6 to 10 present a comparison between Method B and the results obtained without the use of the ANN approximation, for the five test problems studied (Table 1). These values were obtained by averaging over 5 runs using different initial populations. The lines represent the evolution of the S-metric as a function of the number of exact evaluations.

Method B always outperforms Methods A1 and A2, as well as the approach without ANN, in producing a better and faster approximation to the optimal Pareto front. In the case of the functions ZDT1, ZDT2 and ZDT3, Method B is able to reach an identical value of the S-metric after about 60% of the exact evaluations required without any ANN approximation. These differences are larger when Method B is applied to the problems ZDT4 and ZDT6. In the case of the multimodal problem (ZDT4, Figure 9), Method B is able to reach a better Pareto-optimal front in all of the five runs carried out. The big differences in the S-metric are due to the discrete steps that occur when the optimization algorithm passes through the various (multimodal) optimal fronts. Finally, the results produced by Method B when applied to the difficult non-uniform ZDT6 problem (Figure 10) clearly outperform the approach without ANN.

Figure 6. Comparison of the S-metric as a function of the number of exact evaluations for the ZDT1 problem (Method B).

Figure 7. Comparison of the S-metric as a function of the number of exact evaluations for the ZDT2 problem (Method B).

Figure 8. Comparison of the S-metric as a function of the number of exact evaluations for the ZDT3 problem (Method B).

Figure 9. Comparison of the S-metric as a function of the number of exact evaluations for the ZDT4 problem (Method B).
Figure 10. Comparison of the S-metric as a function of the number of exact evaluations for the ZDT6 problem (Method B).
6 CONCLUSIONS

We have shown that Artificial Neural Networks can be effectively used to approximate objective functions for multi-objective optimization with evolutionary algorithms. The efficiency of this approach is strongly dependent on the functions to be optimized and on the degree of approximation required, but in general it helps to reduce the large number of evaluations required by evolutionary algorithms. Two types of approach were proposed: a global approximation (Methods A1 and A2) and a local approximation (Method B). We found that Method A2 can save a computational time ranging from 13% to about 30% in several benchmark problems. Method B can save up to 40% of the exact function evaluations while achieving better Pareto-fronts.
ACKNOWLEDGEMENTS

This work was supported by the Portuguese Fundação para a Ciência e Tecnologia under grant POCTI/EME/48448/2002.
REFERENCES

[1] J.D. Schaffer, Some Experiments in Machine Learning Using Vector Evaluated Genetic Algorithms, Ph.D. Thesis, Nashville, TN, Vanderbilt University, 1984.
[2] C.M. Fonseca, P.J. Fleming, Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization, Proc. Fifth Int. Conf. on Genetic Algorithms, Morgan Kaufmann (1993) 416-423.
[3] A. Gaspar-Cunha, P. Oliveira, J.A. Covas, Use of Genetic Algorithms in Multicriteria Optimization to Solve Industrial Problems, Seventh Int. Conf. on Genetic Algorithms, Michigan, USA, 1997.
[4] K. Deb, S. Agrawal, A. Pratap, T. Meyarivan, A Fast Elitist Non-dominated Sorting Genetic Algorithm for Multi-Objective Optimization: NSGA-II, Proceedings of the Parallel Problem Solving from Nature VI (PPSN-VI), (2000) 849-858.
[5] K. Deb, Multi-Objective Optimization using Evolutionary Algorithms, Wiley, 2001.
[6] E. Zitzler, M. Laumanns, L. Thiele, SPEA2: Improving the Strength Pareto Evolutionary Algorithm, TIK Report No. 103, Swiss Federal Institute of Technology, Zurich, Switzerland, 2001 (downloadable from http://www.tik.ee.ethz.ch/~zitzler/).
[7] J.D. Knowles, D.W. Corne, Approximating the Non-dominated Front using the Pareto Archived Evolution Strategy, Evolutionary Computation Journal, (8) (2000) 149-172.
[8] P.K.S. Nain, K. Deb, A Computationally Effective Multi-Objective Search and Optimization Technique Using Coarse-to-Fine Grain Modelling, KanGAL Report No. 2002005, 2002 (downloadable from http://www.iitk.ac.in/kangal/deb.htm).
[9] Y. Jin, M. Olhofer, B. Sendhoff, A Framework for Evolutionary Optimization with Approximate Fitness Functions, IEEE Trans. on Evolutionary Computation, (6) (2002) 481-494.
[10] C. Poloni, A. Giurgevich, L. Onesti, V. Pediroda, Hybridization of a multi-objective genetic algorithm, a neural network and a classical optimizer for a complex design problem in fluid dynamics, Computer Methods in Applied Mechanics and Engineering, 186 (2000) 403-420.
[11] A. Gaspar-Cunha, J.A. Covas, RPSGAe - A Multiobjective Genetic Algorithm with Elitism: Application to Polymer Extrusion, in a Lecture Notes in Economics and Mathematical Systems volume, Springer (2002).
[12] A. Gaspar-Cunha, Modelling and Optimization of Single Screw Extrusion, PhD Thesis, University of Minho, Guimarães, Portugal, 2000 (downloadable from http://www.lania.mx/~ccoello/EMOO/).
[13] C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1997.
[14] L. Bull, On Model-based Evolutionary Computation, Soft Computing, (3) (1999) 76.
[15] E. Zitzler, K. Deb, L. Thiele, Comparison of Multiobjective Evolutionary Algorithms: Empirical Results, Evolutionary Computation, (8) (2000) 173-195.
[16] E. Zitzler, Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications, PhD Thesis, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland, 1999.
[17] J.D. Knowles, D.W. Corne, On Metrics for Comparing Non-Dominated Sets, in Proceedings of the 2002 Congress on Evolutionary Computation Conference (CEC02), IEEE Press (2002) 711-716.