Combining Exploration and Reliability in Coevolution

Edwin D. de Jong
Universiteit Utrecht, ICS, DSS Group
P.O. Box 80.089, 3508 TB Utrecht, The Netherlands
[email protected]
http://www.cs.uu.nl/~dejong

Abstract. Coevolution can in principle circumvent the difficult problem of designing a fitness function; this may remove harmful biases, and thereby improve search performance. Recently, based on Evolutionary Multi-Objective Optimization, the feasibility of accurate evaluation in coevolution has been demonstrated theoretically. A current challenge in coevolution is how this theoretical promise may be translated into practical algorithms. We investigate a setup consisting of multiple subpopulations, each with a varying degree of exploration. It is shown that by allowing these subpopulations to interact, the desirable aims of exploration and reliability can be combined.

1 Introduction

Coevolutionary algorithms [2, 1, 13] can address problems where the quality of an individual is determined by its outcomes in several tests. This description defines the broad class of test-based problems, ranging from supervised learning (where a learning example is a test) to board games such as chess or Go, where each opponent can be viewed as a test. The number of possible tests in such problems is typically very large, and evaluating an individual on all possible tests is therefore generally infeasible. Most evolutionary algorithms therefore require the user of the algorithm to specify a fitness function to be used for evaluation purposes. However, as the ideal evaluation function for test-based problems is generally unknown, choices in defining such a fitness function can greatly influence the performance of the algorithm employing it. In coevolution, the problem of choosing a fitness function is avoided; rather than having user-defined evaluations along with the biases these are likely to introduce, coevolutionary algorithms evaluate individuals based on interactions with other evolving individuals. Thus, the examples in supervised learning or the opponents in chess are themselves evolving individuals, and the coevolutionary process decides which tests to use for evaluating other individuals. While coevolution may avoid certain biases that are due to inaccurately determined fitness functions, an important question is to what extent an evolved set of tests can provide accurate evaluation. Indeed, apart from a number of tantalizing

successful results [13, 16, 15], coevolutionary methods are notorious for their unreliability. A particular problem is that it has long been unclear how the evolving tests should be evaluated; if these are set up in symmetric competition with the learning individuals, there is no guarantee that all underlying objectives of the individuals will be tested. As an example, the members of two competing populations of chess players may continue to adapt to individuals in the other population, without making any progress in the long run [5, 18]. This Red Queen problem can be understood by recognizing that there may be multiple objectives underlying performance in a problem, such as strategy and tactics in chess, or fuel efficiency, esthetics, and costs in car design. If individuals are not tested on all underlying objectives of a problem, progress in one objective will often be accompanied by undetected regress in the untested objectives.

Recently, based on Evolutionary Multi-Objective Optimization (EMOO) [9], the feasibility of accurate coevolutionary evaluation has been theoretically confirmed. It was proven that for any set of learners, a small set of tests exists that provides all information relevant to the evaluation of the learners [6, 7, 8]. Moreover, this Complete Evaluation Set can be approximated by practical algorithms, as it provides an operational criterion for the evaluation of the tests; by evolving tests according to this criterion, approximation of the Complete Evaluation Set can be guaranteed in the limit [8]. Another approach to the theoretical study of coevolution is based on order theory; using this formalism, a set related to the Complete Evaluation Set has been described [3], and geometrical aspects of coevolution have been investigated [4]. A current challenge in coevolution is how the theoretical promise of accurate evaluation without a user-defined fitness function may be translated into practical algorithms.
A first algorithm based on approximating the Complete Evaluation Set is the delphi algorithm. By making strict choices regarding the replacement of individuals, this algorithm is able to achieve sustained progress in all underlying objectives for several difficult test problems [7]. In order to apply theoretically justified coevolutionary algorithms in practice, a number of hurdles must be overcome. Perhaps the most important limitation of strict replacement criteria is that they may reduce the potential for exploration. In this paper we focus on this issue by presenting and demonstrating a novel technique that combines exploration and reliability.

The technique employs a number of subpopulations. Inspired by the method of parallel tempering [12], each subpopulation has a different temperature, which controls its degree of exploration. Since the parents used for crossover and mutation can come from any subpopulation, genetic material can spread across the subpopulations. In this way, a 'cold' subpopulation can reach improved locations in the search space. Moreover, while the highly explorative 'hot' subpopulations would normally be likely to lose high-quality individuals, the cold subpopulations function as an archive that allows them to rediscover such individuals. It is found that by this system of interacting subpopulations, the desirable features of exploration and stability can be successfully combined.
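The interacting-subpopulation scheme can be sketched as follows. This is an illustrative Python fragment, not the original implementation; the function names are my own, while the temperature values match those used in the experiments (see Figure 2).

```python
import random

# Temperatures per subpopulation pair, from 'cold' (strict) to 'hot'
# (explorative), matching the values shown in Figure 2.
TEMPERATURES = [10, 20, 40, 80, 160]

def pick_parents(subpopulations):
    """Choose two parents from the union of all subpopulations, so that
    genetic material can spread across the temperature ladder: cold
    subpopulations import improvements found by hot ones, and hot ones
    can rediscover high-quality individuals preserved in the cold ones."""
    pool = [ind for sub in subpopulations for ind in sub]
    a, b = random.sample(pool, 2)
    return a, b
```

Because the parent pool spans all subpopulations, migration happens implicitly through reproduction rather than through an explicit exchange step.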

2 Coevolution: the delphi Algorithm

The delphi algorithm distinguishes between two types of individuals: learners are candidate solutions whose performance is to be optimized, while evaluators are tests that are used to evaluate the learners. The delphi algorithm uses one population of learners and one population of evaluators. Following the recent approach in coevolution known as Pareto-Coevolution [10, 17], learners are evaluated by viewing their outcomes against the tests in the test population as objectives in the sense of Evolutionary Multi-Objective Optimization (EMOO) [9]. Evaluators aim to detect differences between learners so as to allow for informed decisions in learner replacement. An evaluator is said to distinguish between two learners if it assigns a higher outcome to one learner than to the other. The notion of distinctions was introduced by Sevan Ficici [11].

We will use a symmetric test problem with three possible outcomes: win (1), draw (0), or lose (-1). In this problem, the interaction function G(a, e) between a learner a and an evaluator e returns 1 if and only if the outcome is a win for the learner. Since we are using the experiments as a model of practical problems, it should be sufficiently difficult for evaluators to make distinctions. Therefore, we will only say that an evaluator distinguishes between two learners in the case of win/lose outcomes. Thus, whether an evaluator e makes a distinction between learners a and b is defined as follows:

dist(e, a, b) ⟺ G(a, e) = 1 ∧ G(b, e) ≠ 1    (1)

Evaluators are evaluated by using all potential distinctions between two learners in the learner population as objectives. Since a learner cannot be distinguished from itself, this results in a set of n_l^2 - n_l objectives for a population of n_l learners. In addition, the n_l objectives of obtaining a high score against each learner are used, resulting in a total of n_l^2 objectives. Evaluation is based on the above choice of objectives for learners and evaluators, and employs the Pareto-dominance relation, see e.g. [9].

The delphi algorithm operates as follows. The learner and evaluator populations are initialized using random individuals. Next, the following cycle is repeated until a target performance criterion is met. A generation of learners is generated using the available operators of variation, typically mutation and crossover. For each new learner, we consider in turn whether it dominates an existing learner; if so, it replaces that learner, and if not, it is discarded. Next, the same procedure is applied to the evaluator generation, except that a newly generated evaluator can only replace its parent if it dominates it; this additional requirement promotes diversity, and is based on Mahfoud's Deterministic Crowding procedure [14].
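The distinction criterion and the Pareto-dominance test used for replacement can be sketched in a few lines. This is an illustrative fragment, not the original code; the outcome matrix G and all function names are assumptions made for the example.

```python
def distinguishes(G, e, a, b):
    """Eq. 1: evaluator e distinguishes learner a from learner b when
    a wins against e while b does not."""
    return G[a][e] == 1 and G[b][e] != 1

def learner_objectives(G, a, evaluators):
    """A learner's objectives are its outcomes against all evaluators."""
    return [G[a][e] for e in evaluators]

def evaluator_objectives(G, e, learners):
    """The n_l^2 evaluator objectives: one per ordered pair of distinct
    learners (making that distinction), plus one per learner (scoring
    high against it, i.e. keeping the learner's outcome low)."""
    pair_objs = [int(distinguishes(G, e, a, b))
                 for a in learners for b in learners if a != b]
    score_objs = [-G[a][e] for a in learners]
    return pair_objs + score_objs

def dominates(p, q):
    """Pareto dominance on objective vectors: at least as good in every
    objective and strictly better in at least one."""
    return (all(x >= y for x, y in zip(p, q))
            and any(x > y for x, y in zip(p, q)))
```

With two learners, `evaluator_objectives` yields the 2 pair objectives plus 2 score objectives, i.e. n_l^2 = 4 objectives, as in the counting argument above.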

3 Exploration versus Reliability: an Insurmountable Tradeoff?

The strict replacement criterion of Pareto-dominance allows the delphi algorithm to achieve reliable progress in difficult test problems, such as the five-dimensional

compare-on-one problem with mutation bias [7]. However, the delphi algorithm can only make progress if, given a current population, a better individual can be generated within a single step of mutation or crossover. If the fitness landscape of a problem contains local optima given the available operators, identifying a better individual may require multiple steps of mutation or crossover that do not appear beneficial. To study how coevolutionary methods can be developed that combine reliability with exploration, we define a test problem that requires exploration in order to be solved. The test problem is based on the compare-symmetric function, which is defined as follows:

G_sym(a, e) = 1 if ∀i: a_i ≥ e_i ∧ ¬(∀i: a_i = e_i); -1 if ∀i: e_i ≥ a_i ∧ ¬(∀i: a_i = e_i); 0 otherwise    (2)

where a is a learner, e is an evaluator, both learners and evaluators are real-valued vectors, and x_i denotes the value of individual x in dimension i. In its basic form, this problem does not require exploration, assuming for instance the standard mutation operator as the operator of variation. We can introduce a need for exploration by discretizing the values of each individual before the outcome of the interaction function is determined; this results in plateaus in genotype space on which all individuals have equal outcomes, so that progress can only be detected when the next plateau is reached. If the discretization interval is larger than the mutation range, then from certain regions in the space multiple steps will be required to arrive at an improved location, as desired. Discretization is performed by applying the following function to each value of the individual: d(x) = δ⌊x/δ⌋. To test the effectiveness of this procedure in making compare-symmetric difficult for methods with limited exploration, we study the performance of the delphi algorithm on both the normal and the discretized version of the compare-symmetric problem.
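Equation 2 and the discretization d(x) can be expressed compactly; the following is a minimal Python sketch (function names are mine):

```python
import math

def g_sym(a, e):
    """compare-symmetric (Eq. 2): 1 if learner a is at least as large as
    evaluator e in every dimension (and they differ somewhere), -1 in
    the symmetric case, and 0 (a draw) otherwise."""
    if list(a) == list(e):
        return 0
    if all(ai >= ei for ai, ei in zip(a, e)):
        return 1
    if all(ei >= ai for ai, ei in zip(a, e)):
        return -1
    return 0

def discretize(x, delta=0.25):
    """d(x) = delta * floor(x / delta), applied per dimension; this
    creates plateaus on which all individuals receive equal outcomes."""
    return [delta * math.floor(xi / delta) for xi in x]

def g_sym_discretized(a, e, delta=0.25):
    """Discretized version: outcomes are computed on the plateau
    corners, so progress only becomes visible once the next plateau
    is reached."""
    return g_sym(discretize(a, delta), discretize(e, delta))
```

For example, two individuals at 0.30 and 0.45 lie on the same plateau for δ = 0.25 and therefore draw under the discretized function, even though one strictly dominates the other in the continuous version.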
Individuals are initialized by choosing each value randomly from [0, 0.125]. New individuals are generated using two-point crossover (50%) or mutation (50%). Mutation randomly chooses a dimension and adds a constant chosen uniformly from [-0.05, 0.2], and is applied twice when used; this ensures that an increase in one dimension is often accompanied by a decrease in another dimension, which substantially complicates the problem of detecting progress based on interaction outcomes. Again, this choice is made to model difficulties faced by coevolutionary algorithms in practice. The experiments use the 3-dimensional compare-symmetric problem. For the discretized version of this problem, δ = 0.25. Both the learner and the evaluator population are of size 50. The curves show the best individual in the population. Scores are determined by the lowest dimension of an individual, and averaged over 50 runs.

Figure 1 (left) shows that while delphi performs well on the continuous version of compare-symmetric, it fails entirely when the problem is discretized. This confirms that the discretization has the desired effect of making the problem unsolvable for methods that only perform a single step of look-ahead. In the following, we will investigate whether methods can be developed that can address the discretized version of the problem. To this end, we consider how the aims of reliability and exploration may be combined in a single method.

Figure 1: Left: Performance of Delphi on the compare-symmetric problem. Since exploration is required for the discretized problem, Delphi is unable to progress. Right: Performance of the setups with multiple subpopulations with varying degrees of exploration. While the non-interacting subpopulations already achieve some progress on the discretized problem, the interacting version greatly improves over this performance.
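The initialization and variation operators of this experimental setup can be sketched as follows; this is an illustrative fragment, and the helper names are mine:

```python
import random

def init_individual(dims=3):
    """Initialization: each value drawn uniformly from [0, 0.125]."""
    return [random.uniform(0.0, 0.125) for _ in range(dims)]

def mutate(x):
    """Pick a random dimension and add a constant drawn uniformly from
    [-0.05, 0.2]; applied twice per use, so an increase in one dimension
    is often accompanied by a decrease in another."""
    y = list(x)
    for _ in range(2):
        i = random.randrange(len(y))
        y[i] += random.uniform(-0.05, 0.2)
    return y

def two_point_crossover(x, y):
    """Two-point crossover on real-valued vectors."""
    i, j = sorted(random.sample(range(len(x) + 1), 2))
    return list(x[:i]) + list(y[i:j]) + list(x[j:])

def make_offspring(a, b):
    """New individuals: two-point crossover (50%) or mutation (50%)."""
    if random.random() < 0.5:
        return two_point_crossover(a, b)
    return mutate(a)
```

Note that with δ = 0.25 and a maximum single-mutation step of 0.2, a single mutation cannot always cross a plateau boundary, which is what forces multi-step exploration.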

4 Combining Exploration and Reliability

We face the problem of deciding whether to accept or reject a new individual, based on the objective values of the two individuals. A reliable choice that avoids regress is to accept the new individual only if it dominates the current individual. The other extreme of a maximally explorative choice is to always accept the replacement. Selection strategies in between these two extremes can be characterized by the degree to which they perform exploration.

Using the Boltzmann distribution, we can define a selection strategy with a tunable degree of exploration, so that selection methods from anywhere in the spectrum of explorativeness can be automatically produced. The degree of exploration is regulated by a temperature parameter T. Specifically, an individual with value v will be accepted into the population with probability e^(v/T) / e^(v_max/T). To distinguish between dominating, non-dominated, and dominated individuals, these classes are assigned values of 100, 40-60, and 10 respectively. The value of a non-dominated individual is determined by the fraction of objectives for which it has a high value.

We will investigate a setup consisting of five matched pairs of learner and evaluator subpopulations, where each learner subpopulation is evaluated as before on its corresponding evaluator subpopulation, and vice versa. Subpopulation pairs with a varying degree of exploration can now easily be generated by choosing a different temperature for each subpopulation pair. The use of a range of different temperatures makes it likely that some temperature will provide the 'right' degree of exploration. However, there is no guarantee that any single degree of exploration addresses the problem of balancing exploration and reliability. The central idea of this paper is therefore to let the subpopulations interact; if explorative subpopulations can identify improvements and these improved individuals can migrate to more reliable, less explorative subpopulations, the two aims of exploration and reliability may be combined. To achieve this, we let each subpopulation consider all other subpopulations when choosing the parents used to produce offspring.

Figure 1 (right) shows the results of using non-interacting subpopulations. The curves show the best minimum value of an individual over all subpopulations. While delphi was seen to stall for this problem due to its lack of exploration, the use of varying degrees of exploration already results in some progress. However, as noted, the use of independent subpopulations does not guarantee that exploration and stability will be combined. The second curve in Figure 1 presents results with the proposed scheme for combining exploration and reliability. As the graph demonstrates, this method greatly improves performance, and achieves substantial progress on the difficult discretized compare-symmetric problem. To see whether the proposed method can indeed combine exploration and reliability, we also plotted the performance of each subpopulation separately, see Figure 2.

Figure 2: Performance per subpopulation (first run). As the difference between the graphs shows, the use of interaction greatly enhances performance; explorative subpopulations are able to generate improvements, and can avoid regress by using the stricter subpopulations as an archive. The stricter subpopulations provide reliability, but are also able to progress by importing improvements from explorative subpopulations. In this manner, the desirable aims of exploration and reliability are combined.
This figure clearly shows the operation of the proposed method, and clarifies how it can combine exploration and stability. In the non-interacting setup, highly explorative subpopulations can reach arbitrarily bad performance by accepting too many detrimental replacements, while the strict, less explorative subpopulations are not able to improve much, as they are unable to generate new, improved individuals. In the setup with interaction, the subpopulations behave very differently, even though they have exactly the same temperature values as in the first setup. The strict subpopulations are able to progress every once in a while, apparently through the use of genetic material from other subpopulations, as this is the only difference with the control setup. They thereby function as a form of archive, accepting improvements when they are made but not producing improvements themselves. Interestingly, the behavior of the highly explorative subpopulations also changes; whereas formerly these were likely to regress, they can now use material from the stricter subpopulations to undo such regress. In summary, the interaction between subpopulations with different degrees of exploration makes it possible to obtain the desirable combination of exploration and reliability in coevolutionary algorithms.
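The Boltzmann acceptance rule of Section 4 can be sketched as follows, assuming the class values given there (dominating 100, non-dominated 40-60, dominated 10); the function names are illustrative:

```python
import math
import random

def accept_probability(v, T, v_max=100.0):
    """Boltzmann acceptance probability exp(v/T) / exp(v_max/T).
    A dominating individual (v = v_max) is always accepted; for lower
    values, small T approaches strict dominance-only replacement, while
    large T accepts almost any replacement."""
    return math.exp(v / T) / math.exp(v_max / T)

def accept(v, T, v_max=100.0):
    """Stochastic accept/reject decision for a candidate replacement."""
    return random.random() < accept_probability(v, T, v_max)
```

At T = 10, a dominated individual (v = 10) is accepted with probability exp(-9), about 0.0001, while at T = 160 the same individual is accepted with probability exp(-90/160), about 0.57, spanning the spectrum from reliable to explorative.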

5 Conclusions

Recently, theoretical advances in coevolution have suggested criteria for the evaluation of the tests used to evaluate learners. A first algorithm based on these ideas, named delphi, is able to achieve progress on all underlying objectives of challenging test problems, but is limited in that it does not perform exploration. We have investigated how exploration and reliability may be combined. It was found that a setup with interacting subpopulations with varying degrees of exploration is able to achieve this desirable goal. We hope this result may bring us closer to the goal of theoretically informed, practical algorithms for coevolution.

References

[1] Robert Axelrod. The evolution of strategies in the iterated prisoner's dilemma. In Lawrence Davis, editor, Genetic Algorithms and Simulated Annealing, Research Notes in Artificial Intelligence, pages 32-41, London, 1987. Pitman Publishing.
[2] Nils Aall Barricelli. Numerical testing of evolution theories. Part I: Theoretical introduction and basic tests. Acta Biotheoretica, 16(1-2):69-98, 1962.
[3] Anthony Bucci and Jordan B. Pollack. Order-theoretic analysis of coevolution problems: Coevolutionary statics. In Proceedings of the GECCO-02 Workshop on Coevolution: Understanding Coevolution, 2002.
[4] Anthony Bucci and Jordan B. Pollack. A mathematical framework for the study of coevolution. In Foundations of Genetic Algorithms (FOGA-2002), to appear.
[5] D. Cliff and G. F. Miller. Tracking the Red Queen: Measurements of adaptive progress in co-evolutionary simulations. In F. Morán, A. Moreno, J. J. Merelo, and P. Chacón, editors, Proceedings of the Third European Conference on Artificial Life: Advances in Artificial Life, volume 929 of LNAI, pages 200-218, Berlin, 1995. Springer.
[6] Edwin D. De Jong and Jordan B. Pollack. Principled evaluation in coevolution. Technical Report CS-02-225, Brandeis University, May 31, 2002.
[7] Edwin D. De Jong and Jordan B. Pollack. Learning the ideal evaluation function. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO-03, 2003.
[8] Edwin D. De Jong and Jordan B. Pollack. Ideal evaluation from coevolution. Evolutionary Computation, to appear.
[9] Kalyanmoy Deb. Multi-Objective Optimization Using Evolutionary Algorithms. Wiley & Sons, New York, NY, 2001.
[10] Sevan G. Ficici and Jordan B. Pollack. A game-theoretic approach to the simple coevolutionary algorithm. In M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton, J. Julian Merelo, and H.-P. Schwefel, editors, Parallel Problem Solving from Nature, PPSN-VI, volume 1917 of LNCS, Berlin, 2000. Springer.
[11] Sevan G. Ficici and Jordan B. Pollack. Pareto optimality in coevolutionary learning. In Jozef Kelemen, editor, Sixth European Conference on Artificial Life, Berlin, 2001. Springer.
[12] Charles J. Geyer and Elizabeth A. Thompson. Annealing Markov chain Monte Carlo with applications to ancestral inference. Journal of the American Statistical Association, 90(431):909-920, 1995.
[13] D. W. Hillis. Co-evolving parasites improve simulated evolution as an optimization procedure. Physica D, 42:228-234, 1990.
[14] Samir W. Mahfoud. Niching Methods for Genetic Algorithms. PhD thesis, University of Illinois at Urbana-Champaign, Urbana, IL, May 1995. IlliGAL Report 95001.
[15] Jordan B. Pollack and Alan D. Blair. Co-evolution in the successful learning of backgammon strategy. Machine Learning, 32(1):225-240, 1998.
[16] Karl Sims. Evolving 3D morphology and behavior by competition. In R. Brooks and P. Maes, editors, Artificial Life IV, pages 28-39, Cambridge, MA, 1994. The MIT Press.
[17] Richard A. Watson and Jordan B. Pollack. Symbiotic combination as an alternative to sexual recombination in genetic algorithms. In M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton, J. Julian Merelo, and H.-P. Schwefel, editors, Parallel Problem Solving from Nature, PPSN-VI, volume 1917 of LNCS, Berlin, 2000. Springer.
[18] Richard A. Watson and Jordan B. Pollack. Coevolutionary dynamics in a minimal substrate. In L. Spector, E. Goodman, A. Wu, W. B. Langdon, H.-M. Voigt, M. Gen, S. Sen, M. Dorigo, S. Pezeshk, M. Garzon, and E. Burke, editors, Proceedings of the Genetic and Evolutionary Computation Conference, GECCO-01, pages 702-709, San Francisco, CA, 2001. Morgan Kaufmann.
