DEPARTAMENTO DE ARQUITECTURA Y TECNOLOGÍA DE SISTEMAS INFORMÁTICOS

Facultad de Informática Universidad Politécnica de Madrid

PhD THESIS

A FRAMEWORK FOR HYBRID DYNAMIC EVOLUTIONARY ALGORITHMS: MULTIPLE OFFSPRING SAMPLING (MOS)

Author
Antonio LaTorre de la Fuente
MS Computer Science, MS Distributed Systems

PhD Director
José María Peña Sánchez
PhD Computer Science

2009

Thesis Committee

Chairman: Pedro de Miguel

Member: Francisco Herrera

Member: Alexander Mendiburu

External Member: Guillaume Beslon

Secretary: Víctor Robles

What a thirst to know so much! What a hunger to know how many stars the sky holds! Pablo Neruda

When you set out on your journey to Ithaca, pray that the road is long, full of adventure, full of knowledge. ... Always keep Ithaca in your mind. To arrive there is your ultimate goal. But do not hurry the voyage at all. It is better to let it last for many years; and to anchor at the island when you are old, rich with all you have gained on the way, not expecting that Ithaca will offer you riches. Ithaca has given you the beautiful voyage. Without her you would have never set out on the road. She has nothing more to give you. And if you find her poor, Ithaca has not deceived you. Wise as you have become, with so much experience, you must already have understood what Ithacas mean. Constantinos P. Kavafis

Acknowledgments

It is always difficult to begin an acknowledgments section, especially if it should cover a relatively wide and intense period of your life. Thinking as an engineer, the process could be divided into three main steps:

• First, the important people you want to thank must be selected. This is a crucial step because, if you forget somebody, you will never forgive yourself. I am quite frightened as I write these lines. . .

• Second, some arbitrary order must be chosen to carry out this noble task. Again, the feeling that you are doing it the wrong way appears, and it will not leave you until the document is already printed and you can no longer do anything to repair the damage.

• Finally, for those of us not used to expressing our feelings in such an explicit and public manner, it is really difficult to write a few lines opening our hearts and exposing ourselves to the shame of being too “mushy” or of failing to express exactly what we meant. This is even harder if you have to do it in a language other than your mother tongue.

That being said, I will do my best, and I hope I will not make any big mistake I should regret. If that happens, I am sure you will understand and forgive the blunder.

The first person I would like to thank is my supervisor and, above all, friend Chema. I know you did not enjoy your year as Vice Dean for International Affairs but, without it, we would never have met and I would not be here at this moment (I would probably be living in Paris, working for an important IT company and earning much more money than an FPU grant can provide. . . Who wants that life?). More seriously, thanks for all your time, understanding, confidence and the good moments we have shared. It is difficult to find somebody who trusts you unconditionally and who is always willing to help you, even if that means giving up part of his already quite limited spare time. For all of this, and for all the moments to come, thanks.

In the second place, I would like to thank everybody in the Department and the Lab, both professors and students, for all your help, suggestions, philosophical discussions when nobody wanted to go back to work, and good and bad moments. I am sure these years would not have been as satisfying anywhere else. I would also like to thank Professors Alex A. Freitas and El-Ghazali Talbi for giving me the opportunity to visit their groups and to meet many brilliant people, especially Professor François Clautiaux, who helped me a lot during my stay in Lille. I have not forgotten, of course, my family and friends. Thanks for your support and for bearing with me. I know I can be difficult (sometimes), but that is one of my charms, isn’t it?

Last but not least, this would not have been possible without your unconditional support and love. I know I do not deserve it, but I thank God every morning for giving me one more day by your side. You are the light on my (frequently) cloudy days, my eyes when I am so blind that I cannot see, and my smile when everything seems to go wrong. Thanks for believing in me. Thanks for loving me. Finally, I would like to thank everybody else who may not have felt included in any of the former acknowledgments. As I said before, I hope you will understand how difficult it is to write these lines. Thanks again and. . . That’s all folks!

Antonio LaTorre de la Fuente October 14, 2009

Abstract

Evolutionary Algorithms (EAs) are a set of optimization techniques that have become remarkably popular over the last decades. As general-purpose algorithms, they have been applied to a wide range of problems, many of them from industrial or scientific disciplines. Several approaches have been proposed, each implementing the biological metaphor in its own particular way. This provides each of these evolutionary approaches with different search characteristics, which make them more suitable for different types of problems. This diversity of Evolutionary Algorithms makes it possible to face a wider range of optimization problems. However, the selection of a particular Evolutionary Algorithm becomes a crucial decision that can determine the quality of the obtained results. Furthermore, some studies show that synergies among different Evolutionary Algorithms are possible when they are combined appropriately. The widespread use of hybrid algorithms to deal with specific and complex real-world problems is further evidence that hybridization is a powerful tool whose reach goes far beyond that of the individual algorithms.

In this work, the combination of different evolutionary approaches is analyzed by means of a framework that provides robust and complete support for the development of Hybrid EAs. This framework is called Multiple Offspring Sampling (MOS) and is based on the key concept of a reproductive technique, an abstraction of the mechanisms used by the different evolutionary approaches to create new individuals, i.e., the particular operators, parameters and encodings of the solutions present in the canonical versions of these algorithms. However, it is now the MOS framework, and not the individual algorithms, that is responsible for creating new individuals by means of the available reproductive techniques.

The hybrid algorithms developed with the MOS framework can dynamically evaluate the performance of the different reproductive techniques and adjust their participation accordingly. Several strategies have been proposed for evaluating the quality of the techniques and adjusting their participation, including some of the more classic alternatives in the literature, which serve as a reference to assess the convenience of the mechanisms offered by MOS. Additionally, the automatic learning of these hybridization strategies by means of Reinforcement Learning mechanisms has also been studied.

To conclude, the proposed framework has been tested on a set of well-known problems, from both discrete and continuous domains, obtaining statistically significant results which confirm that an appropriate combination of different search strategies can lead to outstanding performance compared to the individual algorithms.

Keywords: Multiple Offspring Sampling, Hybrid Evolutionary Algorithms, Optimization Problems, Reinforcement Learning.
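The dynamic participation adjustment described in the abstract can be illustrated with a minimal Python sketch. This is an illustrative toy, not the thesis implementation: the two reproductive techniques, the Sphere test problem, the quality measure (average offspring fitness) and the participation update rule are all simplified assumptions chosen for brevity.

```python
import random

def sphere(x):
    """Toy minimization problem: f(x) = sum(x_i^2), optimum at 0."""
    return sum(v * v for v in x)

def gaussian_mutation(parent):
    """Reproductive technique 1 (hypothetical): small Gaussian perturbation."""
    return [v + random.gauss(0.0, 0.3) for v in parent]

def uniform_reset(parent):
    """Reproductive technique 2 (hypothetical): randomly reset some genes."""
    return [random.uniform(-5.0, 5.0) if random.random() < 0.2 else v
            for v in parent]

random.seed(42)
techniques = {"gaussian": gaussian_mutation, "reset": uniform_reset}
participation = {name: 1.0 / len(techniques) for name in techniques}
population = [[random.uniform(-5.0, 5.0) for _ in range(10)] for _ in range(40)]

for generation in range(100):
    offspring, quality = [], {}
    for name, technique in techniques.items():
        # Each technique creates offspring in proportion to its participation.
        n = max(1, round(participation[name] * len(population)))
        children = [technique(random.choice(population)) for _ in range(n)]
        offspring.extend(children)
        # Quality of a technique: average fitness of its offspring
        # (negated, since lower is better on a minimization problem).
        quality[name] = -sum(sphere(c) for c in children) / len(children)
    # Elitist replacement: keep the best individuals overall.
    population = sorted(population + offspring, key=sphere)[:40]
    # Shift participation towards the best-performing technique, keeping
    # a minimum share so no technique disappears from the search.
    best = max(quality, key=quality.get)
    for name in participation:
        target = 0.8 if name == best else 0.2
        participation[name] += 0.05 * (target - participation[name])

print("best fitness:", sphere(population[0]))
print("final participation:", participation)
```

The essential point is that the loop, not the individual operators, decides how many offspring each technique samples per generation, which is the role the MOS framework plays for full evolutionary algorithms.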

Resumen Los Algoritmos Evolutivos (AEs) son un conjunto de técnicas de optimización que han recibido una gran atención en las últimas décadas. Al tratarse de algoritmos de propósito general, han sido aplicados a problemas de muy diversa índole, muchos de los cuales se encuadran en el contexto de aplicaciones científicas e industriales. Se han propuesto distintas aproximaciones, cada una con su particular forma de implementar la metáfora biológica. De esta manera, cada una de estas alternativas presenta sus propias características de búsqueda, lo cual determina en cierta manera cómo de bueno será un tipo de algoritmo para un problema concreto. Esta diversidad de AEs permite poder resolver un mayor número de problemas de optimización complejos. Sin embargo, la selección de un AE concreto se ha convertido en una decisión de crucial importancia que puede condicionar la calidad de los resultados obtenidos. Además, algunos estudios han constatado que pueden existir sinergias entre distintos AEs si éstos son correctamente combinados. Por otro lado, el que tradicionalmente se hayan usado algoritmos híbridos en la resolución de problemas reales es una prueba más de que la hibridación de algoritmos es una poderosa herramienta que supera, en muchos casos, a los algoritmos tradicionales. En este trabajo se analizará la combinación de distintas técnicas evolutivas gracias a un robusto y completo framework que proporciona las herramientas necesarias para el desarrollo de Algoritmos Evolutivos Híbridos. Este framework recibe el nombre de Multiple Offspring Sampling (MOS) y está basado en el concepto clave de técnica reproductiva, que ofrece la abstracción necesaria de los mecanismos utilizados por cada una de las distintas técnicas evolutivas para crear nueva descendencia, es decir, los operadores, parámetros y codificación de las soluciones presentes en las versiones canónicas de dichos algoritmos. 
Sin embargo, ahora es el framework MOS, y no los algoritmos individuales, el responsable de crear nuevos individuos por medio de las técnicas reproductivas disponibles. Los algoritmos híbridos desarrollados con el framework MOS pueden evaluar dinámicamente el comportamiento de las distintas técnicas y ajustar su participación en el proceso de búsqueda en función de dicho comportamiento. Se han propuesto distintas estrategias tanto para la evaluación de la calidad de una técnica como para el ajuste de su participación, entre las que se incluyen algunas estrategias clásicas en la literatura, y que han sido tenidas en cuenta como referencia para evaluar la conveniencia de usar los mecanismos propios de MOS. Como complemento a lo anterior, se ha estudiado la posibilidad de aprender automáticamente estrategias óptimas de hibridación mediante técnicas de Aprendizaje por Refuerzo. Para concluir, el framework y la metodología propuestos han sido evaluados en un conjunto representativo de problemas, tanto discretos como continuos, y los resultados obtenidos han sido estadísticamente significativos, lo cual confirma la hipótesis de que una combinación adecuada de distintos Algoritmos Evolutivos puede obtener unos resultados mucho mejores que los obtenidos por los algoritmos individualmente. Palabras Clave: Multiple Offspring Sampling, Algoritmos Evolutivos Híbridos, Problemas de Optimización, Aprendizaje por Refuerzo.

Declaration I declare that this PhD Thesis was composed by myself and that the work contained therein is my own, except where explicitly stated otherwise in the text.

(Antonio LaTorre de la Fuente)

Table of Contents

List of Figures
List of Tables
List of Algorithms
Acronyms and Definitions

I  INTRODUCTION

Chapter 1  Introduction
  1.1  Motivations
    1.1.1  Different Evolutionary Techniques
    1.1.2  Difficult Selection of Appropriate Evolutionary Techniques for a Given Problem
    1.1.3  Synergies among Evolutionary Approaches
    1.1.4  Lack of a General Methodology
  1.2  Objectives
    1.2.1  Formalization of a General Methodology
    1.2.2  Evaluation of the Regulatory Mechanisms
    1.2.3  Application to Complex Optimization Problems
    1.2.4  Learning Optimal Hybridization Strategies
  1.3  Document Organization

II  STATE OF THE ART

Chapter 2  Evolutionary Computation
  2.1  An Overview of Evolutionary Techniques
  2.2  Evolutionary Algorithms
    2.2.1  Evolutionary Programming
    2.2.2  Genetic Algorithms
      2.2.2.1  Selection Schemes
      2.2.2.2  Crossover Operators
      2.2.2.3  Mutation Operators
    2.2.3  Evolution Strategies
    2.2.4  Genetic Programming
  2.3  Estimation of Distribution Algorithms
    2.3.1  Learning Heuristics
      2.3.1.1  No Interdependencies Model
      2.3.1.2  Bivariate Dependencies Model
      2.3.1.3  Multivariate Dependencies Model
  2.4  Swarm Intelligence
    2.4.1  Ant Colony Optimization
    2.4.2  Particle Swarm Optimization
  2.5  Other Techniques
    2.5.1  Differential Evolution
      2.5.1.1  Initialization
      2.5.1.2  Mutation
      2.5.1.3  Recombination
      2.5.1.4  Selection
  2.6  Chapter Summary

Chapter 3  Adaptation and Hybridization in Evolutionary Computation
  3.1  Introduction
    3.1.1  What Can Be Adapted/Hybridized?
    3.1.2  How Adaptation/Hybridization Can Take Place?
  3.2  Previous Work on Adaptive and Hybrid Evolutionary Algorithms
    3.2.1  Hybrid Algorithms with Relay Behavior
    3.2.2  Hybrid Algorithms with Teamwork Behavior
      3.2.2.1  Hybrid Algorithms with Collaborative Behavior
      3.2.2.2  Hybrid Algorithms with Competitive and Adaptive Behavior
      3.2.2.3  Hybrid Algorithms with Competitive and Self-Adaptive Behavior
      3.2.2.4  Hybrid Algorithms with a Shared Population
      3.2.2.5  Hybrid Algorithms with Private Populations
      3.2.2.6  Heterogeneous Hybrid Algorithms with Different Algorithms
      3.2.2.7  Heterogeneous Hybrid Algorithms with Different Operators
      3.2.2.8  Heterogeneous Hybrid Algorithms with Different Encodings
  3.3  Limitations of Previous Approaches

III  PROBLEM STATEMENT AND SOLUTION

Chapter 4  Multiple Offspring Sampling
  4.1  Introduction to Multiple Offspring Sampling
    4.1.1  Functional Formalization of an Evolutionary Algorithm
      4.1.1.1  Genotypes and Phenotypes
      4.1.1.2  Fitness Function
      4.1.1.3  Offspring Sampling Function
    4.1.2  Description of Multiple Offspring Sampling
    4.1.3  Multiple Encodings
  4.2  The Multiple Offspring Sampling Algorithm
    4.2.1  Central Approach
      4.2.1.1  Participation Functions
      4.2.1.2  Quality Functions
    4.2.2  Self-Adaptive Approach
  4.3  Overview of the hybridization capabilities of Multiple Offspring Sampling

Chapter 5  Application to Permutation Problems
  5.1  Introduction
  5.2  Supercomputer Scheduling
    5.2.1  State of the art
      5.2.1.1  Flow-Shop Scheduling Problem
      5.2.1.2  Job-Shop Scheduling Problem
      5.2.1.3  Multiprocessor Scheduling Problem
      5.2.1.4  Other Packing and Knapsack Problems
    5.2.2  Definition of the Supercomputer Scheduling Problem
    5.2.3  Related Work on Cluster and Supercomputer Scheduling
      5.2.3.1  Non-Combinatorial Policies
      5.2.3.2  Scheduling Tools
    5.2.4  Experimentation
      5.2.4.1  Evolutionary Techniques for Supercomputer Scheduling
      5.2.4.2  First Experimental Scenario
      5.2.4.3  Results and Discussion of the First Experiment
      5.2.4.4  Second Experimental Scenario
      5.2.4.5  Results and Discussion of the Second Experiment
    5.2.5  Conclusions
  5.3  The Traveling Salesman Problem
    5.3.1  State of the art
      5.3.1.1  Non-Evolutionary Approaches
      5.3.1.2  Evolutionary Approaches
      5.3.1.3  Memetic Algorithms
    5.3.2  Experimentation and Discussion
      5.3.2.1  Datasets and Execution Parameters
      5.3.2.2  Experiment 1: Exhaustive Approach
      5.3.2.3  Experiment 2: Greedy Approach
    5.3.3  Conclusions

Chapter 6  Application to Continuous Problems
  6.1  Introduction
  6.2  CEC 2005 Benchmark
    6.2.1  Description of the CEC 2005 Benchmark
      6.2.1.1  Unimodal Functions
      6.2.1.2  Basic Multimodal Functions
      6.2.1.3  Expanded Functions
      6.2.1.4  Composition Functions
    6.2.2  Experimentation in the CEC 2005 Benchmark
    6.2.3  Results in the CEC 2005 Benchmark
      6.2.3.1  Analysis of the Participation Adjustment
  6.3  CEC 2008 Benchmark
    6.3.1  Description of the CEC 2008 Benchmark
      6.3.1.1  Shifted Sphere Function
      6.3.1.2  Schwefel’s Problem 2.21
      6.3.1.3  Shifted Rosenbrock’s Function
      6.3.1.4  Shifted Rastrigin’s Function
      6.3.1.5  Shifted Griewank’s Function
      6.3.1.6  Shifted Ackley’s Function
    6.3.2  Experimentation in the CEC 2008 Benchmark
    6.3.3  Results in the CEC 2008 Benchmark
  6.4  Conclusions

Chapter 7  Behavioral and Computational Analysis
  7.1  Behavioral Analysis
    7.1.1  Results in Rastrigin’s Function
    7.1.2  Results in Griewank’s Function
    7.1.3  Validation
  7.2  Computational Analysis

Chapter 8  Learning Hybridization Strategies
  8.1  Introduction
  8.2  Related Work on Reinforcement Learning
  8.3  Reinforcement Learning to Control MOS Strategies
    8.3.1  PHC Learning Policy
    8.3.2  WoLF Learning Policy
    8.3.3  TERSQ Learning Policy
  8.4  Experimental Scenario
    8.4.1  Algorithms
    8.4.2  Procedure
  8.5  Results and Discussion
  8.6  Conclusions

IV  CONCLUSIONS AND FUTURE WORK

Chapter 9  Conclusions
  9.1  General Methodology for the Combination of Evolutionary Algorithms
  9.2  Application to Complex Optimization Problems
  9.3  Central vs. Self-Adaptive Approach
  9.4  Learning Optimal Hybrid Strategies
  9.5  Final Remarks
  9.6  Selected Publications

Chapter 10  Future Work
  10.1  Variable Sets of Techniques
  10.2  Combination with Non-Evolutionary Techniques
  10.3  Implementation of Restart Mechanisms
  10.4  Post-Execution Analysis Techniques
  10.5  New Quality Measures and Participation Functions
  10.6  Other Ideas

V  APPENDICES

Appendix A  Experimental and Validation Procedures
  A.1  General Procedure
  A.2  nWins Procedure
  A.3  Holm Procedure

Appendix B  Complete Results
  B.1  Traveling Salesman Problem
  B.2  CEC 2005 Benchmark
  B.3  CEC 2008 Benchmark

Bibliography

List of Figures

1.1  Main objectives of this work
2.1  Probabilities for individuals according to the Roulette Wheel Selector
2.2  Roulette Wheel vs. Rank Selector
2.3  One Point Crossover
2.4  Two Points Crossover
2.5  Uniform Crossover
2.6  Arithmetic Crossover
2.7  Cycle Crossover
2.8  Ordered Crossover
2.9  BLX−α Crossover
2.10  Simple Inversion Mutation
2.11  Repeated Exchange Mutation
2.12  Uniform Mutation
2.13  Example of a graphical model for x = (A, B, C, D)
2.14  Example for the social behavior of ants
2.15  Binomial Crossover
2.16  Exponential Crossover
2.17  Differential Evolution Algorithm (I)
2.18  Differential Evolution Algorithm (II)
3.1  Taxonomy of parameter setting proposed by Eiben
3.2  Taxonomy of hybrid algorithms. Boxed classes are additions to Talbi’s taxonomy
4.1  General schema of a MOS algorithm
4.2  General schema of a MOS algorithm with Central approach
4.3  General schema of a MOS algorithm with Self-Adaptive approach
4.4  Overview of the hybridization capabilities of MOS
5.1  Scheduler description
5.2  Conversion from ordinal (real) encoding to path (integer) encoding
5.3  Conversion from path (integer) encoding to ordinal (real) encoding
6.1  3-D plots of the Sphere function and Schwefel’s problem 1.2
6.2  3-D plots of the High Conditioned Elliptic function and Schwefel’s problem 1.2 with noise
6.3  3-D plot of Schwefel’s problem 2.6
6.4  3-D plots of Rosenbrock’s and Griewank’s functions
6.5  3-D plots of Ackley’s and Rastrigin’s functions
6.6  3-D plots of Rastrigin’s and Weierstrass functions
6.7  3-D plots of Schwefel’s problem 2.13
6.8  3-D plots of F13 and F14 functions
6.9  3-D plots of F15 and F16 functions
6.10  3-D plots of F17 and F18 functions
6.11  3-D plots of F19 and F20 functions
6.12  3-D plots of F21 and F22 functions
6.13  3-D plots of F23 and F24 functions
6.14  Participation adjustment of the six hybrid algorithms in the F4 function with 10 dimensions
6.15  Participation adjustment of the six hybrid algorithms in the F10 function with 10 dimensions
6.16  Participation adjustment of the six hybrid algorithms in the F7 function with 30 dimensions
6.17  Participation adjustment of the six hybrid algorithms in the F17 function with 30 dimensions
6.18  3-D plots of the Shifted Sphere function and Schwefel’s problem 2.21
6.19  3-D plots of Shifted Rosenbrock’s and Rastrigin’s functions
6.20  3-D plots of Shifted Griewank’s and Ackley’s functions
7.1  Dynamic adjustment of the Participation for Rastrigin’s function and both Quality Measures
7.2  Evolution of the fitness value for Griewank’s function and the Fitness Average Quality Measure
7.3  Dynamic adjustment of the Participation for Griewank’s function and both Quality Measures
8.1  Example of matrix of states with two techniques
8.2  Participation adjustment of the hybrid algorithms in the Sphere function
8.3  Participation adjustment of the hybrid algorithms in Rastrigin’s function
8.4  Participation adjustment of the hybrid algorithms in the Sphere function
8.5  Participation adjustment of the hybrid algorithms in Rastrigin’s function

List of Tables

2.1 Example of the Roulette Wheel Selector for a hypothetical population of five individuals
2.2 Example of the Rank Selector for a population of four individuals
3.1 Some of the parameters of EAs subject to adaptation and their effect on the behavior of the algorithm
3.2 Summary of previous work on Hybrid Evolutionary Algorithms
5.1 Experimental scenario
5.2 Summary of the results of the first experiment
5.3 Summary of the results of the second experiment
5.4 Crossover operators for the Path representation
5.5 Experimental scenario
5.6 Configuration of the five GA techniques
5.7 Average number of wins compared with the number of techniques
5.8 Average number of wins for each technique
5.9 Configuration of the six new GA techniques
5.10 Summary of the results
6.1 Configuration of the different evolutionary techniques used
6.2 Common configuration for 10 and 30 dimensional functions
6.3 Configuration of the hybrid algorithms
6.4 Results in the CEC 2005 Benchmark for 10 dimensional functions
6.5 Results in the CEC 2005 Benchmark for 30 dimensional functions
6.6 Comparative with the algorithms of the CEC 2005 Special Session for the 10 dimensional functions
6.7 Comparative with the algorithms of the CEC 2005 Special Session for the 30 dimensional functions
6.8 Configuration of the GA techniques
6.9 Configuration of the DE and ES techniques
6.10 Common configuration for the hybrid algorithms
6.11 Summary of the results in the CEC 2008 Benchmark with 1,000 dimensions
6.12 Average ranking and results of the nWins and the Holm Procedures in the CEC 2008 Benchmark for 1,000 dimensional functions
7.1 Set of GA techniques for Rastrigin’s and Griewank’s functions
7.2 Common configuration for all the problems
7.3 Comparison of the results in Rastrigin’s function
7.4 Comparison of the results in Griewank’s function
7.5 Results of the Wilcoxon test with a significance level α = 0.05
7.6 Average performance of the four individual algorithms
7.7 Average performance of the hybrid algorithms compared with the Best and Worst individual algorithms
7.8 Detailed comparison of the fAvg + Dyn PF1 configuration with the Best and Worst individual algorithms
7.9 Detailed comparison of the NSC + Dyn PF2 configuration with the Best and Worst individual algorithms
8.1 Algorithm configuration
8.2 Set of techniques for the hybrid evolutionary algorithm
8.3 Parameters of the RL policies
8.4 Average error in the six proposed functions when the four reproductive techniques are considered
8.5 Ranking and statistical test results for both the single and the hybrid algorithms for the first experimental configuration
8.6 Average error in the six proposed functions when only BCUM and UCUM techniques are considered
8.7 Ranking and statistical test results for both the single and the hybrid algorithms for the second experimental configuration
B.1 Results of the exhaustive experiment
B.2 Full results in the CEC 2005 Benchmark in 10 dimensions
B.3 Full results in the CEC 2005 Benchmark in 30 dimensions
B.4 Full results in the CEC 2008 Benchmark in 1,000 dimensions

List of Algorithms

1 Classic Genetic Algorithm
2 Estimation of Distribution Algorithm
3 Differential Evolution Algorithm
4 MOS Algorithm with Central Approach
5 MOS Algorithm with Self-Adaptive Approach
6 Greedy Approach
7 Multiple Offspring Sampling with RL Algorithm
8 PHC Learning Policy
9 WoLF Learning Policy
10 TERSQ Learning Policy

Acronyms and Definitions

AC Arithmetic Crossover
ACO Ant Colony Optimization
AI Artificial Intelligence
API Application Programming Interface
BMDA Bivariate Marginal Distribution Algorithm
BOA Bayesian Optimization Algorithm
BSC Bit-Based Simulated Crossover
CE Chain Exchanges
CX Cycle Crossover
CMA-ES Covariance Matrix Adaptation Evolution Strategy
COMIT Combining Optimizers with Mutual Information Trees
DAG Directed Acyclic Graph
DE Differential Evolution
EA Evolutionary Algorithm
EASY Extensible Argonne Scheduling sYstem
EBNA Estimation of Bayesian Networks Algorithm
EC Evolutionary Computation
EDA Estimation of Distribution Algorithm
EE Edge Exchanges
EGNA Estimation of Gaussian Networks Algorithm
EMNA Estimation of Multivariate Normal Algorithm
EP Evolutionary Programming
EXX Edge Exchange Crossover
FCFS First Come First Served
FDC Fitness Distance Correlation
FE Fitness Evaluation
FSM Finite State Machine
FSS Flow-Shop Scheduling
FWER Family-Wise Error
GA Genetic Algorithm
GP Genetic Programming
HPC High Performance Computing
IGA Infinitesimal Gradient Ascent
JSS Job-Shop Scheduling
LJF Longest Job First
LK Lin-Kernighan
LS Local Search
MA Memetic Algorithm
ML Machine Learning
MDP Markov Decision Process
MIBOA Mixed-Integer Bayesian Optimization Algorithm
MIMIC Mutual Information Maximization for Input Clustering
MOS Multiple Offspring Sampling
MPI Message Passing Interface
MPS Multiprocessor Scheduling
NFL No Free Lunch
NSC Negative Slope Coefficient
FPNSC Fitness-Proportional Negative Slope Coefficient
OPC One Point Crossover
OX Ordered Crossover
PBIL Population-Based Incremental Learning
PF Participation Function
PHC Policy Hill Climbing
PMX Partially Matched Crossover
PSO Particle Swarm Optimization
PGM Probabilistic Graphical Model
QAP Quadratic Assignment Problem
QF Quality Function
REM Repeated Exchange Mutation
RL Reinforcement Learning
SA Simulated Annealing
SIM Simple Inversion Mutation
SJF Shortest Job First
SXX Subtour Exchange Crossover
SCS Supercomputer Scheduling
TERSQ Tentative Exploration by Restricted Stochastic Quota
TPC Two Points Crossover
TSP Traveling Salesman Problem
UC Uniform Crossover
UM Uniform Mutation
UMDA Univariate Marginal Distribution Algorithm
VNS Variable Neighborhood Search
VRP Vehicle Routing Problem
WoLF Win or Learn Fast
WPL Weighted Policy Learner

Part I

INTRODUCTION

Chapter 1

Introduction

In spite of the wide range of application fields and the good results that Evolutionary Algorithms (EAs) have obtained in complex optimization problems, their results are not always as good as one could expect. Normally, most researchers test only a few different algorithms when trying to solve a particular optimization problem. Even if the most suitable algorithm for that problem has been selected, it is hard to find its best configuration (parameters and set of operators). Additionally, different algorithms work better on different problems. This is in accordance with the No Free Lunch (NFL) theorem [WM97], which states that for any algorithm with an outstanding performance in a given problem there is always another problem in which other algorithms perform better. Although the NFL theorem is based on certain extreme theoretical considerations [DJW02, Olt04], real-world problems also show differences in the comparative performance of several algorithms. Furthermore, some studies on Hybrid Evolutionary Algorithms [HWC00, LPRM08] show that, by combining different search strategies, it is possible to obtain better results than those achieved by the individual algorithms. According to the studies conducted by Sinha and Goldberg [SG03], there are three main reasons for the hybridization of EAs: 1. An improvement in the performance of the EA (for example, the speed of convergence). 2. An improvement in the quality of the solutions obtained by the EA. 3. To incorporate the EA as a part of a larger system. This research will focus on the first two reasons for hybridization in Evolutionary Algorithms, the improvement of the performance of EAs and of the quality of the obtained solutions, but especially on the latter.


In this first chapter, the main motivations for this research will be enumerated (Section 1.1) and the objectives of this work will be presented (Section 1.2). Finally, Section 1.3 summarizes the organization and the contents of this document.

1.1 Motivations

When dealing with a new optimization problem, the same recurrent question always arises: which is the best way to solve it? To find an answer, we can make use of our own knowledge of the field or, even better, of that of the scientific community, carrying out a bibliographic review to check which approaches have been successfully applied to similar problems in the past. However, the number of publications all around the world is incredibly high, which makes it very difficult to evaluate all the possible alternatives. Besides, many of these publications propose different algorithms and promise important improvements that are sometimes problem-specific or contradictory with one another. Testing every different approach requires too many resources and too much time to be an acceptable option. If the impossibility of designing an optimal algorithm for every possible problem, stated by the aforementioned NFL theorem, is also considered, it is clear that Hybrid Evolutionary Algorithms are a good option when competitive results with little effort are preferred over a specific solution with better performance but a much greater development effort. Furthermore, in some cases hybrid approaches obtain better results than the individual algorithms they are made up of, as will be seen throughout this work. From all the previous considerations, the following four main motivations can be derived:

1. The existence of different evolutionary approaches that can be applied to solve the same problem, with no a priori knowledge of their performance.

2. The difficulty of selecting the most suitable evolutionary approach for a particular problem, together with all the parameters needed by that model.

3. The possible existence of synergies among several evolutionary approaches when combined.

4. The lack of a general methodology for combining different EAs in an easy-to-use way.

The next subsections develop each of the previous points to make the motivations for this research clear.
Additionally, Figure 1.1 graphically depicts the objectives described in this section.

1.1.1 Different Evolutionary Techniques

The great development of techniques based on Evolutionary Computation (EC) provides an incredible range of optimization techniques that can be applied to a wide variety of problems. However, not all the techniques have been applied to every problem and, even though some more or less complete benchmarks have been proposed, the comparison of Evolutionary Algorithms is still difficult in real-world scenarios. This makes it very difficult to identify the most suitable technique (or techniques) for a particular problem. In some extreme cases, different researchers present contradictory results on the same problem that, most of the time, depend on a correct or incorrect parametrization of the algorithms. For these reasons, a general formalization for the combination and adaptation of different evolutionary approaches could ease the selection of an EA and free those resources for the truly important task: solving the problem.

1.1.2 Difficult Selection of Appropriate Evolutionary Techniques for a Given Problem

We have seen that it is not easy to select a particular evolutionary model, its parameters and its operators (if applicable). Nevertheless, if there is no clear evidence that a particular method is better suited, a preliminary experimentation should be carried out in order to find the algorithm that best fits our needs. Depending on the performance requirements of the application problem, the computational effort will be more or less considerable. In most cases, a significant number of combinations of parameters, operators and algorithms must be tested in order to be confident enough in the selection made for the subsequent experimentation. This approach is known as parameter tuning [EHM99] and implies a lot of computational effort. Other approaches have been proposed which try to find the most suitable combination of parameters while the algorithm is running (parameter control methods). Although these mechanisms achieve, in general, satisfactory results and require less time and fewer resources than a brute-force approach, they have usually been applied only to a small set of parameters, such as the population size or the probabilities of the recombination operators, but not to the encoding of the solutions or to the number of offspring mechanisms to be used simultaneously. In this work, a dynamic algorithm capable of adapting and finding the best combination of techniques and parameters is proposed, and it is shown to find good combinations of techniques and parameters for different complex optimization problems, in some cases even better than the best individual algorithm.
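To make the distinction between tuning and control concrete, parameter control can be illustrated with a small sketch. The example below is not taken from this thesis: it uses the well-known one-fifth success rule from Evolution Strategies, purely as an illustration of how a parameter (here, the mutation step size) can be adjusted while the algorithm runs:

```python
def adapt_step_size(sigma, successes, trials, factor=0.85):
    """One-fifth success rule: increase the mutation step size when more
    than 1/5 of the recent mutations improved the fitness, and decrease
    it otherwise (0 < factor < 1 controls the speed of adaptation)."""
    if trials == 0:
        return sigma  # no evidence yet: keep the current step size
    success_rate = successes / trials
    if success_rate > 1 / 5:
        return sigma / factor  # exploration pays off: take larger steps
    elif success_rate < 1 / 5:
        return sigma * factor  # too many failed mutations: take smaller steps
    return sigma
```

For instance, `adapt_step_size(1.0, 1, 10)` shrinks the step size to 0.85, whereas a success rate above one fifth enlarges it. The same online principle, applied to richer parameter sets, is what parameter control methods generalize.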

1.1.3 Synergies among Evolutionary Approaches

Some approaches provide mechanisms for the self-adaptation of their parameters [EHM99], but few exist for the combination of different evolutionary approaches (and always in a very problem-specific way, which makes them hardly applicable to other problems). Hybrid approaches with increased performance compared to their individual components have also been proposed (see Section 3.2). However, they suffer from the same limitations as the adaptive algorithms described in [EHM99]: most of them are ad-hoc implementations for a specific problem or set of problems, limited to a reduced number of evolutionary approaches. The ability of the Multiple Offspring Sampling (MOS) framework proposed in this work to combine different evolutionary approaches makes possible the design of hybrid algorithms that can exploit the synergies arising from the combination of different EAs.


1.1.4 Lack of a General Methodology

Several researchers have proposed Hybrid Evolutionary Algorithms in the past. However, each algorithm was independently developed and constrained both to the application problem and to the algorithms it is made up of. In this work, a general methodology for designing high-performing Hybrid EAs is proposed. The mechanisms for the creation of new offspring are abstracted from the core of the algorithm. Moreover, the regulatory mechanisms, Participation Functions (Section 4.2.1.1) and Quality Measures (Section 4.2.1.2), have been completely decoupled from the algorithm, which makes it easy to exchange and/or combine these mechanisms to adapt the algorithm to the particular needs of the application problem.

1.2 Objectives

The research work described in this PhD thesis is enclosed within the field of Evolutionary Computation. Its main objective is to propose a general methodology to simultaneously combine multiple evolutionary techniques, dynamically adjusting the participation of each of them in the overall process. This methodology will be tested on several optimization problems to assess its validity and analyze its behavior. Each of the objectives of this work will be reviewed in detail in the next sections.

1.2.1 Formalization of a General Methodology

As seen in Section 1.1.4, a general methodology for the combination of different Evolutionary Algorithms is needed in order to be able to design powerful Hybrid EAs capable of finding good solutions to many different optimization problems. For this purpose, a general framework for the combination of EAs has been proposed. This framework should be aware of the multiple possible mechanisms to create new offspring (the reproductive techniques), the different possible encodings for the same candidate solution and the specific parameters and operators of each of these reproductive techniques. The methodology should provide the appropriate regulatory mechanisms to evaluate the performance of each reproductive technique on each phase of the search procedure and to update the participation of each technique (the proportion of the offspring population that each technique is allowed to produce) accordingly. Besides, the mechanisms to combine the offspring populations created by each of the available reproductive techniques must be defined and managed by the hybrid algorithm.
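The workflow described above can be condensed into a short sketch. All names below (`create_offspring` functions, the average-fitness quality measure, the proportional participation update) are illustrative placeholders chosen for this example, not the actual MOS implementation, which is formalized later in this document:

```python
def mos_generation(population, techniques, participation, pop_size, fitness):
    """One generation of a MOS-style hybrid (maximization of a non-negative
    fitness is assumed for this sketch).

    `techniques` maps each technique name to a function that, given the
    current population and a number of offspring, returns that many new
    individuals. Each technique produces a share of the offspring given by
    its current participation; its quality (here, simply the average fitness
    of its children) is then used to re-normalize the participation ratios."""
    offspring, quality = [], {}
    for name, create_offspring in techniques.items():
        share = max(1, round(participation[name] * pop_size))
        children = create_offspring(population, share)  # technique-specific sampling
        offspring.extend(children)
        quality[name] = sum(fitness(c) for c in children) / len(children)
    total = sum(quality.values())
    new_participation = {name: q / total for name, q in quality.items()}
    # Combine parents and all offspring populations; keep the best pop_size
    new_population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
    return new_population, new_participation
```

In this toy setting a technique that produces fitter children automatically receives a larger share of the next offspring population, which is the essence of the dynamic adjustment of participation.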

1.2.2 Evaluation of the Regulatory Mechanisms

Several regulatory mechanisms should be tested and compared on the application problems. The adjustment of participation carried out by each of these mechanisms, based on either (1) the different Quality Measures or (2) the self-adaptive implicit mechanism, should be analyzed. Finally, the overall performance of each regulatory mechanism should be evaluated by means of the appropriate statistical procedures in order to validate the significance of the results.

1.2.3 Application to Complex Optimization Problems

The proposed methodology should be tested on a diverse set of problems of different natures in order to validate it as a general approach for combining Evolutionary Algorithms. Two different kinds of problems have been considered. On the one hand, two complex combinatorial problems have been selected: the Supercomputer Scheduling Problem and the Traveling Salesman Problem. The first deals with the task of scheduling the jobs to be dispatched by a cluster-like supercomputer. Each job is made up of tasks, and each task has its own requirements in terms of number of processors, memory and execution time. The goal is to find a schedule of the jobs that minimizes the makespan of the system. The second is a classic combinatorial problem in which N cities must be visited so that the total tour length is minimized. This problem looks deceptively easy, yet it has been proved to be NP-hard. On the other hand, two state-of-the-art benchmarks for continuous optimization have also been considered. Both benchmarks were proposed in special sessions of the IEEE Congress on Evolutionary Computation (CEC 2005 and CEC 2008, respectively). The first benchmark is made up of 25 scalable functions divided into four groups: unimodal, basic multimodal, expanded and composed functions. The complexity of the functions grows from the unimodal (easier) to the composed (more complex) groups. The functions in this last group are quite hard to solve, as they are combinations of up to ten multimodal functions, which translates into incredibly rugged fitness landscapes and massive multimodality. The second benchmark is made up of only six functions. However, it is also a hard benchmark, as all the functions are scalable and the dimensionality proposed in the session was 1,000 dimensions. This extremely large number of dimensions makes it difficult for the algorithms to find solutions close to the global optimum, and much attention must be paid to the balance between global and local search.
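To make the combinatorial case concrete, a TSP candidate solution can be encoded as a permutation of city indices, and its fitness is the length of the closed tour it describes. The following sketch (the coordinates and names are made up purely for illustration) shows such a fitness function:

```python
import math

def tour_length(tour, cities):
    """Total length of the closed tour visiting every city exactly once.
    `tour` is a permutation of city indices; `cities` maps each index
    to its (x, y) coordinates."""
    total = 0.0
    for i in range(len(tour)):
        x1, y1 = cities[tour[i]]
        x2, y2 = cities[tour[(i + 1) % len(tour)]]  # wrap around to close the tour
        total += math.hypot(x2 - x1, y2 - y1)
    return total
```

With four cities on a unit square, the perimeter tour `[0, 1, 2, 3]` has length 4, while the self-crossing tour `[0, 2, 1, 3]` is longer; it is exactly this ordering that the minimization exploits.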

1.2.4 Learning Optimal Hybridization Strategies

The possibility of learning optimal strategies for combining several EAs is studied through the application of Machine Learning mechanisms, Q-learning in this case. In the context of a hybrid algorithm with dynamic adjustment of participation, it would be interesting to analyze whether it is possible to identify optimal patterns for the combination of algorithms by means of Reinforcement Learning (RL) and the information gathered from multiple executions of the algorithm. This is especially interesting in environments where the Evolutionary Algorithm is executed many (possibly thousands of) times, as the learning algorithm would then have enough information to learn an optimal strategy. For example, some problems exhibit a similar behavior, and can thus be solved in a similar way, when different input data are considered (e.g., different instances of the TSP or SAT problems).
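The learning component mentioned above relies on the standard tabular Q-learning value update, which can be written in a few lines. This is a generic sketch (the states, actions and dictionary-based table are illustrative, not the representation used in this thesis):

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular Q-learning update:
    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a)).
    `Q` maps (state, action) pairs to values; unseen pairs default to 0."""
    best_next = max((v for (s, _), v in Q.items() if s == next_state), default=0.0)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return Q
```

In a hybridization setting, an action could be the choice of a reproductive technique for the next phase of the search, with the reward derived from the fitness improvement it produced; repeated executions then refine the Q table towards an optimal combination pattern.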

Figure 1.1: Main objectives of this work (a diagram relating the Hybrid EAs Framework to the Toolbox of Evolutionary Algorithms, the Learning Strategies, the Evaluation of Regulatory Mechanisms and the Complex Optimization Problems)

1.3 Document Organization

To conclude this introduction, the structure of the document is detailed. Chapter 2 briefly reviews the state of the art in Evolutionary Computation. Chapter 3 presents the most relevant work dealing with the adaptation and hybridization of EAs. Chapter 4 introduces the proposed methodology, which aims to become a general framework for the dynamic combination of Evolutionary Algorithms. This methodology is formalized, several strategies for the dynamic adjustment of the participation of the different EAs are proposed, and a classic strategy for the hybridization of EAs, in which the participation information is encoded within the individual, is also described. Chapters 5 and 6 detail the experimentation conducted to test the approach devised in this work, considering two types of problems: combinatorial and continuous problems, respectively. Statistical tests are used to assess that significant differences actually exist between the proposed methodology and classic Evolutionary Algorithms. Chapter 8 presents the application of Reinforcement Learning mechanisms to the selection of the best hybrid strategy for an EA; several Q-Learning based approaches are tested and compared on a benchmark of continuous functions. Chapter 9 develops the main conclusions of this study, whereas Chapter 10 envisages the future work derived from this PhD thesis. Finally, Appendix A describes the experimental and validation procedures used in the experiments carried out for this work, whereas Appendix B contains detailed tables of the results obtained in the experimentation described in Chapters 5 and 6, so that they are available for possible further analysis.


Part II

STATE OF THE ART

Chapter 2

Evolutionary Computation

Evolutionary Computation is a subfield of Artificial Intelligence. It covers a set of optimization techniques that use evolutionary models and iterative progress. In general, EC includes those mechanisms that make one or more candidate solutions to a problem evolve, by any means, to reach a new solution that is as good as possible. The use of Darwinian principles for problem solving was introduced almost simultaneously in the sixties by three different lines of research:

• Evolutionary Programming, by Lawrence J. Fogel;

• Genetic Algorithms, by John H. Holland;

• Evolution Strategies, by Peter Bienert, Ingo Rechenberg and Hans-Paul Schwefel.

These three areas evolved separately for fifteen years. In the early nineties, the scientific community agreed to consider them three representatives of the same technology: Evolutionary Computation. It was also during these years that a fourth stream, Genetic Programming, was popularized by John Koza. The main objective of this new paradigm of EC was to provide computer systems with the tools to evolve programs capable of solving different problems (programs evolving programs). Today, the field of EC has reached a level of development that none of the aforementioned researchers could have imagined when they first proposed their techniques forty years ago. Section 2.1 reviews the most representative approaches among the existing evolutionary techniques, whereas Sections 2.2.2, 2.3 and 2.5.1 analyze in more detail three of these techniques, Genetic Algorithms, Estimation of Distribution Algorithms and Differential Evolution, as they are good representatives of different models in EC.


2.1 An Overview of Evolutionary Techniques

Many different optimization techniques have been proposed within the field of EC. However, a standard classification of evolutionary techniques has not yet been agreed upon. Different authors offer alternative classifications for some of the algorithms, considering different criteria. For this work, a taxonomy of our own is proposed, trying to group those algorithms which share the most common characteristics:

• Evolutionary Algorithms
  – Evolutionary Programming
  – Genetic Algorithms
  – Evolution Strategies
  – Genetic Programming
  – Learning Classifier Systems

• Swarm Intelligence
  – Ant Colony Optimization
  – Particle Swarm Optimization

• Estimation of Distribution Algorithms

• Other techniques
  – Differential Evolution
  – Cultural Algorithms
  – Artificial Immune Systems

This taxonomy tries to separate those techniques that present a central controller of the overall process from those that do not, and also those approaches in which a new candidate solution is directly constructed from a subset of other candidate solutions from those in which new individuals are sampled from a probabilistic model created from a set of previous solutions. However, other taxonomies would be perfectly valid as well. It is important to note that, apart from these techniques, many others have been proposed that are variations or combinations of these ones. This makes it really difficult to establish a clear separation among groups of techniques and, as a result, it is not possible to define a single valid classification for all of them. For example, alternative taxonomies have been proposed [Fos01] considering different elements to classify the algorithms into different categories. The next sections will briefly review each of the groups presented in the previous taxonomy.


Since a great number of different evolutionary approaches exist, this chapter is not intended to provide a detailed explanation of all of them. For that purpose, the reader is encouraged to consult external references such as [Bäc95, KES01, LL01].

2.2 Evolutionary Algorithms

Evolutionary Algorithms are a subset of Evolutionary Computation. EAs are generic population-based metaheuristics for optimization that use mechanisms inspired by natural evolution. In general, EAs make a population of candidate solutions to a problem evolve by means of recombination operators. The suitability of these candidate solutions is measured by a fitness function that evaluates how good an individual is for that particular problem. The fittest individuals have more chances of being selected for the next recombination phase. Evolutionary Algorithms often provide good approximate solutions to complex problems in different fields. As they do not make any assumption about the underlying fitness landscape, EAs have been successfully applied to many disciplines such as engineering, physics, biology, genetics, etc. Different approaches have been proposed in the last decades. This section reviews the four most relevant methods present in the literature, paying special attention to Genetic Algorithms, which receive a deeper review in Section 2.2.2.
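The general scheme just described can be condensed into a minimal loop. The sketch below is a generic illustration only (tournament selection, one-point crossover and bit-flip mutation are chosen arbitrarily here; the classic Genetic Algorithm is described properly in Section 2.2.2):

```python
import random

def evolve(fitness, genome_len, pop_size=20, generations=50, p_mut=0.05, seed=0):
    """Minimal generational EA: tournament selection, one-point crossover,
    bit-flip mutation, and generational replacement with elitism of one."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        def tournament():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        children = [max(pop, key=fitness)]  # elitism: keep the best individual
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, genome_len)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with per-gene probability p_mut
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)
```

Running it on a toy maximization problem such as OneMax (`fitness=sum`) shows the characteristic behavior of EAs: the best individual improves monotonically thanks to elitism, while selection pressure and recombination drive the rest of the population towards fitter regions.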

2.2.1 Evolutionary Programming

Evolutionary Programming (EP) was first used by Lawrence J. Fogel in the sixties as part of an experiment trying to generate Artificial Intelligence using simulated evolution mechanisms. In this preliminary research [Fog62, Fog64], Fogel used Finite State Machines (FSMs) as the individuals of his population of problem solvers. To evaluate these individuals, each FSM was exposed to the environment, i.e., the input observed until that moment, and evaluated according to its ability to predict the upcoming items. The most accurate individuals were preserved for the next generation and modified by means of a mutation operator. These preliminary experiments were extended to other areas such as the classification and prediction of time series [Wal67], the modelling of systems [Kau67] or gaming [Bur69]. In the seventies, the main research efforts were devoted to pattern recognition systems [Roo70, Cor72]. It was in the eighties that the use of EP was extended to arbitrary representations and applied to generalized optimization problems [FF89, Fog91]. New selection mechanisms were proposed [Fog88], as well as techniques for the self-adaptation of parameters [FFA91, FFAF92]. Nowadays, the differences among modern Evolutionary Programming techniques, Genetic Algorithms and Genetic Programming are quite vague, and most researchers in this field have adopted the more general term Evolutionary Computation to define their work.
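To make Fogel's evaluation scheme concrete, the sketch below scores a candidate FSM by its prediction accuracy on an observed symbol sequence. This is a simplified Moore-machine formulation invented for illustration, not Fogel's original encoding:

```python
def fsm_accuracy(transitions, outputs, start, sequence):
    """Score a Moore-style FSM by its ability to predict the next symbol.
    `transitions[state][symbol]` gives the next state after consuming a
    symbol; `outputs[state]` is the symbol the machine predicts while in
    that state. Returns the fraction of correct next-symbol predictions."""
    state, correct = start, 0
    for i in range(len(sequence) - 1):
        state = transitions[state][sequence[i]]   # consume the observed symbol
        if outputs[state] == sequence[i + 1]:     # compare prediction to reality
            correct += 1
    return correct / (len(sequence) - 1)
```

A two-state machine that maps each observed symbol to the state predicting its complement scores a perfect 1.0 on an alternating 0/1 sequence; in Fogel's scheme, such high-scoring machines would survive to the next generation, while mutation would alter the transitions or outputs of their copies.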


2.2.2

Genetic Algorithms

Genetic Algorithms are the most usual type of Evolutionary Algorithm. Although the first work on this kind of algorithms dates from the late fifties and early sixties [Bar54, Bar57, Fra57, FB70, Cro73], GAs were popularized by the work conducted by John H. Holland and his students at the University of Michigan and, in particular, by his book Adaptation in Natural and Artificial Systems [Hol75]. Since then, GAs have experienced a deep development and have been applied to solve complex problems in many different domains.

GAs are closely related to and inspired by Natural Evolution. In the real world, each individual of a given species in a population tries to transmit its genetic material to its offspring. In most cases, only the most suitable and adapted individuals are able to survive in their environment and breed new individuals. In the reproduction phase, the genetic material of both ancestors is combined in some way and transferred to one or more descendants. Additionally, the genetic information of the offspring is subject to small mutations, a result of environmental factors, that sometimes make these individuals more suitable for survival. This way, they get a new chance to reproduce and transmit their genetic material.

A Genetic Algorithm implements a simplified version of this metaphor. Its objective is to find the most suitable solution to a given problem, combining candidate solutions to generate new ones and making them compete for a number of generations. The main aspects to be considered are the following:

• A representation for the candidate solutions to the problem. Each individual in the population represents a candidate solution. This representation is also known as the genome or the chromosome of the individual. Many different encodings have been proposed for different problems, such as bit or real strings and more complex ones, like trees or lists.

• A metric for the suitability of the individuals. This value is known as the fitness of the individual and is problem-specific. Mathematically, the fitness function is defined as fitness : D → R, D being the domain in which the genome representation is defined. For example, in the classic Traveling Salesman Problem [Rob49], the tour length is normally considered as the fitness value to measure the suitability of a candidate solution.

• A crossover operator that, given two individuals, is able to combine the genetic information of both ancestors to generate one or more children. Usually, two parents are combined to generate two children, which is known as sexual crossover and mathematically defined as Crossover : D × D → D × D.

• A mutation procedure, in which the genetic information of an individual is modified in some way. Mathematically: Mutation : D → D.

• A selection scheme that, given the fitness of the individuals, decides which individuals in the population will take part in the reproduction process.


Once all these elements have been introduced, the general behavior of a Genetic Algorithm can be described as depicted in Algorithm 1.

Algorithm 1 Classic Genetic Algorithm
1: Create initial population of candidate solutions P0
2: Evaluate initial population P0
3: while termination criterion not reached do // Pi converged or maximum number of iterations reached
4:   while select individuals from current population Pi do
5:     Cross selected individuals with certain probability to generate new offspring
6:     Mutate descendants with some probability
7:     Evaluate new individuals
8:     Add new individuals to the auxiliary population Pi'
9:   end while
10:  Combine populations Pi and Pi' according to a pre-established criterion to generate Pi+1
11:  Evaluate population Pi+1
12: end while

This simple description of a Genetic Algorithm allows several configurations, depending on the selection scheme, the recombination operators (crossover and mutation) or the elitism mechanisms that are actually used. The next subsections describe some of the most classic alternatives for each of these operators.
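As an illustration, the loop of Algorithm 1 can be sketched in a few lines of Python. The concrete operator choices below (binary tournament selection, one-point crossover, bit-flip mutation and an elitist combination of Pi and Pi') are illustrative assumptions, not part of the pseudocode itself:

```python
import random

def genetic_algorithm(fitness, n_genes, pop_size=20, generations=50,
                      p_cross=0.9, p_mut=0.01, seed=42):
    """Minimal generational GA on bit strings (maximization)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            # Binary tournament selection of two parents
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # One-point crossover with probability p_cross
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_genes)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            # Bit-flip mutation of each new individual
            for child in (c1, c2):
                for i in range(n_genes):
                    if rng.random() < p_mut:
                        child[i] = 1 - child[i]
                offspring.append(child)
        # Elitist combination of Pi and Pi': keep the best pop_size individuals
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

# OneMax problem: the fitness of a bit string is its number of ones
best = genetic_algorithm(sum, n_genes=20)
```

On OneMax, this sketch typically converges to a string that is all, or nearly all, ones within a few dozen generations.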

2.2.2.1 Selection Schemes

The selection method used by the GA is an important issue, as it determines the individuals that will be combined to create the new offspring. Taking up again the analogy with Natural Evolution, the selection scheme is equivalent to the probability that individuals have to reproduce and transmit their genetic information. Different mechanisms to select individuals have been proposed in the literature. The next sections review some of the most relevant approaches.

Roulette Wheel Selection

The Roulette Wheel Selection method was proposed by K.A. DeJong and is probably the most frequently used selection method [BT95]. It rewards the individuals with the best fitness values by increasing their probability of being selected for the reproduction phase. Equation 2.1 mathematically defines this method: given an individual i, if fi is the fitness of this individual and N is the population size, the probability for this individual to be selected is the quotient between its fitness and the sum of the fitness values of all the individuals in the population.

    pi = fi / (f1 + f2 + · · · + fN)    (2.1)

Table 2.1 shows an example of how these probabilities are computed for a population of five individuals. Comparing this selection technique with the roulette wheel of a casino, the population can be seen as spread over the wheel, where the individuals with better fitness are assigned more positions than the others, in proportion to their fitness.


    Individual    Fitness    Probability
    1             6.82       0.31
    2             1.11       0.05
    3             8.48       0.38
    4             2.57       0.12
    5             3.08       0.14
    Total         22.06      1.00

Table 2.1: Example of the Roulette Wheel Selector for a hypothetical population of five individuals
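A minimal sketch of Equation 2.1, together with one spin of the wheel, using the hypothetical population of Table 2.1:

```python
import random

def roulette_probabilities(fitnesses):
    """Selection probability of each individual: pi = fi / sum of all fj (Eq. 2.1)."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def roulette_select(fitnesses, rng=random):
    """Spin the wheel once and return the index of the selected individual."""
    r = rng.uniform(0, sum(fitnesses))
    acc = 0.0
    for i, f in enumerate(fitnesses):
        acc += f
        if r <= acc:
            return i
    return len(fitnesses) - 1  # guard against floating-point rounding

# Population of Table 2.1
probs = roulette_probabilities([6.82, 1.11, 8.48, 2.57, 3.08])
idx = roulette_select([6.82, 1.11, 8.48, 2.57, 3.08], rng=random.Random(0))
```

Rounded to two decimals, the computed probabilities match the last column of Table 2.1.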

Figure 2.1: Probabilities for individuals according to the Roulette Wheel Selector

Figure 2.1 graphically presents the same probability values computed in Table 2.1.

Rank Selection

The previous selection mechanism can lead to premature convergence if the fitness landscape is not very homogeneous: great differences among the fitness values of different individuals can make the algorithm always select the same individuals. Rank Selection tries to overcome this problem by sorting the individuals according to their fitness. Each individual is then assigned a rank or label within the [1, N] interval, where N is the population size. The probability of selection is then distributed proportionally to this rank, as Equation 2.2 establishes.

    pi = rank(fi) / (N(N+1)/2)    (2.2)

Table 2.2 presents an example of how the probabilities of selection would be computed by both the Roulette Wheel and the Rank selectors in a population of four individuals in which one of them has a much higher fitness value than the remaining ones. The same information is graphically depicted in Figure 2.2.

Tournament Selection

The main idea underlying this method is the selection of individuals based on direct comparisons among them. There are two versions of the Tournament Selection: the Deterministic and the Probabilistic Tournament Selection.


    Individual    Fitness    Probability    Rank    Probability with ranks
    1             1          0.01           1       0.1
    2             3          0.03           2       0.2
    3             7          0.06           3       0.3
    4             100        0.90           4       0.4
    Total         111        1              10      1

Table 2.2: Example of the Rank Selector for a population of four individuals
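The right-hand column of Table 2.2 can be reproduced with a direct transcription of Equation 2.2; here the worst individual gets rank 1 and the best rank N:

```python
def rank_probabilities(fitnesses):
    """pi = rank(fi) / (N(N+1)/2), with rank 1 for the worst individual (Eq. 2.2)."""
    n = len(fitnesses)
    order = sorted(range(n), key=lambda i: fitnesses[i])  # indices, worst first
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    denom = n * (n + 1) / 2  # sum of all ranks
    return [r / denom for r in ranks]

# Population of Table 2.2: the outlier gets probability 0.4 instead of 0.9
probs = rank_probabilities([1, 3, 7, 100])
```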

Figure 2.2: Roulette Wheel vs. Rank Selector. (a) Selection probability with the roulette wheel selector: 1%, 3%, 6% and 90%. (b) Selection probability with the rank selector: 10%, 20%, 30% and 40%.

In the first approach, the Deterministic Tournament Selection, n individuals are randomly selected from the population. Their fitness values are compared and the one with the best fitness is selected. The higher the value of n, the higher the selective pressure the mechanism introduces, making it more difficult for individuals with bad or even average fitness to be selected for reproduction. The second approach, the Probabilistic Tournament Selection, differs in how individuals are actually selected. Instead of always selecting the individual with the best fitness value, each of the individuals in the tournament is assigned a probability of being selected. Usually, the probability of selecting the best individual in the tournament lies within the interval (0.5, 1].
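Both variants can be sketched with a single function, where p_best = 1 recovers the deterministic tournament; the parameter values in the example are arbitrary:

```python
import random

def tournament_select(fitnesses, n=2, p_best=1.0, rng=random):
    """Select one index by a tournament of size n.

    p_best = 1.0 gives the deterministic variant (the best contender always
    wins); p_best in (0.5, 1) gives the probabilistic variant, where the best
    contender wins with probability p_best, the second best with the
    remaining probability, and so on.
    """
    contenders = rng.sample(range(len(fitnesses)), n)
    contenders.sort(key=lambda i: fitnesses[i], reverse=True)  # best first
    for i in contenders:
        if rng.random() < p_best:
            return i
    return contenders[-1]

rng = random.Random(0)
# Deterministic tournaments of size 3 over the population of Table 2.2
winners = [tournament_select([1, 3, 7, 100], n=3, rng=rng) for _ in range(200)]
```

With tournaments of size 3 over 4 individuals, the two worst individuals can never win, which illustrates how n controls the selective pressure.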

2.2.2.2 Crossover Operators

This section will review some of the most common crossover operators. Some of them were proposed when Genetic Algorithms were only intended to solve binary problems, although they can be applied to strings of any type of data.

One Point Crossover

The One Point Crossover (OPC) is applied to two parent individuals. A random position common to both vectors is selected, and the information of both parents is exchanged from this random point onwards, as can be seen in Figure 2.3.

Figure 2.3: One Point Crossover
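A minimal sketch of OPC on generic sequences; the cut point is drawn so that both fragments are non-empty:

```python
import random

def one_point_crossover(p1, p2, rng=random):
    """OPC: exchange the tails of both parents from a random cut point."""
    cut = rng.randrange(1, len(p1))  # cut in [1, len-1]: both sides non-empty
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

c1, c2 = one_point_crossover([0] * 7, [1] * 7, rng=random.Random(1))
```

With an all-zeros and an all-ones parent, the children are a block of zeros followed by ones, and the complementary string.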

Two Points Crossover

The Two Points Crossover (TPC) is very similar to the previous operator. It results from the research carried out by K.A. DeJong presented in [BT95] and consists, in this case, in randomly selecting two points of the genome vector. The information between those positions is then exchanged between both parent individuals. Figure 2.4 shows an example of how this crossover operator works.

Figure 2.4: Two Points Crossover

Uniform Crossover

The Uniform Crossover (UC) was defined in 1991 by Syswerda [SP91]. In this case, for each position of the genome vector, i.e., for each gene, the inherited information is randomly taken from one of the two ancestors. Figure 2.5 depicts how this operator works.

Figure 2.5: Uniform Crossover
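UC can be sketched by deciding, gene by gene, which parent each child inherits from:

```python
import random

def uniform_crossover(p1, p2, rng=random):
    """UC: for each gene, each child inherits from a randomly chosen parent."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            g1, g2 = g2, g1  # swap the contribution of the parents at this gene
        c1.append(g1)
        c2.append(g2)
    return c1, c2

c1, c2 = uniform_crossover([0] * 8, [1] * 8, rng=random.Random(3))
```

With complementary parents, the two children are bitwise complements of each other.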

Arithmetic Crossover

The Arithmetic Crossover (AC), as opposed to the previous operators, generates only one child. To create the offspring genome, it applies a binary operator, such as the AND operator, to the genes of both parents. Figure 2.6 shows an example of this operator.


Figure 2.6: Arithmetic Crossover

Cycle Crossover

The Cycle Crossover (CX) was first defined by Oliver in 1987 [OSH87] and was specially proposed for permutation-based problems. For this reason, it always produces valid solutions for this kind of problems (the Traveling Salesman Problem, for example). The operator works as follows: first, a position is randomly selected in one of the parents and its value is copied to the offspring. Then, the value of that gene in the other parent is checked. If this value has not yet been copied to the offspring, it is copied in the same position in which it appears in the first parent. This procedure continues until a repeated gene appears, which means that a cycle has been found. The remaining positions are then filled with the genes of the other parent. Figure 2.7 shows an example of this crossover operator in which, first, a cycle is identified in the first parent and, then, a second cycle is identified in the other one.

1

2

2 4

3 6

4 5 6 8

7

5

7 3

8 1

>

1

8

4

2

Second cycle

1

2

6

4

7

5

3 8

Figure 2.7: Cycle Crossover
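A sketch of CX on the parents of Figure 2.7. For reproducibility, the cycle is started at the first position instead of a random one:

```python
def cycle_crossover(p1, p2):
    """CX: genes in the traced cycle keep their positions from p1 in the first
    child (and from p2 in the second); all other positions are copied from
    the other parent. Assumes p1 and p2 are permutations of the same items."""
    c1, c2 = p2[:], p1[:]  # default: copy everything from the other parent
    pos = 0                # deterministic start; a random start is also valid
    while True:
        c1[pos], c2[pos] = p1[pos], p2[pos]  # cycle genes keep their positions
        pos = p1.index(p2[pos])              # follow the cycle
        if pos == 0:
            break                            # back at the start: cycle closed
    return c1, c2

c1, c2 = cycle_crossover([1, 2, 3, 4, 5, 6, 7, 8],
                         [2, 4, 6, 8, 7, 5, 3, 1])
```

The traced cycle covers positions holding the values 1, 2, 4 and 8, and both children are valid permutations.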

Ordered Crossover

The Ordered Crossover (OX) was proposed by Davis in 1985 [Dav85a] and, like the previous one, is intended for permutation-based problems. In this operator, two random positions are selected and the information between them is copied from each parent to one of the descendants. Then, each descendant fills its remaining genes, starting from the right of the second randomly chosen position, taking the values not present in the copied fragment in the order in which they appear in the other ancestor. Figure 2.8 exemplifies the behavior of this operator.

Figure 2.8: Ordered Crossover
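A sketch of OX; the cut points are passed explicitly instead of being drawn at random, so that the result is reproducible:

```python
def ordered_crossover(p1, p2, cut1, cut2):
    """OX: the segment [cut1, cut2) is copied from each parent; the remaining
    positions are filled, starting right after the second cut point, with the
    genes of the other parent in the order they appear there (also scanned
    starting after the second cut point)."""
    n = len(p1)

    def make_child(keeper, other):
        child = [None] * n
        child[cut1:cut2] = keeper[cut1:cut2]
        kept = set(keeper[cut1:cut2])
        # Genes of `other`, scanned circularly from just after the second cut
        fillers = [other[(cut2 + i) % n] for i in range(n)]
        fillers = [g for g in fillers if g not in kept]
        for i in range(n - (cut2 - cut1)):
            child[(cut2 + i) % n] = fillers[i]
        return child

    return make_child(p1, p2), make_child(p2, p1)

c1, c2 = ordered_crossover([1, 2, 3, 4, 5, 6, 7, 8],
                           [8, 7, 6, 5, 4, 3, 2, 1], 2, 5)
```

Each child preserves its parent's middle segment and is still a valid permutation.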

BLX−α Crossover

This operator was proposed by Eshelman and Schaffer in [ES93]. It works on the continuous domain and has been used in much of the work carried out on Real Coded Genetic Algorithms.


Given two parents, x = {x1, x2, . . . , xn} and y = {y1, y2, . . . , yn}, the operator generates an offspring z = {z1, z2, . . . , zn} in which each zi is uniformly chosen within the interval [mini − Ii · α, maxi + Ii · α]. In this context, maxi = max{xi, yi}, mini = min{xi, yi} and Ii = maxi − mini. Figure 2.9 depicts how each zi is computed.

Figure 2.9: BLX−α Crossover
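A direct transcription of the definition above; the parent values and α are arbitrary:

```python
import random

def blx_alpha(x, y, alpha=0.5, rng=random):
    """BLX-alpha: each z_i is uniform in [min_i - I_i*alpha, max_i + I_i*alpha]."""
    z = []
    for xi, yi in zip(x, y):
        lo, hi = min(xi, yi), max(xi, yi)
        interval = hi - lo  # I_i
        z.append(rng.uniform(lo - interval * alpha, hi + interval * alpha))
    return z

# With alpha = 0.5, the sampling interval is twice as wide as [min_i, max_i]
child = blx_alpha([1.0, 4.0], [3.0, 2.0], alpha=0.5, rng=random.Random(7))
```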

2.2.2.3 Mutation Operators

Mutation operators act as natural mutations do in Natural Evolution. Their mission is to slightly perturb the genome of an individual. These perturbations can sometimes lead to a better adapted individual. The next subsections briefly review some of the most commonly used mutation operators.

Simple Inversion Mutation

The Simple Inversion Mutation (SIM) was proposed by John H. Holland in 1975 [Hol75]. It selects two random positions in the genome vector and reverses the order of the genes between those positions. Figure 2.10 shows an example of how this operator works.

Figure 2.10: Simple Inversion Mutation
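A sketch of SIM; the two positions can be fixed explicitly to make the example reproducible:

```python
import random

def simple_inversion_mutation(genome, cuts=None, rng=random):
    """SIM: reverse the segment between two positions (random if not given)."""
    i, j = cuts if cuts else sorted(rng.sample(range(len(genome) + 1), 2))
    return genome[:i] + genome[i:j][::-1] + genome[j:]

mutant = simple_inversion_mutation([1, 2, 3, 4, 5, 6], cuts=(1, 4))
```

Note that the mutant is still a permutation of the original genome, which makes SIM safe for permutation-encoded problems.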

Repeated Exchange Mutation

The Repeated Exchange Mutation (REM) was first used by Banzhaf in 1990 [Ban90]. This operator carries out a random number n of exchanges between pairs of genes that are also randomly selected. These exchanges never invalidate the genome in the case of permutation problems. An example in which only one exchange takes place is presented in Figure 2.11.

Figure 2.11: Repeated Exchange Mutation

Uniform Mutation

The Uniform Mutation (UM) is commonly used with real string genomes. This operator randomly selects certain positions of the real vector and replaces their values with new ones uniformly selected from the valid domain of the problem. Figure 2.12 gives an example of how this operator works.

Figure 2.12: Uniform Mutation

These are only some examples of the mutation, crossover and selection operators that have been commonly used in the field of GAs. The literature is quite rich on this topic, and many other operators exist in both discrete and continuous domains. Furthermore, some problems, such as those described in Chapter 5, have their own specific set of operators due to the constraints introduced by the encoding of the individuals. A more detailed review of some of the operators available for this specific type of problems will be given in Section 5.3.1.2.

2.2.3

Evolution Strategies

Evolution Strategies (ESs) were proposed by P. Bienert, Ingo Rechenberg and Hans P. Schwefel in the mid sixties [Rec71, Sch74]. These techniques are general optimizers that can be applied to problems from different domains [BS02]. For this purpose, a quality function F must be provided. This quality function operates on a set of decision variables y := (y1, . . . , yn):

    F(y) → opt,  y ∈ Y

Y must be a search space of finite but not necessarily fixed dimension such as, for example, the n-dimensional real, integer or combinatorial search space. Evolution Strategies work with populations of individuals. Each individual comprises not only the set of decision variables yk but also a set of endogenous strategy parameters sk. Therefore, an individual ak in a population Pi is defined as:

    ak := (yk, sk, F(yk))

The endogenous parameters are characteristic of Evolution Strategies and may be adapted throughout the evolutionary process. Other parameters remain constant through the search process and are known as exogenous parameters. These parameters give rise to two canonical versions of ESs: (µ/ρ, λ)−ES and (µ/ρ + λ)−ES. In this nomenclature, µ denotes the number of parents in the current population, ρ ≤ µ the mixing number, i.e., the number of parents involved in the creation of a descendant, and λ the number of new individuals to be generated. The main difference between both types of ESs is how individuals are selected for the next generation. In the first case, the ',' selection, the offspring individuals completely replace those in the current population. Thus, µ < λ must hold so that convergence towards an optimal solution is possible. In the second case, the '+' selection, the individuals of the current and offspring populations are combined and the best individuals among them are selected as parents for the next generation. Today, Evolution Strategies are among the most powerful techniques in real-parameter optimization, especially one of their variants, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which has obtained remarkable success on complex optimization problems in recent years [AH05].
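A minimal (µ + λ)-ES sketch for minimization, with ρ = 1 and a single self-adapted step size per individual playing the role of the endogenous parameter sk. The constants (learning rate τ, initialization range) are illustrative assumptions:

```python
import math
import random

def es_plus(f, dim, mu=5, lam=20, generations=100, seed=1):
    """Sketch of a (mu + lambda)-ES minimizing f. Each individual carries one
    self-adapted step size (its endogenous strategy parameter s_k)."""
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(2 * dim)  # illustrative learning rate for s_k
    pop = [([rng.uniform(-5, 5) for _ in range(dim)], 1.0) for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            y, s = rng.choice(pop)                       # rho = 1: one parent
            s_new = s * math.exp(tau * rng.gauss(0, 1))  # mutate the step size
            y_new = [yi + s_new * rng.gauss(0, 1) for yi in y]  # then the variables
            offspring.append((y_new, s_new))
        # '+' selection: parents and offspring compete together
        pop = sorted(pop + offspring, key=lambda ind: f(ind[0]))[:mu]
    return pop[0][0]

# Minimize the sphere function in three dimensions
best = es_plus(lambda y: sum(yi * yi for yi in y), dim=3)
```

Because of the '+' selection, the best solution found so far is never lost, so the best fitness improves monotonically.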


2.2.4

Genetic Programming

Genetic Programming (GP) is a type of Evolutionary Algorithm quite similar to Genetic Algorithms. The main difference between both approaches is the representation used for the solutions to the problem. While GAs normally use basic data structures to encode individuals (integer or real numbers, for example), GP uses a more complex representation, trees in most cases, to encode candidate solutions as programs of variable length. The first results in GP were reported by Stephen F. Smith in the eighties [Smi80]. In 1981, Richard Forsyth evolved small programs and applied them to forensic science for the UK police [For81]. The first modern results in GP, i.e., programs organized in tree structures, were independently reported by Nichael L. Cramer in 1985 [Cra85] and Jürgen Schmidhuber in 1987 [Sch87]. However, it was John R. Koza who really extended and popularized Genetic Programming in the nineties, applying these techniques to several complex optimization and search problems [Koz92]. GP is computationally very expensive. For this reason, until recently it was only applied to small or medium size problems. It has been in the last years, thanks to the improvements in GP techniques and to the exponential growth in CPU power, that Genetic Programming has been applied to a wider range of complex problems, from quantum computing to electronic design, among others [SBB98, KKS+03]. As stated before, the main difference of GP compared to GAs is the tree representation used for individuals. Operators equivalent to those used in GAs are used to recombine individuals, but adapted to work on this representation. The crossover operator, for example, exchanges branches between parents to produce new individuals, whereas the mutation operator can collapse, duplicate, invert or swap branches.
Just as a matter of interest, there is an annual competition where researchers compete for a $10,000 prize if they report human-competitive results obtained by a program created by means of GP techniques.

2.3

Estimation of Distribution Algorithms

Estimation of Distribution Algorithms (EDAs) are an outgrowth of Genetic Algorithms. Instead of using recombination operators to produce new offspring, a probabilistic model is learned from the explored solutions and new solutions are sampled from this model. The general scheme of an EDA can be observed in Algorithm 2.

Algorithm 2 Estimation of Distribution Algorithm
1: Create initial population P0
2: Evaluate initial population P0
3: while termination criterion not reached do // Pi converged or maximum number of iterations reached
4:   Select a subset of the current population P̂i ⊂ Pi
5:   Estimate the probability distribution of the subset P̂i: pi+1(x), where x ∈ P̂i
6:   Sample the probability distribution pi+1(x) to generate Pi+1
7: end while

In step 5 of the algorithm, it is necessary to estimate the probability distribution pi+1(x), where x is an individual of the population. In general, the genome of an individual contains values for a set of variables. Therefore, x = (x1, x2, x3, . . . ) and pi+1(x) = pi+1(x1, x2, x3, . . . ). The complexity of computing the joint probability distribution p(x1, x2, x3, . . . ) increases, in the worst case of dependency among all the variables, exponentially with the number of variables of x. To avoid such an expensive computational cost, Estimation of Distribution Algorithms use a Probabilistic Graphical Model (PGM). The use of a PGM reduces the computing time of the joint probability distribution in exchange for estimating that distribution by means of a conditional causal model among the variables, based on a dependency/causality graph. As a result of this assumption, a simplified distribution is actually computed as an approximation to the real joint distribution. Graphically, a PGM is an acyclic directed graph. Each node of the graph represents a variable and each arc a conditional dependency between variables. Figure 2.13 shows an example of a PGM.

Figure 2.13: Example of a graphical model for x = (A, B, C, D)

In principle, to compute the joint probability distribution of x = (A, B, C, D), Equation 2.3 should be used.

    p(A, B, C, D) = p(A|B, C, D) ∗ p(B|C, D) ∗ p(C|D) ∗ p(D)    (2.3)

The computation of this equation would involve calculations with fifteen parameters. However, the PGM in Figure 2.13 shows that conditional independence among certain variables can be considered. Consequently, the computation of the joint probability distribution could be simplified, as shown in Equation 2.4, with just eight parameters.

    p(A, B, C, D) = p(A|C, D) ∗ p(B|D) ∗ p(C) ∗ p(D)    (2.4)

Not every PGM can be used in every situation, as the choice deeply depends on the domain of the problem and, thus, on the representation used to encode the variables in the individuals. If those variables are discrete, Bayesian Networks should be used [Pea88]. On the other hand, if the variables are continuous, Gaussian Networks should be used instead [SK89]. Some approaches allow mixed variables, such as the Mixed-Integer Bayesian Optimization Algorithm (MIBOA) [ELZ+08].


2.3.1

Learning Heuristics

An important aspect of Estimation of Distribution Algorithms is how the structure of the PGM, i.e., the dependencies among variables, is generated. Without any specific knowledge of the problem, the only way to define these dependencies is by means of statistical analysis. This process is known as structure learning, and several different methods exist for this purpose. Some of the most usual heuristics for the structure learning phase are briefly reviewed in the next sections.

2.3.1.1 No Interdependencies Model This is the simplest model, in which independence among variables is assumed. From the point of view of the graphical model, it means that the graph will have no arcs. Therefore, the joint probability distribution is defined as the product of the marginal probability of each variable, as can be seen in Equation 2.5.

p(x) =

n Y

p(xi )

(2.5)

i=1

The main advantage of this model is its low computational cost, although the assumption of independence among all the variables can be a very simplistic approach for some problems. Some examples of this kind of algorithms follow. In the Bit-Based Simulated Crossover (BSC) algorithm, introduced in [Sys93], each possible value of every variable is assigned a probability proportional to the fitness of the individuals taking that value in the current generation. In [Bal94], Population-Based Incremental Learning (PBIL) is proposed. In this algorithm, a vector of probabilities is maintained for each variable. The probability of each of the possible values of each variable is updated by means of the Hebbian rule used in artificial neural networks. A slightly different approach is presented in [HLG99], where a Genetic Algorithm for binary optimization is proposed. This algorithm maintains a probability vector that initially takes a value of 0.5 for each variable. Then, two individuals are generated and a competition is carried out between them at the variable level. For each variable, if the value of that variable in the winning individual differs from that in the loser, the probability vector is updated by a constant value (it is increased if the winning individual had a value of one for that variable and decreased otherwise). Finally, the most extended heuristic within this model is the Univariate Marginal Distribution Algorithm (UMDA) [Müh97], which also has a continuous version, UMDAc [LELP00b]. In this approach, each p(xi) is estimated from the marginal frequencies of the i-th variable in the selected individuals (Equation 2.5).
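A sketch of the discrete UMDA on bit strings; the population size, truncation ratio and number of generations are illustrative choices:

```python
import random

def umda(fitness, n_genes, pop_size=50, n_sel=25, generations=30, seed=3):
    """Sketch of the discrete UMDA: estimate the marginal frequency of each
    variable in the selected individuals, then sample the next population
    from the factorized model of Equation 2.5."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection of the best half
        selected = sorted(pop, key=fitness, reverse=True)[:n_sel]
        # Marginal frequency of a 1 at each position (the factors p(x_i))
        marginals = [sum(ind[i] for ind in selected) / n_sel
                     for i in range(n_genes)]
        # Sample the factorized model to build the next population
        pop = [[1 if rng.random() < p else 0 for p in marginals]
               for _ in range(pop_size)]
    return max(pop, key=fitness)

# OneMax problem: the fitness of a bit string is its number of ones
best = umda(sum, n_genes=15)
```

Note that, without any correction, a marginal that reaches 0 or 1 fixes that variable for the rest of the run, which is one classic weakness of this simple model.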

2.3.1.2 Bivariate Dependencies Model

A slightly more sophisticated approach than the previous one is to consider the dependencies existing between pairs of variables (pairwise or bivariate dependencies). This implies a good trade-off in terms of complexity and efficiency as, at most, each variable may depend on one other variable. In the graphical model, this means that each node can have, at most, one incoming arc. To construct such graphical models, greedy approaches which add arcs to an initially disconnected graph are normally used. Some of the most frequently used heuristics within this model are Mutual Information Maximization for Input Clustering (MIMIC) [BIV97], Combining Optimizers with Mutual Information Trees (COMIT) [BD97] and the Bivariate Marginal Distribution Algorithm (BMDA) [PM99]. All these approaches are available in both discrete and continuous versions.

2.3.1.3 Multivariate Dependencies Model

This type of heuristics tends to generate more realistic models, as they are more flexible and allow more dependencies among variables. Their main disadvantage is that the computational cost of learning and sampling such models can be considerably high. Some examples of heuristics in the discrete case are the Estimation of Bayesian Networks Algorithm (EBNA) [LELP00a] and the Bayesian Optimization Algorithm (BOA) [PGCP99]. In the continuous case, the most representative heuristics are the Estimation of Multivariate Normal Algorithm (EMNA) [LL01] and the Estimation of Gaussian Networks Algorithm (EGNA) [LL01].

2.4

Swarm Intelligence

Swarm Intelligence is a type of Artificial Intelligence that simulates intelligent systems composed of simple individuals that coordinate themselves in a decentralized way. These systems lack a leader figure that guides the behavior of the group. Instead, the global intelligence emerges from the local interactions of the individuals with each other and with the environment. Several optimization algorithms have been inspired by this type of intelligence. In particular, two classes of algorithms have acquired much relevance in recent years: Ant Colony Optimization (ACO), which is inspired by the social behavior of colonies of ants and termites, and Particle Swarm Optimization (PSO), which is inspired by the behavior of flocks of birds and schools of fish. This section briefly reviews both approaches and their main characteristics.

2.4.1

Ant Colony Optimization

In the natural world, ants initially walk randomly in search of food. When an ant finds a food source, it goes back to its colony, leaving a pheromone trail behind it. Other ants searching for food that find this trail tend to follow the same path, thus reinforcing the pheromone trail. However, pheromone evaporates with time: the more time it takes to get to the food source, the more pheromone is evaporated. Consequently, a short path maintains a higher level of pheromone, which makes it more likely to be selected by new ants walking out for food.

Figure 2.14 presents an example of the behavior of an ant colony. First, an ant finds a food source following an arbitrary path a. When it comes back to the colony following a probably different path b, it leaves a pheromone trail so that other ants can find their way to the food source. Second, many other ants go to the source following different paths. Every ant that goes to the food source and comes back leaves a pheromone trail on the path it follows, which reinforces the pheromone already present on the path. The shortest path will reach higher levels of pheromone, as pheromone evaporates more on longer paths. With time, most ants will follow the shortest path on their way to the food source (step 3).

Figure 2.14: Example for the social behavior of ants [Wik09]

Following this principle, Marco Dorigo proposed a new optimization technique, Ant Colony Optimization, in his PhD Thesis [Dor92]. It was initially intended for combinatorial optimization, as a direct application of the natural metaphor, but recent extensions have been able to adapt the algorithm to work also on continuous domains. In its simplest form, Ant Colony Optimization is based on two main equations: the selection of the next node an ant will move to and the update of the pheromone level of each edge. In the case of combinatorial problems, a set of nodes is defined and the artificial ants travel through them. Each ant has a probability pi,j of moving from node i to node j. Additionally, each ant leaves a trail of pheromone on each edge it visits. Finally, the evaporation of pheromone is also considered by the model. Equation 2.6 defines how the probabilities of node transitions for each ant are computed.

    pi,j = (τi,j)^α · (ηi,j)^β / ∑l (τi,l)^α · (ηi,l)^β    (2.6)

where the sum in the denominator runs over the nodes l that the ant is allowed to move to, and
τi,j is the amount of pheromone on edge (i, j)


α is a parameter to control the influence of τi,j
ηi,j is the desirability of edge (i, j)
β is a parameter to control the influence of ηi,j

The probability basically depends on the trail of pheromone present on the edge and on the a priori knowledge of the desirability of that edge. Equation 2.7 defines the pheromone updating process.

    τi,j = (1 − ρ) · τi,j + ∆τi,j    (2.7)

where
τi,j is the amount of pheromone on edge (i, j)
ρ is the ratio of pheromone evaporation and
∆τi,j is the amount of pheromone deposited, typically computed as

    ∆τi,j = 1/Lk   if ant k travels on edge (i, j)
    ∆τi,j = 0      otherwise

For each edge, the pheromone updating process computes how much pheromone the ants traveling through that edge will leave, as well as how much pheromone evaporates. Normally, the amount of pheromone left on an edge depends on the cost of the travel of the k-th ant (typically the length of the tour, Lk). As stated before, Ant Colony Optimization was initially intended for combinatorial optimization. It has been extensively used for classic problems such as the Traveling Salesman Problem [DG97, BGD02] and for routing in telecommunication networks [SHBR97, DD98]. More recently, some researchers have been interested in the adaptation of ACO to continuous domains [BP95]. Finally, in 2000, Gutjahr provided the first convergence proofs for Ant Colony Optimization [Gut00].
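Equations 2.6 and 2.7 translate almost literally into code; the edge data in the example below is made up for illustration:

```python
def transition_probabilities(tau, eta, alpha=1.0, beta=2.0):
    """Eq. 2.6: each candidate edge gets a probability proportional to
    tau^alpha * eta^beta, normalized over all candidate edges."""
    weights = [(t ** alpha) * (e ** beta) for t, e in zip(tau, eta)]
    total = sum(weights)
    return [w / total for w in weights]

def update_pheromone(tau, tour_lengths, used, rho=0.5):
    """Eq. 2.7: evaporate a fraction rho, then deposit 1/L_k for each ant k
    whose tour used the edge. `used[e][k]` says whether ant k used edge e."""
    deposit = [sum(1.0 / length for length, u in zip(tour_lengths, uses) if u)
               for uses in used]
    return [(1 - rho) * t + d for t, d in zip(tau, deposit)]

# Two candidate edges and two ants; both ants used edge 0, only ant 1 used edge 1
probs = transition_probabilities(tau=[1.0, 1.0], eta=[1.0, 0.5])
tau = update_pheromone([1.0, 1.0], tour_lengths=[2.0, 4.0],
                       used=[[True, True], [False, True]])
```

The edge with higher desirability receives a higher transition probability, and the edge used by both ants (including the one with the shorter tour) accumulates more pheromone.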

2.4.2 Particle Swarm Optimization

Particle Swarm Optimization (PSO) is a stochastic, population-based optimization technique based on the principles of social behavior observed in flocks of birds and schools of fish, first proposed by Kennedy and Eberhart [KES01]. In PSO, a set of agents or particles cooperates in the search for a solution to an optimization problem. Each particle encodes a solution to the problem and uses two types of information to select where to move in the search space: its own experience (good solutions previously explored by that particle) and the experience of neighbor particles (good solutions explored by them). At each iteration, every particle moves in the search space with a velocity that is the weighted sum of three components:

1. The previous velocity.
2. A velocity that moves the particle towards the best solution it previously found.
3. A velocity that drives the particle to the best solution found by neighbor particles.


Equation 2.8 defines the velocity updating mechanism.

v_i^{t+1} = ω v_i^t + φ_1 U_1^t (b_i^t − x_i^t) + φ_2 U_2^t (l_i^t − x_i^t)

(2.8)

where
ω is a parameter called the inertia weight,
φ_1 and φ_2 are two parameters called acceleration coefficients, which drive the particle towards its personal best or the neighborhood best solution, respectively,
U_1^t and U_2^t are two n × n diagonal matrices whose diagonal elements are random numbers within the interval [0, 1],
b_i^t is the best solution found so far by the particle, and
l_i^t is the best solution found so far by neighbor particles.

The three terms in the previous equation represent three different aspects of the behavior of the particle:

1. The inertia or momentum, a memory of the previous flight direction that avoids drastic direction changes.
2. The cognitive component, which models the tendency of particles to return to good previously known positions.
3. The social component, which compares how good a particle is relative to its neighbors.

Once the new velocity of the particle has been computed, the new position is calculated as stated in Equation 2.9.

x_i^{t+1} = x_i^t + v_i^{t+1}

(2.9)

where x_i^{t+1} is the new position of the particle, x_i^t is its current position and v_i^{t+1} is its new velocity.

Particle Swarm Optimization can be applied to a wide range of problems, both discrete and continuous. Some of the most relevant areas to which PSO has been applied include antenna design [JRS07], biomedical and pharmaceutical applications [VOV05, SXK+ 06] and control applications [Gai04], to name a few. [Pol08] presents a more detailed review of the applications of PSO.
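Equations 2.8 and 2.9 can be sketched for a single particle as follows (hypothetical names; since U_1^t and U_2^t are diagonal, they reduce to one random number per dimension):

```python
import random

def pso_step(x, v, b, l, omega=0.7, phi1=1.5, phi2=1.5, rng=random.random):
    """One particle update (Equations 2.8 and 2.9): x is the current
    position, v the current velocity, b the particle's personal best
    and l the neighborhood best."""
    new_v = [omega * v[d]                      # inertia / momentum
             + phi1 * rng() * (b[d] - x[d])    # cognitive component
             + phi2 * rng() * (l[d] - x[d])    # social component
             for d in range(len(x))]
    new_x = [xi + vi for xi, vi in zip(x, new_v)]  # Equation 2.9
    return new_x, new_v
```

Passing a deterministic `rng` (e.g. `lambda: 1.0`) makes the update reproducible, which is convenient for testing.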

2.5 Other Techniques

2.5.1 Differential Evolution

The Differential Evolution (DE) Algorithm was proposed by Rainer Storn and Kenneth Price in 1995 [SP95]. It was first intended to solve the Chebyshev polynomial fitting problem. For this purpose, a new


method based on the Genetic Annealing technique was proposed, obtaining the third place in the “First International Contest on Evolutionary Optimization” [BDL+ 96]. The canonical version of this algorithm defines four phases: initialization, mutation, recombination and selection. Algorithm 3 describes the way these algorithms work.

Algorithm 3 Differential Evolution Algorithm
1: Create initial population of candidate solutions P0
2: Evaluate initial population P0
3: while termination criterion not reached do // Pi converged or maximum number of iterations reached
4:   while selecting individuals from current population Pi do
5:     Mutate individuals according to a pre-established factor, obtaining descendants
6:     Recombine descendants according to a pre-established probability
7:     Evaluate new individuals
8:     if fitness of descendant is better than fitness of ancestor then
9:       Replace ancestor by the descendant in Pi+1
10:    else
11:      Keep the ancestor in Pi+1
12:    end if
13:  end while
14: end while

2.5.1.1 Initialization

In most cases, the individuals in Differential Evolution are initialized uniformly at random, as happens in other evolutionary approaches such as Genetic Algorithms or Particle Swarm Optimization. Depending on the problem, a specific initialization mechanism, possibly based on a heuristic meaningful for the particular problem, could be used to enhance the quality of the initial population.

2.5.1.2 Mutation

Conceptually, the objective of mutation in DE, unlike in GAs, is to breed new offspring. Given a population at generation i, each individual xi,j in this population is selected for recombination. The selected vector receives the name of objective vector. Three other vectors, xr1, xr2 and xr3, are then randomly selected, all different from the objective vector and from one another. These four vectors are then combined to obtain a new vector, a candidate to replace the objective vector (Equation 2.10).

vi+1,j = xr1 + F (xr2 − xr3 )

(2.10)

First, vectors xr2 and xr3 are subtracted (Figure 2.17b) and the difference is scaled by a factor F (Figure 2.17c). Then, the resulting vector and vector xr1 are added (Figure 2.17d). The final result of the mutation phase (Figure 2.18a) is known as the donor vector.
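Equation 2.10 can be sketched in Python as follows (a hypothetical minimal version with invented names, not the thesis implementation):

```python
import random

def donor_vector(population, target_index, F=0.5, rng=random):
    """Build the donor vector of Equation 2.10: v = x_r1 + F * (x_r2 - x_r3),
    with r1, r2 and r3 distinct and all different from the objective vector."""
    candidates = [i for i in range(len(population)) if i != target_index]
    r1, r2, r3 = rng.sample(candidates, 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    return [x1[d] + F * (x2[d] - x3[d]) for d in range(len(x1))]
```

Note that the scale of the perturbation is driven entirely by the differences already present in the population, which is what makes DE self-scaling.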


2.5.1.3 Recombination

The crossover phase is carried out after the mutation, once the donor vector has been obtained. This operator combines the objective and the donor vectors (Figure 2.18b). There exist mainly two different crossover operators, the Binomial and the Exponential crossovers, both of which are introduced in this section.

Binomial Crossover. The Binomial Crossover selects the value for each gene of the final vector from one ancestor or the other, according to the crossover probability, as shown in Equation 2.11 and in Figure 2.15.

ui+1,j,k = vi+1,j,k   if U(0, 1) ≤ CR or k = U(1, D)
ui+1,j,k = xi,j,k     otherwise
(2.11)

where k is the gene index in the genome vector, j identifies the j-th individual in the population, D is the vector size, CR is the probability that elements in the donor vector are exchanged for those in the objective vector, and U is a uniform random number generation function.

Figure 2.15: Binomial Crossover
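The Binomial Crossover of Equation 2.11 can be sketched as follows (hypothetical names; the forced position guarantees at least one donor gene in the trial vector):

```python
import random

def binomial_crossover(objective, donor, CR=0.9, rng=random.Random()):
    """Equation 2.11: take each gene from the donor with probability CR;
    one randomly chosen position is always taken from the donor, so the
    trial vector differs from the objective vector in at least one gene."""
    D = len(objective)
    forced = rng.randrange(D)  # the forced position k = U(1, D)
    return [donor[k] if (rng.random() <= CR or k == forced) else objective[k]
            for k in range(D)]
```

With CR = 1 the trial vector equals the donor; with CR = 0 only the forced gene is taken from the donor.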

Exponential Crossover. The Exponential Crossover works in a different way than the Binomial Crossover does. An integer random number n is selected within the interval [0, D − 1], where D represents the size of the vector. This number determines the initial position in the donor vector from which genes are selected. Another integer random number L is then selected within the interval [1, D], according to the probability factor CR. This value determines how many genes of the donor vector will replace those in the objective vector. Equation 2.12 presents this crossover operator mathematically, whereas Figure 2.16 provides an example of how genes are exchanged by the Exponential Crossover.

ui+1,j,k = vi+1,j,k   for k = n mod D, (n + 1) mod D, ..., (n + L − 1) mod D
ui+1,j,k = xi,j,k     otherwise
(2.12)
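A sketch of Equation 2.12 follows (hypothetical names; the geometric growth of L while U(0,1) < CR is the usual way of drawing L "according to CR"):

```python
import random

def exponential_crossover(objective, donor, CR=0.9, rng=random.Random()):
    """Equation 2.12: copy L consecutive genes (circularly) from the donor,
    starting at a random position n; L grows while U(0,1) < CR."""
    D = len(objective)
    n = rng.randrange(D)  # starting position in [0, D - 1]
    L = 1
    while L < D and rng.random() < CR:
        L += 1            # number of consecutive genes taken from the donor
    trial = list(objective)
    for offset in range(L):
        trial[(n + offset) % D] = donor[(n + offset) % D]
    return trial
```

In the example of Figure 2.16 (D = 5, n = 3, L = 4), the copied positions would be 3, 4, 0 and 1, wrapping around the end of the vector.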

2.5.1.4 Selection

This last step acts as an elitism procedure: the objective vector and the newly created individual are compared, and the one with the best fitness value is selected. Equation 2.13 mathematically defines this operator.

xi+1,j = ui+1,j   if f(ui+1,j) ≥ f(xi,j)
xi+1,j = xi,j     otherwise
(2.13)


Figure 2.16: Exponential Crossover (example with D = 5, n = 3, L = 4)

In the previous equation, j determines the position of the individual in the population and f represents the fitness function.
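Putting the four phases together, one complete DE generation can be sketched in Python (a hypothetical minimal version using binomial crossover and assuming fitness maximization, as in Equation 2.13):

```python
import random

def de_generation(population, fitness, F=0.5, CR=0.9, rng=random.Random()):
    """One generation of DE: mutation (Eq. 2.10), binomial crossover
    (Eq. 2.11) and one-to-one elitist selection (Eq. 2.13)."""
    D = len(population[0])
    next_pop = []
    for j, x in enumerate(population):
        others = [i for i in range(len(population)) if i != j]
        r1, r2, r3 = rng.sample(others, 3)
        donor = [population[r1][d] + F * (population[r2][d] - population[r3][d])
                 for d in range(D)]
        forced = rng.randrange(D)
        trial = [donor[k] if (rng.random() <= CR or k == forced) else x[k]
                 for k in range(D)]
        # Equation 2.13: keep the better of objective and trial vector
        next_pop.append(trial if fitness(trial) >= fitness(x) else x)
    return next_pop
```

Because each individual is replaced only by an equal or better trial vector, the best fitness in the population can never decrease from one generation to the next.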

2.6 Chapter Summary

In this chapter, an overview of some of the best-known evolutionary techniques has been provided. However, this survey has not been exhaustive: not all the available approaches have been reviewed, nor have all the algorithms been described to the same level of detail. That was not the objective of this chapter, as stated in Section 2.1. Instead, a general view of the available alternative approaches has been given, and the issue of selecting the most appropriate algorithm among all the available ones has naturally arisen. As a compromise solution to the use of several different evolutionary approaches, and to enhance the performance of individual algorithms, many hybrid approaches have been proposed, with varying degrees of success. In the next chapter, some of the most relevant work in this area will be reviewed and compared.


(a) Selection of the Objective Vector (red point)

(b) Selection of xr1 , xr2 and xr3

(c) Subtraction (black vector) and scaling (red vector) of xr2 and xr3

(d) Addition of the result of the previous step and xr1

Figure 2.17: Differential Evolution Algorithm (I) [Fle04]


(a) The donor vector is the result of the previous steps

(b) Application of the crossover operator to the donor and the objective vectors to obtain the child vector

(c) Selection between the objective and the child vectors

Figure 2.18: Differential Evolution Algorithm (II) [Fle04]


Chapter 3

Adaptation and Hybridization in Evolutionary Computation

3.1 Introduction

Evolutionary Algorithms are complex optimization techniques with many parameters that must be adjusted. The correct choice of a representation, of recombination operators and their probabilities, and of the other parameters of the algorithm is a hard task, and it is often considered an optimization problem in itself [Gre86]. Furthermore, the combination of different encodings or recombination operators, for example, can increase the performance of the Evolutionary Algorithm. Broadly speaking, there are two main strategies for setting appropriate parameter values in EAs: parameter tuning and parameter control. The first can be seen as the commonly practiced approach of testing several combinations of parameters before the final run of the algorithm in order to identify a good enough set of values; for each test run, the parameters remain constant throughout the execution. The second approach, parameter control, starts the execution of the algorithm with initial parameter values that are modified during the run to adapt the behavior of the algorithm to the needs of the problem. A good example of the influence of selecting appropriate values for the parameters of an algorithm can be found in [Eib09]. In this tutorial, the parameter tuning of a DE algorithm is carried out, demonstrating that the performance of the algorithm can be dramatically increased by these techniques (the values determined after the parameter tuning phase made the DE algorithm achieve better performance than the previous best algorithm, CMA-ES, on a well-known benchmark of continuous functions [SHL+ 05]). From the point of view of the hybridization of different EAs, few attempts have been made to define a


taxonomy of the possible strategies for this combination. The most remarkable effort to provide a complete classification of hybrid metaheuristics has been proposed by Talbi in [Tal02]. This taxonomy is structured into two levels. The first level establishes a hierarchical classification of the algorithms according to several characteristics. The second level introduces a flat classification, as the descriptors of the algorithms may be chosen in an arbitrary order. This taxonomy has been extended in this work (Figure 3.2) in order to incorporate additional descriptors at the flat level defined by Talbi. Furthermore, a second hierarchical level has been introduced in order to define several exclusive subcategories of some of the descriptors at the flat level. In this chapter, all the parts subject to adaptation and/or hybridization will be identified (Section 3.1.1) and analyzed (Section 3.1.2) according to the aforementioned taxonomies. Finally, a review of previous work on hybridization and adaptation in Evolutionary Algorithms will be presented in Section 3.2, structured according to the new taxonomy defined in Section 3.1.2.

3.1.1 What Can Be Adapted/Hybridized?

Almost every component of an Evolutionary Algorithm can be adapted and/or hybridized. Eiben et al. identified in [EHM99] the following parts of EAs that are subject to adaptation and/or hybridization:

1. Representation of individuals
2. Evaluation function
3. Recombination operators and their probabilities
4. Selection operator
5. Replacement operator
6. Population size, topology, etc.

As can be seen, the number of possible combinations of the previous parameters is quite large, especially taking into account that each of the previous parts of an EA can be parametrized, and that the number of parameters is not clearly defined. Moreover, different values for some of these parameters can deeply modify the behavior of the Evolutionary Algorithm, emphasizing one characteristic or another (Table 3.1). For this reason, several strategies have been proposed to help identify a satisfactory set of values for these parameters and to combine different search approaches to obtain better results than those of each approach on its own. The classification of these strategies will be discussed in the next section. Although this classification was established for Evolutionary Algorithms in general, it is quite constrained to Genetic Algorithms. However, more heterogeneous scenarios are possible (and frequent) in today's Evolutionary Computation. For example, different metaheuristics (including different EAs) can be combined to a greater or lesser extent. The different alternatives to carry out this combination will be analyzed in the next section.


Table 3.1: Some of the parameters of EAs subject to adaptation and their effect on the behavior of the algorithm

Parameter                  Range of Values   Effect on the algorithm
Population Size            [1, +inf)         Evolutionary Pressure
Probability of Operator    [0, 1]            Exploration/Exploitation
Elitism Percentage         [0, 1]            Evolutionary Pressure

3.1.2 How Can Adaptation/Hybridization Take Place?

The first attempts to introduce a terminology and to establish a taxonomy of the different mechanisms of adaptation in Evolutionary Algorithms were independently conducted by Angeline [Ang95] and Spears [Spe95]. Both classifications are quite similar, although the terminology used is not the same. Spears divides the adaptation strategies into two main groups: offline and online approaches. The offline approach carries out several executions of the Evolutionary Algorithm to find appropriate values for its parameters [Gre86]. The online approach, on the other hand, tries to adapt the parameters of the GA as it solves the problem. Following the classification from [Spe95], this approach can be further divided into three categories: tightly coupled (the empirical rules category in Angeline's terminology), loosely coupled and uncoupled (the absolute rules category in Angeline's terminology). The first group of algorithms uses the same GA to solve the problem and to adapt its own parameters, i.e., the EA searches in both the solution and the parameter search spaces [SM87]. In the loosely coupled group of algorithms, the GA is partially used to adapt its parameters, whereas uncoupled algorithms use a totally separate mechanism to carry out the adaptation [Dav89]. Four years later, Eiben et al. proposed in [EHM99] a slightly different classification, also structured into two levels. At the first level, parameter setting strategies are divided into two groups: parameter tuning, if the values for the parameters of the EA are set before the algorithm is executed, and parameter control, if the values for the parameters are adjusted during the run. Furthermore, the parameter control strategy is subdivided into three different approaches: deterministic, adaptive and self-adaptive. The deterministic approach modifies some parameter according to a deterministic rule, without considering any feedback information from the algorithm.
On the other hand, the adaptive approach takes into consideration feedback information from the search process to decide which parameter should be changed and how this modification should be carried out (direction, magnitude, etc.). Finally, self-adaptive approaches use the concept of "evolution of the evolution", in which the parameters to be adapted are encoded in the chromosome of the individuals and are modified by the same mechanisms used by the global search. Better values for these parameters should lead to better individuals, and thus these parameters tend to survive and be propagated through the offspring of the next generations. The taxonomy derived from this classification is depicted in Figure 3.1.

Figure 3.1: Taxonomy of parameter setting proposed in [EHM99]

The main difference among the three classifications is that Eiben et al. distinguish between deterministic and adaptive mechanisms, whereas the other two taxonomies consider both strategies as a single one (the uncoupled and absolute categories, respectively). According to Eiben et al., this distinction is quite important, as the deterministic strategy does not consider any feedback from the search process. Nowadays, this differentiation seems to be accepted by the community, and thus the taxonomy proposed in [EHM99] is the most commonly referred to, and the one this work has adopted.

Regarding the combination of different evolutionary approaches, the first attempt to define a complete taxonomy of hybrid metaheuristics was made by Talbi in [Tal02]. This taxonomy combines a hierarchical and a flat classification structured into two levels. The first level defines a hierarchical classification in order to reduce the total number of classes, whereas the second level proposes a flat classification, in which the classes that define an algorithm may be chosen in an arbitrary order. The classes of the taxonomy depicted in Figure 3.2 not surrounded by a box represent Talbi's original taxonomy. At the first level, two different classes of algorithms, low-level and high-level, are considered. Low-level hybridization takes place when a given function of one algorithm is replaced by another algorithm; in high-level hybridization, in contrast, the algorithms are self-contained and completely decoupled. These two classes can each be further divided into two new classes: relay and teamwork. Relay hybridization appears when the algorithms are applied consecutively, whereas teamwork hybridization means that the algorithms are executed simultaneously. At the second level, the descriptors can be chosen in an arbitrary order. In Talbi's work, three descriptors are considered at this level. First, the hybridization can be homogeneous (all the algorithms are of the same type) or heterogeneous (different algorithms are used together).
Second, the algorithms that make up the hybrid algorithm can carry out a global search (all the algorithms search within the same search space) or a partial search (the problem is divided into subproblems, and each algorithm searches in a different area of the search space). Finally, the hybridization can be carried out among general-purpose or specialized algorithms. For this work, a new taxonomy has been proposed, mainly based on Talbi's taxonomy, although it also takes some concepts from the parameter setting classification proposed by Eiben [EHM99]. The boxed classes in Figure 3.2 represent the new classes added to the original taxonomy.

Figure 3.2: Taxonomy of hybrid algorithms. Boxed classes are additions to Talbi's taxonomy [Tal02]

At the flat level, two new descriptors have been considered. The first is how many new individuals are created by each algorithm at each step or generation: if the number of individuals that an algorithm can generate does not change throughout the execution of the hybrid algorithm, the approach is said to be collaborative; otherwise, it is said to be competitive. The second is how the individuals created by the different algorithms are managed: the hybrid algorithm can use either a shared population or a private population for each algorithm. A third level has also been added to the taxonomy. At this level the classification is again hierarchical, and it is intended to differentiate some subclasses of the descriptors of the upper level that are exclusive among them. This new taxonomy first establishes three different heterogeneity classes: algorithm, if completely different algorithms are used; operator, if algorithms of the same type with different recombination operators are used; and encoding, if different encodings for the candidate solutions are used. Despite being at this hierarchical level, these descriptors are not exclusive among them. The second addition at this level is the subdivision of the competitive class into two new classes, following the nomenclature used in Eiben's work: adaptive, if the number of individuals to be generated by each algorithm at each step or generation is modified throughout the execution according to the feedback received from the algorithm; and self-adaptive, if these proportions are adapted by using the same mechanisms used to search in the solution space (which normally means that the participation ratios of the different algorithms are encoded along with the rest of the solution in the chromosome of the individual).
In the next section, this taxonomy will be used to present a review of relevant work in the field of Hybrid Evolutionary Algorithms.


3.2 Previous Work on Adaptive and Hybrid Evolutionary Algorithms

In this section, some of the most relevant work on Hybrid EAs will be reviewed. The taxonomy proposed in the previous section will be used to guide this review, giving examples of each of the new classes defined by the taxonomy and focusing on the high-level class of hybrid algorithms, the main objective of MOS, as will be explained in Section 3.3. As the classes in the flat part of the taxonomy are not mutually exclusive (one algorithm can be, for example, competitive and use a shared population), the survey has been organized according to the main distinctive characteristic of the reviewed algorithms, although they can, of course, present some of the other characteristics at the flat level. For examples of the classes in the original taxonomy proposed by Talbi, the reader is encouraged to refer to [Tal02]. A comprehensive and updated review of Hybrid EAs proposed in recent years, although structured in a different way, can be found in [GA07].

3.2.1 Hybrid Algorithms with Relay Behavior

Tseng and Liang [TL06] proposed a hybrid algorithm combining Ant Colony Optimization (ACO), Genetic Algorithms (GAs) and Local Search (LS), applied to the Quadratic Assignment Problem (QAP). The ACO is used to create an initial population for the GA, and, additionally, alternating phases of ACO and GA are executed. The pheromone values needed by the ACO are also updated while the algorithm is in the GA phase, to ensure the correct behavior of the ACO. The Local Search improves solutions produced by both algorithms. For the same problem, Vázquez and Whitley [VW00] proposed a hybrid algorithm composed of ACO and Tabu Search. Ganesh and Punniyamoorthy [GP05] proposed a hybrid algorithm combining Genetic Algorithms and Simulated Annealing (SA) for continuous-time aggregate production-planning problems. The authors state that the motivation for this combination is the power of GAs for global optimization, whereas the SA carries out local optimization of the solutions found by the GA. The algorithm is divided into two phases. In the first phase, the GA creates solutions by means of its recombination operators. In the second phase, every solution produced by the GA in the previous phase is passed to the SA to be improved. The set of improved solutions generated by the SA will be the population used by the GA in the next iteration.

3.2.2 Hybrid Algorithms with Teamwork Behavior

3.2.2.1 Hybrid Algorithms with Collaborative Behavior

Grimaldi et al. proposed a hybrid algorithm combining GAs and PSO, called Genetic Swarm Optimization [GGM+ 04], for solving an electromagnetic optimization problem. In this algorithm, both evolutionary techniques are highly coupled, as they work on the same population. At each generation, the population is divided into two parts, one for the GA and another for the PSO. When both techniques have produced their offspring, the two generated subpopulations are joined together for the next iteration. A driving parameter, the Hybridization Coefficient (HC), expresses the percentage of the population that is evolved by the GA at each generation. This value remains constant throughout the execution of the algorithm. An HC value of zero means that the procedure is a pure PSO, whereas a value of one means that the procedure is a pure GA.

3.2.2.2 Hybrid Algorithms with Competitive and Adaptive Behavior

Some approaches initially apply each operator with the same probability and, as the population evolves, evaluate the effect of each of these operators on the evolutionary process, adapting their participation accordingly [AN04]. A considerable amount of research has been conducted on this approach, with a remarkable increase in performance compared to a traditional Genetic Algorithm with single crossover and mutation operators. Herrera proposed a fuzzy logic controller to adjust the participation of different crossover operators [HL96]. Other authors propose a hybrid algorithm in which the participation of several crossover operators is adjusted based on the progress each of them introduces into the population [HWLL02, AATU03]. Similar strategies were proposed by Julstrom in [Jul95, Jul97], where the participation of each crossover operator depends on the increase in fitness introduced by the operator in the last iteration or on the average fitness of the individuals produced by each of them [HW98, HWC00]. Thierens, in [Thi05], proposed an algorithm where three different vectors are maintained throughout its execution: a P vector, with the probability of each operator being applied; an R vector, with the rewards obtained by each operator; and a Q vector, with the expected quality of each of the operators, computed from the reward vector and the previously observed qualities. The probability of applying each operator is updated depending on its quality rather than directly on its reward. The author concludes that using the quality vector instead of the raw reward vector increases the performance of the algorithm. An important issue discussed in this paper is the need for minimum and maximum participation ratios for the correct behavior of the algorithm. In [WPS06], Whitacre et al.
propose a credit assignment algorithm with multiple crossover operators in which each operator receives a credit computed from two different criteria: (1) the fitness of the offspring produced by each operator, and (2) the search bias and historical linkage of individuals produced in a later stage with ancestors that were created with a particular operator. A similar approach is followed by Barbosa and Madeiros in [BeS00], where the participation of each operator is adjusted not only considering which operator created each individual in the current population but also the operators used to create the ancestors of these individuals.
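The general idea behind such reward-based schemes can be sketched as follows (a hypothetical probability-matching fragment with invented names, not the exact method of any of the cited works): qualities are an exponential average of recent rewards, and operator probabilities follow the qualities while respecting a minimum participation ratio:

```python
def update_operator_probabilities(Q, R, alpha=0.3, p_min=0.1):
    """Blend the latest rewards R into the qualities Q, then derive the
    application probabilities P, bounded below by p_min per operator."""
    K = len(Q)
    Q = [q + alpha * (r - q) for q, r in zip(Q, R)]  # exponential averaging
    total = sum(Q)
    if total == 0:
        return [1.0 / K] * K, Q  # no information yet: uniform probabilities
    P = [p_min + (1 - K * p_min) * q / total for q in Q]
    return P, Q
```

The lower bound p_min plays the role of the minimum participation ratio discussed above: even a currently unrewarded operator keeps being sampled, so it can recover if it becomes useful later in the search.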

3.2.2.3 Hybrid Algorithms with Competitive and Self-Adaptive Behavior

Much work has been carried out on self-adaptive competitive hybrid algorithms. Some authors propose algorithms in which the information regarding the recombination operator used is stored directly as a part of the encoding of the individual [Spe95]. Other authors, like Derigs [DKZ97], proposed a Genetic Algorithm in which each individual encodes,


along with the chromosome information, what they call the “environment” of an individual, which is the tuple (crossover operator, mutation operator, selection operator). These tuples are initially uniformly distributed through the individuals in the population. During the execution, the algorithm maintains a scoreboard with information about how successful each of the available operators has been. At each generation, two parents (master and slave) are selected for reproduction, and the operators encoded in the master individual are applied, generating two offspring individuals that are randomly assigned as master and slave offspring individuals, respectively. The master parent is randomly chosen, whereas the slave parent is selected by applying the selection operator encoded in the environment of the master parent. If the master parent is replaced by its offspring, the scoreboard is increased for the operators of the master parent. If the slave parent is replaced by its offspring, the operators of the slave parent are decreased in the scoreboard, as the reproduction was not successful. When such a replacement takes place, the differences in the values of the scoreboard are computed for both the operators of the master and the slave parents. If these differences are greater than a threshold τ , then the operators of the slave parent are overwritten by those of the master parent.

3.2.2.4 Hybrid Algorithms with a Shared Population

All the hybrid approaches discussed in previous sections use a common population for the algorithms being combined. Another example of this kind of hybridization is the GA-EDA algorithm proposed by Robles et al. [RPL+ 04], in which a GA and an EDA are combined and applied to both discrete and continuous problems. Several strategies to adjust the overall participation of the algorithms were tested, including both static and dynamic approaches. For example, the average fitness of the offspring populations of both algorithms is computed at each generation, and the number of individuals to produce in the next generation is updated according to this value, increasing the participation of the best algorithm and decreasing that of the other one.

3.2.2.5 Hybrid Algorithms with Private Populations

Some authors propose island GAs in which each island evolves a population by means of recombination operators with different characteristics, trying to take advantage of exploration and exploitation mechanisms by controlling migratory processes [ES98, TMSI03]. Another example is the hybrid EA-PSO proposed by Shi et al. in [SLL+ 05], in which both subsystems are executed in parallel, with the exchange of a few individuals at each generation.

3.2.2.6 Heterogeneous Hybrid Algorithms with Different Algorithms

One of the most successful and studied hybridizations of Evolutionary Algorithms is the combination of EAs with LS techniques. One example of this kind of algorithm is Memetic Algorithms, introduced by [Mos89] and formalized by [RS94], which extend EAs by applying a Local Search to individuals after mutation.


Local Search methods change several values in the candidate solution in order to improve its fitness. Once the changes have been made, there are two options to include the individual back in the population. The first is the so-called Baldwin effect, in which the original individual is returned but keeps the improved fitness obtained by the local search procedure (which represents its survival potential). The other alternative is Lamarckian evolution, in which the modified individual is inserted into the population (Lamarck's theory states that the characteristics an individual acquires during its life are passed to the offspring). This second approach is less bio-inspired, as the genotype of the individual changes after it has been created. Some studies have investigated whether strategies based on the Baldwin mechanism are better or worse than those implementing the Lamarckian mechanism at accelerating the search. The results obtained vary and depend, to a large extent, on the problem. In his thorough analysis, Merz [MF97] reported the benefits of using Lamarckian evolution. In [WZZ06], a Hybrid Genetic Algorithm is proposed for Flow-Shop Scheduling (FSS) in which multiple genetic operators are applied simultaneously to carry out a global search, whereas a LS procedure based on a neighborhood structure-based graph model refines the solutions. The use of mutation and Local Search is controlled by a decision probability that tries to maintain diversity in the population, while the computational effort focuses on exploiting promising solutions. Menon et al. [MBP05] used two hybrid techniques combining Differential Evolution and Local Search for the clearance of nonlinear flight control laws. In [MHMG05], the authors proposed the combination of an EA, a clustering process and a LS procedure for the evolutionary design of neural networks. The LS procedure is incorporated into the EA in order to improve its performance. However, due to the population size used and/or the dimension of the search space, the LS procedure is not applied to every solution generated by the EA. Instead, a subpopulation of the best individuals found so far is selected, a clustering process groups similar individuals within this subpopulation, and the LS procedure is applied only to the best individual of each group.
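The two reinsertion policies can be sketched in a few lines. The bitstring problem, the hill climber and all function names below are illustrative assumptions, not the methods of any of the cited works:

```python
import random

def onemax(bits):
    """Toy fitness: number of ones in the bitstring."""
    return sum(bits)

def local_search(bits, fitness, steps=10):
    """Simple first-improvement hill climber: flip one bit at a time."""
    best, best_fit = list(bits), fitness(bits)
    for _ in range(steps):
        i = random.randrange(len(best))
        candidate = list(best)
        candidate[i] ^= 1
        if fitness(candidate) > best_fit:
            best, best_fit = candidate, fitness(candidate)
    return best, best_fit

def refine(genotype, fitness, mode="lamarckian"):
    """Apply LS to an individual and reinsert it under either policy."""
    improved, improved_fit = local_search(genotype, fitness)
    if mode == "lamarckian":
        # Acquired traits are written back into the genotype.
        return improved, improved_fit
    # Baldwinian: keep the original genotype, but credit it with the
    # fitness reachable through learning (its survival potential).
    return list(genotype), improved_fit
```

Under the Baldwinian policy the genotype is untouched and only the assigned fitness changes, which is why it is considered the more biologically plausible of the two.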

3.2.2.7 Heterogeneous Hybrid Algorithms with Different Operators

In [HKM95], Hong et al. proposed different strategies to adapt the participation of several crossover operators: (a) if both parents have been created by the same crossover operator, then the children will be generated by that operator; otherwise, a crossover operator is uniformly selected from among the available ones; (b) the opposite of the previous one; and (c) selecting the operator that has generated the fewest solutions in the current iteration.
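Strategies (a) and (c) can be sketched as follows. The operator names and the bookkeeping structure are illustrative assumptions, not details from [HKM95]:

```python
import random

OPERATORS = ["one_point", "two_point", "uniform"]  # hypothetical operator pool

def select_operator_a(parent_op_1, parent_op_2):
    """Strategy (a): inherit the operator if both parents share it,
    otherwise choose uniformly among the available ones."""
    if parent_op_1 == parent_op_2:
        return parent_op_1
    return random.choice(OPERATORS)

def select_operator_c(offspring_counts):
    """Strategy (c): pick the operator that has generated the fewest
    solutions in the current iteration (counts kept by the caller)."""
    return min(OPERATORS, key=lambda op: offspring_counts[op])
```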

3.2.2.8 Heterogeneous Hybrid Algorithms with Different Encodings

Finally, another important factor should be properly configured for the successful application of an EA: the encoding of the solutions. The effect of the coding on the heuristic search mechanisms is very important due to the relationship between the encoding criteria and the fitness landscape. The fitness landscape is a representation of the whole solution space that assigns a fitness value to each point of this space.


Studies carried out by Jones and Forrest [JF95] on the difficulty of different optimization problems measured one aspect of problem complexity by correlating the difference between fitness function values and the Euclidean distance in the solution space. In this sense, for example, a pure binary encoding landscape differs from a Gray code landscape for the same fitness function, which can make the problem easier or harder to solve, as mentioned in [CS88]. Similar studies on the effect of different codings for a common problem have been published. In [Sal96], the author compares several encodings for function optimization, detailing the advantages and disadvantages of each encoding scheme. A comparison of genetic codings for the Traveling Salesman Problem was carried out in [TKS+ 94]. In [CP95], the authors establish the relationship between the coding function used to represent individuals in a GA and the building blocks hypothesis. In this work, a method that finds a coding function to build meaningful building blocks is proposed, and the experimental results show the improvements obtained by selecting an appropriate encoding for a particular problem. In the field of multiobjective optimization, Korkmaz et al. in [KDAB06] compared several encodings on a clustering problem and concluded that the choice of representation can deeply influence the performance of the GA. Despite all these studies, there have not been many attempts at taking advantage of the characteristics introduced by this solution space translation. Variable Neighborhood Search (VNS), introduced by Mladenovic and Hansen in [MH97], has been used as a metaheuristic for solving combinatorial and global optimization problems. Its basic idea is the systematic change of neighborhood within a Local Search. VNS has been used in conjunction with other heuristic methods, such as Tabu Search or GRASP [FR95].
In [SY00], Schnier and Yao proposed a Genetic Algorithm to solve different optimization functions in which the individuals could be encoded with either a Cartesian or a pseudo-polar representation and recombined using either of them.

3.3 Limitations of Previous Approaches

Table 3.2 summarizes the work reviewed in the previous section. For each reviewed paper, the characteristics from the taxonomy proposed in Figure 3.2 have been identified in order to offer a complete comparison of the different hybridization strategies present in the literature on Evolutionary Algorithms. Most of the work gathered in Table 3.2 describes ad-hoc hybrid algorithms specifically designed to solve a particular problem or class of problems. However, applying these approaches to a problem different from the one they were designed for could require important modifications of the hybrid algorithm and, probably, would not deliver the expected results. The aim of MOS is to provide a common framework for the hybridization of EAs with little effort. For this purpose, the reproductive mechanisms of each of the evolutionary approaches to be combined should be uncoupled from the algorithm they belong to. This abstraction allows the design of flexible Hybrid Evolutionary Algorithms where the work of specifying the mechanisms for combining the different algorithms is already done, and the user's effort is limited to selecting the set of algorithms to be combined, the control mechanisms to adjust the participation of each algorithm, etc. As said before, MOS has been conceived for the design of high-level, teamwork hybrid algorithms. However, the abstraction depicted in this section, which will be thoroughly explained in Chapter 4, makes it easy to design similar algorithms with different configurations (low-level, relay, etc.).

Table 3.2: Summary of previous work on Hybrid Evolutionary Algorithms. For each reference ([TL06], [VW00], [GP05], [GGM+ 04], [YM02], [HKM95], [AN04], [HL96], [HWLL02], [AATU03], [Jul95], [Jul97], [HW98], [HWC00], [Thi05], [WPS06], [BeS00], [Spe95], [DKZ97], [RPL+ 04], [ESKT98], [TMSI03], [SLL+ 05], [WZZ06], [MBP05], [MHMG05], [MH97], [FR95], [SY00]), the table marks its characteristics along the axes of the taxonomy in Figure 3.2: Collaboration (Relay / Teamwork and Competitive / Collaborative), Search (Global / Partial), Population (Shared / Private), Specialization (General / Specialist), Heterogeneity (Homogeneous / Heterogeneous, at the Algorithm, Operator or Encoding level) and, where applicable, Adaptive or Self-Adaptive behavior. (The per-reference check marks of the original table layout are not reproducible here.)

Part III

PROBLEM STATEMENT AND SOLUTION

Chapter 4

Multiple Offspring Sampling

4.1 Introduction to Multiple Offspring Sampling

Evolutionary Algorithms (EAs) are population-based metaheuristics that cover a wide range of algorithms, as well as many variations of the canonical ones. Each of these algorithms (and variations) accepts several parameters that have to be carefully selected in order to achieve satisfactory results. This flexibility turns EAs into powerful optimization techniques. However, this characteristic, which makes EAs suitable for solving even the hardest optimization problems with remarkable success, is, paradoxically, one of the most important issues that anybody interested in using EAs must deal with. An appropriate selection of a single algorithm and its associated parameters for a particular optimization problem is a difficult task (Grefenstette, in [Gre86], states that it sometimes becomes an optimization problem itself). The No Free Lunch Theorem [WM97] holds that "any two algorithms are equivalent when their performance is averaged across all possible problems". This means that it is impossible to define a general strategy that outperforms any other algorithm on every possible problem. This, of course, also applies to Evolutionary Algorithms. Even if an EA has proved successful on a similar problem, there is no guarantee that this success will be repeated: a slight variation of the conditions or of the data used in the experimentation could lead to unpredictable results, often not as satisfactory as expected. Additionally, as some of the work reviewed in the previous chapter suggests [CS88, MH97, SY00, Thi05, WPS06, LKC+ 07], the hybridization of different EAs, encodings or operators can significantly boost performance. This opens new alternatives to improve performance through the combination of different evolutionary approaches, and means that not only every single alternative should be considered, but also all their combinations.


Much work has been done on the selection of the parameters of EAs and on the combination of different approaches, as the review presented in the previous chapter shows. The conclusion inferred from this review is that no general solution to this problem actually exists, as most studies focus on particular parameters and are limited to just a few problems. An alternative for dealing with multiple algorithms, while keeping the best performance (or close to it), is to combine these algorithms in a hybrid way. This hybridization can lead to the following two situations:

• A collaborative synergy emerges among the different algorithms that improves on the performance of the best one used individually.

• A competitive selection of the best one takes place, in which a similar performance (often the same) is obtained with minimum overhead.

In this work, Multiple Offspring Sampling (MOS) is introduced as a general framework for the development of Dynamic Hybrid Evolutionary Algorithms. MOS provides the functional formalization necessary to design such algorithms, as well as the tools to identify and select the best performing hybrid configuration for the problem being solved. The next sections review the operation schema of an EA to identify the differences between traditional EAs and MOS, give a detailed description of how MOS works, and introduce the functional formalization of the general MOS algorithm.

4.1.1 Functional Formalization of an Evolutionary Algorithm

A deep discussion on how Evolutionary Algorithms work has been presented in Chapter 2 for each of the most relevant dialects of this kind of algorithms. In order to properly introduce the contributions of this work, a functional formalization of general EAs must be given. Generally, the operation of these algorithms can be divided into the following phases:

1. Creation of the initial population P0.

2. Evaluation of the initial population P0.

3. Check for algorithm termination (convergence or number of generations). If the criterion is met, finish; continue otherwise.

4. Generation, using some individuals from Pi, of a set of new individuals for the next generation, called the offspring population Oi.

5. Evaluation of the new individuals in Oi.

6. Combination of the offspring and the previous populations to define the population for the next generation, Pi+1.


7. Go back to step 3.

Based on this schema, different Evolutionary Algorithms have been developed. For example, in step 6 (population combination), generational GAs take the offspring as the next population (Pi+1 = Oi). Other approaches, such as Steady State Algorithms, generate only one offspring individual, which replaces the worst individual in Pi; and intermediate approaches, based on elitism, take the best individuals from both Oi and Pi to generate the population for the next generation, Pi+1. For step 4 (offspring generation), the literature also offers a wide variety of approaches, such as selecting different genetic operators. Other evolutionary techniques, such as those reviewed in Chapter 2, or even other metaheuristics, could be used for the same purpose.
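The phases above can be sketched as a generic loop. The concrete plug-ins below (initialization, tournament selection with one-point crossover, elitist merge) are illustrative choices for a OneMax toy problem, not prescriptions from this formalization:

```python
import random

def evolve(fitness, init_fn, offspring_fn, combine_fn, generations=100):
    """Generic EA skeleton following the phases of the schema above."""
    population = init_fn()                                   # phase 1
    fits = [fitness(ind) for ind in population]              # phase 2
    for _ in range(generations):                             # phases 3 and 7
        children = offspring_fn(population, fits)            # phase 4: O_i
        child_fits = [fitness(ind) for ind in children]      # phase 5
        population, fits = combine_fn(population, fits,
                                      children, child_fits)  # phase 6
    return population, fits

def init(n=20, length=16):
    return [[random.randint(0, 1) for _ in range(length)] for _ in range(n)]

def offspring(pop, fits):
    def pick():  # binary tournament selection
        a, b = random.sample(range(len(pop)), 2)
        return pop[a] if fits[a] >= fits[b] else pop[b]
    children = []
    while len(children) < len(pop):
        p1, p2 = pick(), pick()
        cut = random.randrange(1, len(p1))  # one-point crossover
        children.append(p1[:cut] + p2[cut:])
    return children

def combine(pop, fits, children, child_fits):
    """Elitist merge: keep the |pop| best of parents and offspring."""
    merged = sorted(zip(pop + children, fits + child_fits),
                    key=lambda pair: pair[1], reverse=True)[:len(pop)]
    return [ind for ind, _ in merged], [f for _, f in merged]
```

A generational GA would instead return the children directly in `combine`, and a Steady State variant would produce a single child per iteration.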

4.1.1.1 Genotypes and Phenotypes

In the context of Evolutionary Computation, two different sets of elements should be considered in the description of a problem:

• S is the set of possible phenotypes (candidate solutions to the problem).

• C is the set of all possible combinations of values within the selected encoding format (genotypes). This set defines the search space for the Evolutionary Algorithm.

It should be taken into account that, in the general schema mentioned above, different operations are carried out on different sets of elements. For example, the evaluation of the candidate solutions is a phenotypical operation: it is the individual that behaves well or badly in the environment, not its selected representation. On the other hand, the recombination of individuals to generate the offspring is based on the genotype encoding. More details on the use of multiple encodings are given in Section 4.1.3. For an EA there must also exist a decoding function code that transforms elements from the genotype set into elements of the phenotype set (encodings into solutions), as shown in Equation 4.1:

    code : C → S,   c ↦ s = code(c)    (4.1)

This function can be extended to operate on a set of encodings instead of on single individuals. The function code then generates a set of solutions (S ⊂ S) from a set of genotypes (C ⊂ C):

    code : P(C) → P(S),   C ↦ S    (4.2)

    code(C) = {s ∈ S / ∃ c ∈ C : s = code(c)}    (4.3)

The phenotype and genotype pair, (s, c) ∈ S × C ∧ s = code(c), identifies both the individual as a solution to the problem and the encoding used for this solution.
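As a minimal example of a decoding function, the sketch below maps a fixed-length binary genotype to a real-valued phenotype; the interval and the fixed-point decoding scheme are illustrative assumptions:

```python
def code(genotype, lo=-5.0, hi=5.0):
    """Decoding function `code`: maps a binary genotype c (a tuple of
    bits, an element of C) to a real phenotype s in the interval
    [lo, hi] (an element of S)."""
    value = int("".join(map(str, genotype)), 2)
    return lo + (hi - lo) * value / (2 ** len(genotype) - 1)

def code_set(genotypes, **kw):
    """Set extension of `code` (Equations 4.2-4.3): P(C) -> P(S)."""
    return {code(c, **kw) for c in genotypes}
```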


4.1.1.2 Fitness Function

To drive the search mechanism, Evolutionary Algorithms require the existence of an evaluation function that determines the chances of an individual to survive in the environment, i.e., a fitness function fit:

    fit : S → ℝ,   s ↦ fit(s)    (4.4)

This approach could be, in some contexts, rather restrictive, as some methods, especially co-evolutionary algorithms, define order relations to compare the quality of the phenotypes. In this work, however, only population-independent fitness functions have been considered.

4.1.1.3 Offspring Sampling Function

Let off be the Offspring Sampling Function. This function defines how new individuals are generated by the recombination of the individuals of previous generations. This is a genotype-level function. In GAs, the Offspring Sampling Function is defined as the combination of the genetic operators (crossover, mutation and selection). In other approaches, such as EDAs, this function comprises a statistical model and a sampling function to infer new individuals from this model.

    off : P(C) → P(C),   Ci ↦ off(Ci)    (4.5)

    Offspring size constraint: ∀i : |off(Ci)| = Πi    (4.6)

In Equation 4.6, Πi represents the size of the new offspring population, whose value usually remains the same between generations. Finally, a method to combine previous and current populations should be introduced, resulting in the population combination function comb:

    comb : P(S) × P(S) → P(S),   (Si, Oi) ↦ Si+1    (4.7)

    Previous population: ∃ Ci ⊂ C, Si ⊂ S / Si = code(Ci)
    Offspring population: Oi ⊂ S / Oi = code(off(Ci))    (4.8)

Many different functions could be used for the combination of populations, such as the classic elitist function, defined as comb(Si, Oi) = Si+1 with:

    Si+1 = {s ∈ Si ∪ Oi / ∄ t ∈ Si ∪ Oi : t ∉ Si+1 ∧ fit(t) ≻ fit(s)}    (4.9)

In Equation 4.9, ≻ represents better-fitness-than, which is "greater than" or "less than" depending on the sense of the optimization, maximization or minimization, respectively.
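The elitist comb of Equation 4.9 can be sketched as follows, with phenotypes standing in directly for individuals (a minimal sketch, not the only possible comb):

```python
def comb(S_i, O_i, fit, maximize=True):
    """Elitist population combination (Equation 4.9): keep the |S_i|
    individuals with the best fitness from the union of the previous
    population S_i and the offspring population O_i."""
    pool = sorted(S_i + O_i, key=fit, reverse=maximize)
    return pool[:len(S_i)]
```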

4.1.2 Description of Multiple Offspring Sampling

The Multiple Offspring Sampling model shares many of the characteristics of traditional EAs: it is an iterative, stochastic, population-based optimization algorithm that defines a fitness function to evaluate the adaptation of each individual in the population, and in which the fittest individuals have more chances to survive and pass to the next generation. However, there are some differences between the two groups of algorithms. In MOS, a key term is the concept of technique: a mechanism, decoupled from the main algorithm, to generate new candidate solutions. This means that, within a MOS algorithm, many reproductive mechanisms are available. Some examples of different techniques could be:

• The same EA with the same operators and different parameter values.

• The same EA with different operators and different parameter values.

• A different EA with different (specific) parameters.

• Etc.

A more concrete definition of these reproductive mechanisms or techniques in the context of MOS is thus needed. The following definition arises:

Definition 1. A MOS reproductive technique is a mechanism to create new individuals for which: (a) a particular evolutionary algorithm model, (b) an appropriate solution encoding, (c) specific operators (if required), and (d) the necessary parameters have been defined.

From the definition above, the concept of a MOS system can be introduced:

Definition 2. The tuple (n, T, P, O) defines a MOS system, where n is the number of techniques in the techniques set T = {Tj}; P = {Pi} is the m-size set of common populations per generation; and O = {Oi^(j)} is the n × m set of offspring populations per technique j and generation i.
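Definitions 1 and 2 can be mirrored by a small data structure. All field and class names below are illustrative assumptions, not identifiers from the MOS implementation:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Technique:
    """A MOS reproductive technique (Definition 1): an EA model, a
    solution encoding, its operators and its parameters, decoupled
    from the main algorithm."""
    name: str
    encoding: str                 # e.g. "binary", "real"
    offspring_fn: Callable        # off^(j): samples new genotypes
    params: dict = field(default_factory=dict)

@dataclass
class MOSSystem:
    """A MOS system (Definition 2): the tuple (n, T, P, O)."""
    techniques: list                                   # T = {T_j}
    populations: list = field(default_factory=list)    # P = {P_i}
    offspring: dict = field(default_factory=dict)      # O[(j, i)] = O_i^(j)

    @property
    def n(self):
        return len(self.techniques)
```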

4.1.3 Multiple Encodings

As seen in Section 3.1.1, one of the parts of an Evolutionary Algorithm subject to adaptation and/or hybridization is the representation used for the solutions (individuals) within the EA. This section provides a functional formalization that links the use of multiple encodings in the same algorithm with the functional formalization of EAs presented in Section 4.1.1. This particular hybridization capability of the MOS framework has been considered because it is one of the most complex hybridization scenarios and thus shows the flexibility and possibilities of MOS. In an Evolutionary Algorithm, a solution to a problem (phenotype) can be encoded using different representations (genotypes). A solution space can be defined as the pair of values:


    Ω ⊂ (C, ℝ) / (c, v) ∈ Ω : v = fit(code(c))

One genotype encoding can be transformed into another, in such a way that a point from one solution space (and fitness landscape) is translated into a point (or points) of another solution space. To generalize, it can be considered that different offspring mechanisms also use different genotype encodings. So, let C^(j) be the encoding space produced by mechanism j. As different genotype formats are allowed, there must also be different coding functions (both for a single genotype, code^(j), and for a set of genotypes):

    code^(j) : C^(j) → S,   c ↦ s    (4.10)

    code^(j) : P(C^(j)) → P(S),   C ↦ S    (4.11)

    code^(j)(C) = {s ∈ S / ∃ c ∈ C : s = code^(j)(c)}    (4.12)

In a Hybrid Evolutionary Algorithm, a solution (the phenotype) would have the possibility to participate in multiple genotype recombination mechanisms. If the different mechanisms also use different genotype formats, then, once an individual is created (and evaluated), it can be coded back to take part in the different genotype formats (and their operators). To manage these transformations, a group of functions is required to transform genotypes between two different encodings (trans_{i,j}). These functions allow one individual in a particular encoding to be translated into multiple individuals in a different encoding, as can be seen in Equations 4.13 and 4.14:

    trans_{i,j} : C^(i) → P(C^(j)),   commuting with code^(i) and code^(j), which map both encodings onto S    (4.13)

    Unique phenotype encoding: C^(j) = trans_{i,j}(c^(i)) = {c1^(j), c2^(j), ..., cn^(j)},
    ∀i, j, k : code^(i)(c^(i)) = code^(j)(ck^(j))    (4.14)

At this point, the convertibility property of one solution space into another can be introduced. However, Definition 3 must first be given in order to prove Theorem 1.

Definition 3. Two genotypes from the same encoding class are said to be equivalent if they encode the same solution:

    c ∼ c′  iff  code(c) = code(c′)

Now, Theorem 1 can be formulated.

Theorem 1. Two solution spaces Ωi and Ωj are convertible if ∃ trans_{i,j} : trans_{i,j}(ci) ⊂ C^(j) satisfying the following property:

    ∀c′i / ci ∼ c′i ⟹ ∀cj ∈ trans_{i,j}(ci), ∀c′j ∈ trans_{i,j}(c′i) : cj ∼ c′j

Proof. Let trans_{i,j} : trans_{i,j}(ci) ⊂ C^(j) be a transformation function from the i-th encoding to the j-th encoding. If ci ∼ c′i then, from Definition 3,

    code^(i)(ci) = code^(i)(c′i)    (4.15)

The definition of trans_{i,j} in Equations 4.13 and 4.14 guarantees that

    code^(i)(ci) = code^(j)(trans_{i,j}(ci)) = code^(j)(cj)    (4.16)

and

    code^(i)(c′i) = code^(j)(trans_{i,j}(c′i)) = code^(j)(c′j)    (4.17)

From Equations 4.15, 4.16 and 4.17, code^(j)(cj) = code^(j)(c′j), which is a necessary and sufficient condition for cj ∼ c′j. □
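A concrete instance of such a transformation is the translation between standard binary and Gray encodings of the same integer phenotype; equivalent binary genotypes map to equivalent Gray genotypes, as the theorem requires. The function names below are illustrative, not part of the MOS formalization:

```python
def binary_to_gray(bits):
    """trans_{i,j}: translate a standard-binary genotype into its
    Gray-coded equivalent. Returned as a set, since trans maps one
    genotype to a *set* of genotypes (Equation 4.13); here the set
    is a singleton because the mapping is one-to-one."""
    gray = [bits[0]] + [bits[k - 1] ^ bits[k] for k in range(1, len(bits))]
    return {tuple(gray)}

def binary_to_int(bits):
    """Decoding function code^(i) for the standard binary encoding."""
    return int("".join(map(str, bits)), 2)

def gray_to_int(gray):
    """Decoding function code^(j) for the Gray encoding."""
    bits, acc = [], 0
    for g in gray:
        acc ^= g          # undo the XOR chain of the Gray code
        bits.append(acc)
    return int("".join(map(str, bits)), 2)
```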


The definition of an individual, in the case of hybrid algorithms with multiple encodings, should include information from the phenotype, but also from all the genotype encodings. The individual identification in MOS is (s, c^(1), c^(2), ..., c^(n)) ∈ S × C^(1) × C^(2) × ... × C^(n). This tuple should also validate ∀j : s = code^(j)(c^(j)), as well as the Unique Phenotype Encoding constraint (Equations 4.13 and 4.14, mentioned above). Additionally, the previous formalism should be extended to include different offspring sampling functions:

    off^(j) : P(C^(j)) → P(C^(j)),   Ci ↦ off^(j)(Ci)    (4.18)

In the case of a single offspring function, there is a constraint on the size of the offspring production, |off(Ci)| = Πi. In the case of multiple offspring functions, this constraint can change dynamically to balance the offspring sampling according to the strategy defined by the algorithm. Through this feature, MOS can select among different offspring generation alternatives on a generation-by-generation basis. Section 4.2.1.1 presents how these sampling sizes should be defined. In MOS, the population merge function comb* should also be defined, in order to combine the multiple offspring populations with the population of the previous generation:

    comb* : (P(S))^(n+1) → P(S),   (Si, Oi^(1), Oi^(2), ..., Oi^(n)) ↦ comb*(Si, Oi^(1), Oi^(2), ..., Oi^(n)) = Si+1    (4.19)

    Previous population: Si ⊂ S / Si = code(Ci)
    Offspring population: Oi^(j) ⊂ S / Oi^(j) = code(off^(j)(Ci))    (4.20)
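One possible comb* is an elitist merge over the previous population and all of the techniques' offspring populations; this is an illustrative choice, not the only merge function admitted by the formalization:

```python
def comb_star(S_i, offspring_pops, fit, maximize=True):
    """Elitist multi-population merge comb*: combine the previous
    population S_i with the offspring populations O_i^(1..n) produced
    by the n techniques, keeping the |S_i| fittest individuals."""
    pool = list(S_i)
    for O_j in offspring_pops:
        pool.extend(O_j)
    pool.sort(key=fit, reverse=maximize)
    return pool[:len(S_i)]
```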

4.2 The Multiple Offspring Sampling Algorithm

As seen in the previous section, a MOS algorithm modifies traditional Evolutionary Algorithms by adding the capability of using several different recombination techniques simultaneously. A functional formalization of an Evolutionary Algorithm has also been presented. From this formalization, the main differences between MOS and traditional EAs can be established. The first one affects step 4 (offspring generation) of the general schema outlined at the beginning of the previous section, and Equation 4.6. MOS proposes the definition of multiple mechanisms to generate new individuals. Each recombination technique creates its own offspring Oi^(j), where i is the generation and j the mechanism used. In MOS, all these techniques compete during the evolutionary search process. In this "competition", two main approaches can be defined to adjust the participation of each recombination method:

• A central approach, in which a quality function evaluates how well these methods are performing and a participation function assigns each of them a different number of offspring individuals to produce. Usually, the participation function computes the number of new individuals for each technique from its quality value (which can be measured in many different ways), but static, alternating or a priori programmed functions can also be considered. This regulatory model will be explained in Section 4.2.1.

• A self-adaptive approach, in which the participation of each of the recombination methods is encoded within the individuals and is updated at every mating of two individuals. This mechanism for encoding the participation ratio constrains the application of this approach to techniques using a crossover operator (an EDA, for example, could not be used). This alternative will be described in detail in Section 4.2.2.

The second important difference modifies the way step 6 (population combination) of the general schema at the beginning of Section 4.1.1 is carried out (formalized in Equation 4.8). At this point, the previous population Pi and all the offspring populations Oi^(j) (instead of a single offspring population) are merged to produce the population of the next generation, Pi+1. This process, as said before, is usually done by means of an elitist population merge function. Figure 4.1 depicts a diagram in which these modifications can be observed.

generation, Pi+1 . This process, as said before, is usually done by using an elitist population merge function. Figure 4.1 depicts a diagram in which this modifications can be observed. Creation of Initial Population P0

Evaluation of Initial Population P0

Check for Algorithm Termination

Recombination of Individuals in Pi with Technique Tj

Recombination Technique #j

Evaluation of Individuals in Oij

Merge of Individuals in Oij with previous Population Pi

Figure 4.1: General schema of a MOS algorithm

4.2.1 Central Approach

This is the first of the two strategies defined for the adjustment of the participation of each reproductive technique in a MOS algorithm. In this approach, an external function decides the number of individuals to be generated by each technique. Before the offspring step of each generation, the algorithm recalculates the participation ratio of each of these techniques. Many different Participation Functions (PFs) could be defined. As will be seen in detail in Section 4.2.1.1, PFs can be divided into two main groups: Deterministic and Dynamic Participation Functions.


(diagram omitted: as in Figure 4.1, with two additional steps before recombination: evaluation of the quality of each technique Tj and assignment of participation to each technique Tj)

Figure 4.2: General schema of a MOS algorithm with the Central approach

The first group is made up of functions in which the participation of each technique does not depend on the population, the individuals or the evolutionary behavior of the algorithm. Some examples of this type of function, which will be described in Section 4.2.1.1, are the Constant PF, the Alternating PF and the Incremental PF. The second group contains the Dynamic Participation Functions. In these functions, the number of individuals to be produced by each technique is adjusted at every generation. The new participation ratios are computed by means of a Quality Measure that tries to estimate the benefits of using one particular reproductive technique. More details on this type of PF will be given in Section 4.2.1.1. Algorithm 4 presents the pseudo-code of MOS describing the general functioning of this hybrid approach. The same information is shown in Figure 4.2, which represents the general schema of a MOS algorithm with the central approach. This figure highlights the two new steps added to the general schema of MOS: the evaluation of the quality of a reproductive technique (optional, only if a Dynamic Participation Function is used) and the recalculation of the participation ratio for each of the techniques. The next subsections give a detailed view of each of these steps and of the elements needed to carry out these actions.

Algorithm 4 MOS Algorithm with Central Approach
1: Create initial overall population of candidate solutions P0
2: Uniformly distribute participation among the n used techniques → ∀j Π_0^(j) = |P0|/n. Each technique produces a subset of individuals according to its participation (Π_0^(j))
3: Evaluate initial population P0
4: while termination criterion not reached do
5:    Update Quality of Tj → Q_i^(j) = Q(O_{i-1}^(j)), ∀j
6:    Update participation ratios from the Quality values computed in Step 5 → ∀j Π_{i+1}^(j) = PF(Q_i^(j))
7:    for every available technique Tj do
8:       while ratio Π_i^(j) not exceeded do
9:          Create new individuals from current population Pi using technique Tj
10:         Evaluate new individuals
11:         Add new individuals to an auxiliary offspring population O_i^(j)
12:      end while
13:   end for
14:   Combine populations O_i^(j) ∀j and Pi according to a pre-established criterion to generate P_{i+1}
15: end while

4.2.1.1 Participation Functions

A Participation Function (PF) establishes the number of individuals (participation ratio) that a particular reproductive technique can produce on a generational basis. Two main groups of PFs can be defined, Deterministic and Dynamic PFs, depending on the a priori knowledge of the participation ratio of each sampling method in every generation: if the participation ratio of the j-th technique can be computed before the algorithm is executed, then the PF is deterministic, whereas if this participation cannot be determined a priori (if, for example, it depends on the evolved individuals), then the PF is dynamic.

Deterministic Participation Functions map the i-th generation of the evolutionary process and the j-th available technique to a value that represents the participation of the j-th technique at the i-th generation. This function is defined as shown in Equation 4.21:

$PF: \mathbb{N} \times \mathbb{N} \to \mathbb{N}, \quad (i, j) \mapsto \Pi_i^{(j)}$  (4.21)
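The per-generation flow of Algorithm 4 can be sketched in Python. This is a minimal illustration, not the thesis implementation: the names (`mos_generation`, `quality_fn`, `pf`, `fitness_fn`) and the elitist merge criterion are assumptions.

```python
def mos_generation(population, techniques, ratios, quality_fn, pf, fitness_fn):
    # One generation of the MOS central approach (Algorithm 4), sketched.
    # Each technique j produces ratios[j] offspring from the current population.
    offspring = {j: [tech(population) for _ in range(ratios[j])]
                 for j, tech in enumerate(techniques)}
    # Quality of each technique, measured on its own offspring (Step 5).
    qualities = [quality_fn([fitness_fn(x) for x in offspring[j]])
                 for j in range(len(techniques))]
    # Recompute participation ratios with the Participation Function (Step 6).
    new_ratios = pf(qualities, ratios)
    # Elitist merge criterion (an assumption): keep the |P| best of
    # parents plus all offspring, for a maximization problem.
    merged = population + [x for off in offspring.values() for x in off]
    merged.sort(key=fitness_fn, reverse=True)
    return merged[:len(population)], new_ratios
```

With `pf = lambda qualities, ratios: ratios` this reduces to the Constant PF below; plugging in a dynamic PF yields the adaptive behavior described in this section.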

The Dynamic Participation Function maps the offspring population of the j-th technique at the i-th generation to the participation ratio of that technique. For this purpose, it makes use of a Quality Measure that evaluates how good this population is and, thus, how many individuals technique j should create. This function is defined as shown in Equation 4.22:

$PF: \mathcal{P}(S) \to \mathbb{N}, \quad O_i^{(j)} \mapsto \Pi_i^{(j)}$  (4.22)

In this section, several examples of Deterministic Participation Functions and the quality-based Dynamic Participation Function will be reviewed.

Constant Participation Function The percentage of individuals produced by each of the reproductive techniques is constant. This ratio can be either defined by the user or equally distributed among the available techniques. This is the simplest scenario that can be defined and, obviously, not the best performing one.


Alternating Participation Function

This Participation Function assigns the whole production of individuals alternately to each of the available sampling methods. This way, the search properties of each of the defined mechanisms are preserved throughout the evolutionary process. The drawback is that, if one of the reproductive techniques performs very poorly compared to the others, precious computational time will be wasted generating individuals that will, with high probability, be discarded by the algorithm.

First, an assignment function that maps each generation to the technique that takes over the production of new individuals should be defined (Equation 4.23):

$assign: \mathbb{N} \to \mathbb{N}, \quad i \mapsto j$  (4.23)

Then, the Alternating Participation Function can be defined in the same way as the general Participation Function introduced in Equation 4.21. Given an overall population size $|P_0|$, the $\Pi_i^{(j)}$ values are computed as defined in Equation 4.24:

$\Pi_i^{(j)} = \begin{cases} |P_0| & \text{if } assign(i) = j \\ 0 & \text{otherwise} \end{cases} \quad \forall j \in [1, n]$  (4.24)

In all the previous equations, j represents the j-th sampling method and i the current generation of the MOS algorithm.
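A minimal sketch of Equation 4.24, assuming the simplest round-robin assignment assign(i) = i mod n (the function name and the assignment rule are assumptions, not part of the definition above):

```python
def alternating_pf(i, j, n, pop_size):
    # Equation 4.24 with a round-robin assign(i) = i mod n: the technique
    # assigned to generation i produces the whole offspring population.
    return pop_size if i % n == j else 0
```

For any generation, exactly one technique receives the full population and the rest receive zero.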

Incremental Participation Function

In this PF, the ratio of one of the techniques is zero at the beginning of the algorithm. The remaining techniques start out producing the whole offspring population according to their uniformly distributed participation ratios. A mid-point value M can also be defined to determine the generation at which the first sampling method reaches fifty percent of the participation.

The Incremental Participation Function can be defined in the same way as the general Participation Function introduced in Equation 4.21. In this case, the $\Pi_i^{(j)}$ values are computed as defined in Equation 4.25:

$\Pi_i^{(j)} = \begin{cases} \left\lfloor \frac{i}{M+i} \cdot |P_0| \right\rfloor & \text{if } j = 0 \\[4pt] \left\lfloor \frac{1 - \frac{i}{M+i}}{n-1} \cdot |P_0| \right\rfloor & \text{otherwise} \end{cases} \quad \forall j \in [1, n]$  (4.25)

In all the previous equations, j represents the j-th sampling method and i the current generation of the MOS algorithm.
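A minimal sketch of Equation 4.25 (the function name is an assumption). Note that at i = M the first technique receives exactly half of the population:

```python
def incremental_pf(i, j, n, pop_size, M):
    # Equation 4.25: technique 0 ramps up from zero participation, reaching
    # 50% of the population at generation i = M; the remaining n-1
    # techniques share the rest of the offspring equally.
    share = i / (M + i) if (M + i) > 0 else 0.0
    if j == 0:
        return int(share * pop_size)
    return int((1 - share) / (n - 1) * pop_size)
```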

Programmed Participation Function

This Participation Function is similar to the Constant PF. The only difference is that, instead of fixing the participation of the different reproductive techniques at the beginning of the algorithm and keeping it constant during the whole execution, the ratios are fixed by intervals. Fixed participation ratios can thus be assigned until generation n1, then changed to other values until generation n2, then changed again, and so on.

This PF offers great potential if the performance of the different sampling methods is well known at the configuration time of the algorithm, as it eases the exploitation of the exploratory characteristics of each technique at the appropriate instant of the evolutionary process. Its main disadvantage is that it is very difficult to determine the exact generations at which the changes in the participation ratios must be applied. To overcome this inconvenience, this PF can be used in conjunction with the Dynamic Participation Function (described in detail in the following paragraphs): the latter can be used to identify the different exploratory phases of the MOS algorithm, and this information can then be used to tune more efficient participation programs.

Dynamic Participation Function

The Dynamic Participation Function evaluates the quality of the offspring populations generated by each sampling method at each generation and defines the sampling sizes for the next one. The definition of the Dynamic Participation Function was introduced in Equation 4.22. To measure the quality of a search mechanism, different strategies can be followed, depending on the characteristic to be emphasized; this is discussed in detail in Section 4.2.1.2. For now, the Quality Function (QF) can be seen as a function that, given an offspring subpopulation, returns a value measuring how good that sampling method is for a particular criterion (Equation 4.26):

$Q: \mathcal{P}(S) \to \mathbb{R}, \quad O_{i-1}^{(j)} \mapsto Q_i^{(j)}$  (4.26)

From these quality values, the best performing technique ($T_{best}$) and its associated quality value ($Q_i^{(best)}$) can be identified by computing $best = \operatorname{argmax}_j(Q_i^{(j)})$.

The Dynamic Participation Function can now be introduced as depicted in Equation 4.27. This PF computes, at each generation, a trade-off factor $\Delta_i^{(j)}$ for each technique, representing the decrease in participation of the j-th technique at the i-th generation, for every technique except the best performing one. The best technique increases its participation by the sum of all those $\Delta_i^{(j)}$:

$PF_{dyn}(O_{i-1}^{(j)}) = \begin{cases} \Pi_{i-1}^{(j)} + \eta & \text{if } j = best \\ \Pi_{i-1}^{(j)} - \Delta_i^{(j)} & \text{otherwise} \end{cases}, \qquad \eta = \sum_{k \neq best} \Delta_i^{(k)}$  (4.27)

Two different strategies for computing the aforementioned $\Delta_i^{(j)}$ factors have been proposed. Both strategies are represented by Equation 4.28, in which the $\Delta_i^{(j)}$ factor is computed from the relative difference between the quality of the best and the j-th offspring populations, n being the number of available techniques. However, the two strategies consider different values for the reduction factor $\xi$, which represents the ratio transferred from one technique to the other, and present different characteristics, detailed in the following paragraphs:

$\Delta_i^{(j)} = \xi \cdot \frac{Q(O_{i-1}^{(best)}) - Q(O_{i-1}^{(j)})}{Q(O_{i-1}^{(best)})} \cdot \Pi_{i-1}^{(j)} \quad \forall j \in [1, n],\ j \neq best$  (4.28)

The first strategy keeps the overall population size fixed throughout the whole process. A minimum participation ratio can be established, but no maximum participation ratio is considered (apart from the implicit maximum ratio $1 - (n-1) \cdot ratio_{min}$ when n techniques are used and the minimum participation ratio has been fixed to $ratio_{min}$). The reduction factor $\xi$ is usually set to a value of 0.05.

On the other hand, the second strategy allows the overall population size to change from one generation to the next. This means that an initial population size $|P_0|$ must be defined, and that the current population size can decrease or increase through the evolutionary process, but it can never exceed its initial value. Additionally, a minimum participation ratio per technique is also allowed, as in the previous case, but a maximum participation ratio is now mandatory, usually $\Pi_0^{(j)} = |P_0|/n$, n being the number of available techniques. The reduction factor $\xi$ is now fixed to a value of 1.00.

As can be seen, both adjustment strategies depend on the scale of the values taken by the Quality Measure. For this reason, to guarantee a correct behavior of the Dynamic Participation Function, the values returned by the Quality Function must be normalized within the interval [0, 1].

Example. Let the participation ratios of three given techniques ($T_1$, $T_2$ and $T_3$), before the current iteration i, be $\Pi_{i-1}^{(1)} = 69$, $\Pi_{i-1}^{(2)} = 91$ and $\Pi_{i-1}^{(3)} = 40$, and let the reduction factor be $\xi = 0.05$. If the quality of an offspring population is computed as the average fitness of the top 25% of its individuals, the following scenario could be contemplated:

$Q_i^{(1)} = 0.045, \quad Q_i^{(2)} = 0.027, \quad Q_i^{(3)} = 0.033$  (4.29)

The best performing technique in this example would be $T_1$ and, thus, the values for $\Delta_i^{(2)}$, $\Delta_i^{(3)}$ and $\eta = \Delta_i^{(2)} + \Delta_i^{(3)}$ should be computed.

If the first adjustment strategy is being used, these values are computed as follows:

$\Delta_i^{(2)} = \xi \cdot \frac{Q_i^{(1)} - Q_i^{(2)}}{Q_i^{(1)}} \cdot \Pi_{i-1}^{(2)} = 0.05 \cdot \frac{0.045 - 0.027}{0.045} \cdot 91 = 0.020 \cdot 91 = 1.82 \simeq 2$  (4.30)

$\Delta_i^{(3)} = \xi \cdot \frac{Q_i^{(1)} - Q_i^{(3)}}{Q_i^{(1)}} \cdot \Pi_{i-1}^{(3)} = 0.05 \cdot \frac{0.045 - 0.033}{0.045} \cdot 40 = 0.013 \cdot 40 = 0.52 \simeq 1$  (4.31)

$\eta = \Delta_i^{(2)} + \Delta_i^{(3)} = 2 + 1 = 3$  (4.32)

The participation of each technique for the next iteration would then be:

$\Pi_i^{(1)} = 69 + 3 = 72, \quad \Pi_i^{(2)} = 91 - 2 = 89, \quad \Pi_i^{(3)} = 40 - 1 = 39$  (4.33)

If the second adjustment strategy is being considered, and a maximum ratio of 100 individuals has been established, the adjustment values would be computed as follows:

$\Delta_i^{(2)} = \xi \cdot \frac{Q_i^{(1)} - Q_i^{(2)}}{Q_i^{(1)}} \cdot \Pi_{i-1}^{(2)} = 1 \cdot \frac{0.045 - 0.027}{0.045} \cdot 91 = 0.4 \cdot 91 = 36.4 \simeq 36$  (4.34)

$\Delta_i^{(3)} = \xi \cdot \frac{Q_i^{(1)} - Q_i^{(3)}}{Q_i^{(1)}} \cdot \Pi_{i-1}^{(3)} = 1 \cdot \frac{0.045 - 0.033}{0.045} \cdot 40 = 0.267 \cdot 40 = 10.7 \simeq 11$  (4.35)

$\eta = \Delta_i^{(2)} + \Delta_i^{(3)} = 36 + 11 = 47$  (4.36)

The participation of each technique for the next iteration would then be:

$\Pi_i^{(1)} = 69 + 47 = 116 \simeq 100\ (\text{max}), \quad \Pi_i^{(2)} = 91 - 36 = 55, \quad \Pi_i^{(3)} = 40 - 11 = 29$  (4.37)
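The complete update (Equations 4.27 and 4.28, plus the capping used by the second strategy) can be sketched as follows. The function name and the rounding to the nearest integer are assumptions drawn from the worked example:

```python
def dynamic_pf(ratios, qualities, xi, max_ratio=None):
    # Equations 4.27/4.28: shift participation from every technique toward
    # the best one; deltas are rounded to the nearest integer individual.
    best = max(range(len(qualities)), key=lambda j: qualities[j])
    deltas = [0 if j == best else
              round(xi * (qualities[best] - qualities[j]) / qualities[best]
                    * ratios[j])
              for j in range(len(ratios))]
    eta = sum(deltas)  # the best technique gains what the others lose
    new = [ratios[j] + eta if j == best else ratios[j] - deltas[j]
           for j in range(len(ratios))]
    if max_ratio is not None:  # second strategy: cap each technique's ratio
        new = [min(r, max_ratio) for r in new]
    return new
```

This reproduces the example above: with ξ = 0.05 it returns [72, 89, 39], and with ξ = 1 and a cap of 100 it returns [100, 55, 29].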


4.2.1.2 Quality Functions

A Quality Function (QF) tries to measure some valuable characteristic of a sampling method that is beneficial for the evolutionary search process; it is used by the Dynamic Participation Function to recompute the participation ratio of each reproductive technique. In this context, different "beneficial characteristics" could be evaluated. For example, the average fitness of a percentage of the offspring population of a sampling method could be considered a good quality measure: the higher the average fitness of a given technique, the more participation it receives. Equation 4.38 exemplifies this approach:

$Q(O_i^{(j)}) = f_{Avg}(O_i^{(j)}, \rho)$  (4.38)

where ρ is the top percentage of the population considered to compute the average fitness value. However, not only the current performance of the available techniques can be used to measure their quality. Other approaches could rely on the potential of a technique or its search capabilities. For some problems, the diversity of the population managed by the EA is crucial for achieving a successful result. In this case, a technique capable of introducing this diversity, even if the generated individuals do not have the best fitness in the population, could be more interesting, and thus its participation should be increased. This section reviews some of the possible strategies that could be used as a quality measure of a reproductive technique.

Fitness Average

This is probably the most obvious criterion to measure the quality of a sampling method; its simplest form is given in Equation 4.38. This measure has been experimentally shown to offer good performance even on complex optimization problems. It quickly detects and favors the most adapted technique for each problem and dataset, as in many cases the influence of the dataset is a significant factor in this decision.

Nevertheless, this effective detection of the most appropriate technique can, in some cases, lead to the premature extinction of a technique that, even though it is not competitive in the first steps of the evolutionary process, would be able to improve the overall performance if it received enough participation at a later stage. To overcome this problem, a minimum participation ratio can be fixed at the beginning of the algorithm to ensure that every technique is represented during the whole execution. However, this minimum value has to be carefully selected so that computational resources are not wasted on unsuccessful reproductive techniques.

Another interesting option to deal with this issue would be to establish several restart points during the execution of the algorithm, at which the participation of all the techniques is reset to a preset value, giving techniques with a poor performance so far a chance to regain prominence and contribute to the search process. Again, the selection of these reset points is a delicate issue that has to be thoroughly studied before being used.
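As a sketch, Equation 4.38 for a maximization problem could read as follows (the function name and the default ρ = 0.25, taken from the earlier example, are assumptions):

```python
def fitness_average_quality(fitnesses, rho=0.25):
    # Equation 4.38: average fitness of the top rho fraction of the
    # offspring population (maximization assumed).
    top = sorted(fitnesses, reverse=True)
    k = max(1, int(len(top) * rho))
    return sum(top[:k]) / k
```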


Diversity

In some scenarios, the use of a sampling method capable of generating more diverse offspring populations is more than desirable. This characteristic gives the algorithm the ability to explore a much wider search space, which is very useful when dealing with complex optimization problems, especially with deceptive or trap functions. Different approaches for measuring the diversity of a population have been proposed in several studies on, for example, how to avoid premature convergence in GAs [LGX97, JZJ00].

One way to measure diversity in EAs is by means of an Entropy measure, taken from the Thermodynamics and Information Theory fields and adapted for use in EAs. Within these algorithms, there is a clear relationship between genotype and phenotype: individuals with the same genotype have the same phenotype. Thus, decreasing genotype diversity makes the phenotype diversity decrease as well. For this reason, an entropy measure can be defined at the phenotypic level. Some studies [Ros95, LMB07] define this measure by creating fitness histograms (individuals are divided into classes or bins according to their fitness) and applying the following formula:

$Entropy = -\sum_{i=1}^{n} p_i \cdot \log_2 p_i$  (4.39)

where $p_i$ is the proportion of individuals in the i-th class. To divide individuals into classes there are also several alternatives:

• Linear division: A fitness interval is defined between the fitness of the best and the worst individuals. This interval is evenly divided into n subintervals, and each individual is assigned to the subinterval its fitness belongs to.

• Gaussian division: This partitioning strategy is derived from a Gaussian distribution. Different intervals are defined from the average fitness value of the population and its standard deviation. Boundaries for the first class are delimited by $\overline{f} \pm \sigma$, by $\overline{f} \pm 2\sigma$ for the contiguous classes, and so on.

• Proportional division: Each distinct fitness value determines a different partition. Individuals with the same fitness are included in the same partition, whereas individuals with a different fitness define their own new partition.

Other studies [LPDG06] state that this measure does not work with non-ordinal representations for combinatorial problems, and propose the use of distance and similarity measures among chromosomes as a better way to tackle diversity in Evolutionary Algorithms. Any of the above can be used in conjunction with the Dynamic Participation Function to guarantee that the reproductive technique with the highest diversity level is favored by the algorithm and produces a higher ratio of offspring individuals.
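A minimal sketch of the entropy measure with linear division (the function name and default bin count are assumptions):

```python
import math

def fitness_entropy(fitnesses, n_bins=10):
    # Entropy of a fitness histogram (Equation 4.39) with linear division:
    # the [worst, best] fitness interval is split into n_bins equal parts.
    lo, hi = min(fitnesses), max(fitnesses)
    if hi == lo:
        return 0.0  # all individuals in one class: no diversity
    counts = [0] * n_bins
    for f in fitnesses:
        k = min(int((f - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[k] += 1
    total = len(fitnesses)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)
```

A population spread evenly over the fitness range reaches the maximum entropy log2(n_bins), whereas a fully converged population scores 0.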


Algorithmic Difficulty

In Evolutionary Computation, many different studies have been conducted to analyze the difficulty of solving a given problem. Some general measures have been proposed to quantify this difficulty, such as the Fitness Distance Correlation (FDC) or the Negative Slope Coefficient (NSC). These measures try to determine how difficult a particular problem is for an EA from the point of view of the fitness landscape.

A search process can be seen as a process of navigation on directed graphs whose vertices are labeled according to some function [JF95]. The concept underlying FDC is analogous to classic heuristic search algorithms such as A*, where the label of a vertex is an estimate of a distance value. In this case, FDC tries to quantify the relationship between fitness and landscape by means of a correlation between the fitness values of individuals and their distance to the closest global optimum. The best possible scenario would be a fitness function completely correlated with the distance measure used (considering a minimization problem), meaning that the problem is fully-easy for the algorithm. However, this perfect correlation is not very common: the values are often much lower (harder problems), sometimes close to zero (fully-hard or highly-neutral problems), or even negative (deceptive problems). The Fitness Distance Correlation ratio can be computed as shown in Equations 4.40, 4.41 and 4.42:

$\tau = \frac{C_{FD}}{S_F \cdot S_D}$  (4.40)

where

$C_{FD} = \frac{1}{n} \sum_{i=1}^{n} (f_i - \overline{f})(d_i - \overline{d})$  (4.41)

and

$S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \overline{X})^2}{n}}$  (4.42)
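In code, τ is simply the Pearson correlation between the fitness vector and the distance-to-optimum vector. A minimal sketch (the function name is an assumption):

```python
def fitness_distance_correlation(fitness, distance):
    # Equations 4.40-4.42: tau = C_FD / (S_F * S_D), the Pearson correlation
    # between fitness values and distances to the closest global optimum.
    n = len(fitness)
    f_bar = sum(fitness) / n
    d_bar = sum(distance) / n
    c_fd = sum((f - f_bar) * (d - d_bar)
               for f, d in zip(fitness, distance)) / n
    s_f = (sum((f - f_bar) ** 2 for f in fitness) / n) ** 0.5
    s_d = (sum((d - d_bar) ** 2 for d in distance) / n) ** 0.5
    return c_fd / (s_f * s_d)
```

A value of +1 (minimization) indicates a fully-easy landscape; values near 0 or negative indicate neutral or deceptive landscapes.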

In the previous equations, $C_{FD}$ represents the covariance between fitness and distance, $S_X$ the standard deviation of a variable X, f the fitness of an individual and d the distance of that individual to the global optimum. When the global optimum is not known a priori, the Local FDC can be used instead [KS96], which considers the local optimum (that of the current population) for the computation of the correlation value.

Another interesting measure of algorithmic difficulty is the Negative Slope Coefficient (NSC) [VTCV06]. This measure tries to determine the difficulty of a problem for a given EA by comparing the fitness values of the individuals in the current generation with those of the individuals in the parent population. The result of this comparison is a fitness cloud representing the relationship between the children's and the parents' fitness values.

To construct this fitness cloud, the following procedure is followed. Given a population of individuals in a generation i, $P = \{\gamma_1, \ldots, \gamma_n\}$, the objective function is applied to every individual in the population, $f(\gamma_j)$, to obtain its fitness value $f_j$. In the population of the next generation, i + 1, for each individual in the previous generation i we can find a set of children individuals $G(\gamma_j) = \{\upsilon_{j1}, \ldots, \upsilon_{jn}\}$. The same objective function is then applied to these individuals, obtaining their fitness values $\{f_{j1}, \ldots, f_{jn}\}$. For each individual in the original population we take $f_j' = \max_k(f_{jk})$, so that we can define a fitness cloud with all these relationships in the following way:

$C = \{(f_1, f_1'), \ldots, (f_n, f_n')\}$  (4.43)

A graphical representation of this fitness cloud gives an overview of the dynamic evolution of the population from one generation to the next, but the NSC measure provides a numeric value that is of much more help when comparing different generations. Once the fitness cloud C is obtained, it can be partitioned into a set of clouds $C_1, \ldots, C_m$ such that if $(f_a, f_a') \in C_r$ and $(f_b, f_b') \in C_s$, with r < s, then $f_a < f_b$. Defining the average fitness value as

$\overline{f}_t = \frac{1}{|C_t|} \sum_{(f, f') \in C_t} f$  (4.44)

the representative point of each partial cloud becomes $(\overline{f}_t, \overline{f'}_t)$. A polyline joining all these points is made up of segments with positive or negative slope, and gives a more detailed view of how good the evolution from one generation i to the next, i + 1, has been. The slope of the segment joining two consecutive points is defined as

$S_t = \frac{\overline{f'}_{t+1} - \overline{f'}_t}{\overline{f}_{t+1} - \overline{f}_t}$  (4.45)

with positive or negative values depending on the evolution from parent to children individuals. Finally, the value of the NSC measure is defined in Equation 4.46:

$nsc = \sum_{t=1}^{m-1} \min(0, S_t)$  (4.46)
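A minimal sketch of the NSC computation, assuming the cloud is partitioned into equal-size bins ordered by parent fitness (the function name and binning scheme are assumptions):

```python
def negative_slope_coefficient(cloud, n_bins=5):
    # Equations 4.44-4.46: partition the fitness cloud by parent fitness,
    # take the representative (mean parent, mean child) point of each bin,
    # and sum the negative slopes between consecutive points.
    cloud = sorted(cloud)  # (parent_fitness, best_child_fitness) pairs
    size = max(1, len(cloud) // n_bins)
    bins = [cloud[k:k + size] for k in range(0, len(cloud), size)]
    pts = [(sum(f for f, _ in b) / len(b), sum(c for _, c in b) / len(b))
           for b in bins]
    nsc = 0.0
    for (f0, c0), (f1, c1) in zip(pts, pts[1:]):
        if f1 != f0:
            nsc += min(0.0, (c1 - c0) / (f1 - f0))
    return nsc
```

A cloud in which better parents consistently yield better children gives 0 (easy); the more the polyline slopes downward, the more negative the result.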

The NSC measure takes values within the interval $(-\infty, 0]$. A value of 0 means that the problem is easy to solve, whereas the more negative this value becomes, the more difficult the problem gets. The idea is to favor the techniques with the highest nsc value, in order to focus the search on the techniques for which the problem is easier. A variation of this measure is the Fitness-Proportional Negative Slope Coefficient (FPNSC), in which the fitness cloud is constructed by applying proportional selection over the fitness values [PV07]. This measure sometimes gives a more realistic view of the problem difficulty than the original one.

Genealogical Studies

This Quality Measure is a variation of the Fitness Average measure previously proposed in this section. In this case, the proposed measure takes into account not only the immediate performance, but also the historical performance of each reproductive technique. For this purpose, a genealogical tree of the individuals generated by the algorithm is maintained throughout the whole execution. For each individual,


the individuals it mates with and the offspring generated by this mating process are recorded, and a relation is established among all of them. This lets the algorithm carry out a smoother adjustment of the participation of the different techniques, as a bad result in one generation's offspring does not mean that the participation of that technique must be dramatically reduced. We can also decide how many of these ancestors are considered to compute the quality of the technique. This gives increased freedom to select the appropriate pressure for the selection of the best performing techniques, and helps avoid problems such as premature convergence.

4.2.2 Self-Adaptive Approach

This is the second strategy defined for the adaptation of the participation of each reproductive technique in MOS. This approach has been considered as it is classic in the literature (much of the work reviewed in Chapter 3 uses this strategy). In this case, the information regarding the participation ratio of each technique is encoded within the chromosome of the candidate solution.

Some approaches use the same mechanisms employed for the recombination of the solutions themselves to combine the participation information from the parents. However, this approach cannot always be applied, as some recombination operators use extra domain information to combine the solutions that cannot be applied to the participation ratios. Moreover, in some cases the encoding used for the individuals and, thus, the operators that work on them, are not continuous (for example, many of the encodings described in Section 5.3.1.2 for the Traveling Salesman Problem (TSP) are binary or integer).

Figure 4.3 represents the general schema of a MOS algorithm with the self-adaptive approach. In this diagram, the main differences between the self-adaptive and the central approach can be observed. First, the individuals are selected for mating within the hybrid algorithm, not within the reproductive technique. Second, there are no Quality or Participation Functions to control the number of individuals produced by each technique; instead, the participation is encoded within the chromosome of the individuals. The information available from the parent individuals is combined and passed to the children. Finally, with this combined information, a reproductive technique is selected and new individuals are generated. Pseudo-code of the MOS algorithm with the self-adaptive approach can be found in Algorithm 5.

The information coming from the parent individuals can be combined in many different ways. For this work, two different strategies have been proposed.
The first one computes the Arithmetic Mean of the participation ratios of the parents for each technique. Given the participation ratios of the following n parents, and considering that there are m available techniques:

$\Pi(Parent_1) = \{\pi_{i,1}^{(1)}, \pi_{i,1}^{(2)}, \ldots, \pi_{i,1}^{(m)}\}$
$\Pi(Parent_2) = \{\pi_{i,2}^{(1)}, \pi_{i,2}^{(2)}, \ldots, \pi_{i,2}^{(m)}\}$
$\ldots$
$\Pi(Parent_n) = \{\pi_{i,n}^{(1)}, \pi_{i,n}^{(2)}, \ldots, \pi_{i,n}^{(m)}\}$

Algorithm 5 MOS Algorithm with Self-Adaptive Approach
1: Create initial overall population of candidate solutions P0
2: Uniformly distribute participation among the n used techniques → ∀j Π_0^(j) = |P0|/n. Each technique produces a subset of individuals according to its participation (Π_0^(j))
3: Evaluate initial population P0
4: while termination criterion not reached do
5:    while offspring population not filled do
6:       Select parent individuals
7:       Merge participation information of parent individuals
8:       Select a technique Tj from this combined information
9:       Recombine parent individuals with technique Tj
10:      Copy combined participation information to children individuals
11:      Add new individuals to an auxiliary population O_i^(j)
12:   end while
13:   Combine populations O_i^(j) ∀j and Pi according to a pre-established criterion to generate P_{i+1}
14: end while

The combination of all this information is carried out as seen in Equation 4.47.

$\Pi(Child) = \{\pi_{i+1}^{(1)}, \pi_{i+1}^{(2)}, \ldots, \pi_{i+1}^{(m)}\}$

$\pi_{i+1}^{(j)} = \frac{1}{n} \sum_{k=1}^{n} \pi_{i,k}^{(j)} = \frac{1}{n} \left( \pi_{i,1}^{(j)} + \pi_{i,2}^{(j)} + \cdots + \pi_{i,n}^{(j)} \right)$  (4.47)

However, this is not the only way for combining the participation information. If the information coming from the parents with better fitness should be emphasized, then a Weighted Mean could be used. Considering the same participation ratios, the combination of this information would be done as depicted in Equation 4.48.

$\Pi(Child) = \{\pi_{i+1}^{(1)}, \pi_{i+1}^{(2)}, \ldots, \pi_{i+1}^{(m)}\}$

$\pi_{i+1}^{(j)} = \frac{\sum_{k=1}^{n} w_k \, \pi_{i,k}^{(j)}}{\sum_{k=1}^{n} w_k} = \frac{w_1 \pi_{i,1}^{(j)} + w_2 \pi_{i,2}^{(j)} + \cdots + w_n \pi_{i,n}^{(j)}}{w_1 + w_2 + \cdots + w_n}$  (4.48)

Other alternatives are, of course, possible, but they have not been considered for this study. Finally, it should be remarked that this approach has an important drawback: if one of the techniques is significantly better than the others in the first generations, the participation information encoded within the chromosomes will quickly spread throughout the whole population, suppressing any possibility for the other techniques to collaborate at a later stage of the search process. This behavior will be observed in the experimentation carried out in Chapter 6. This remark is important, as much of the work reviewed in Chapter 3 uses this approach.
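Both combination rules can be sketched with a single function (the name is an assumption); Equation 4.47 is the equal-weight case of Equation 4.48:

```python
def combine_participation(parent_ratios, weights=None):
    # Equation 4.47 (weights=None, arithmetic mean) and Equation 4.48
    # (weighted mean) for merging the parents' participation vectors,
    # one entry per technique.
    n = len(parent_ratios)
    m = len(parent_ratios[0])
    if weights is None:
        weights = [1.0] * n  # equal weights reduce to the arithmetic mean
    total = sum(weights)
    return [sum(w * p[j] for w, p in zip(weights, parent_ratios)) / total
            for j in range(m)]
```

Weights proportional to each parent's fitness would emphasize the participation information of the fitter parents, as discussed above.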


[Figure 4.3 depicts the self-adaptive flow as a diagram: creation and evaluation of the initial population P0; check for algorithm termination; selection of the individuals to mate; and, for each set of parents, merging of the participation information from each parent, selection of a technique Tj according to the merged participation information, recombination of the parents with Tj, evaluation of the individuals in Oij, and merge of the individuals in Oij with the previous population Pi.]

Figure 4.3: General schema of a MOS algorithm with Self-Adaptive approach

4.3 Overview of the hybridization capabilities of Multiple Offspring Sampling

Figure 4.4 provides an overview of the different alternatives that Multiple Offspring Sampling offers for the combination of Evolutionary Algorithms. A Central or a Self-Adaptive approach can be selected. If the Central approach is selected, a Deterministic or a Dynamic Participation Function can be used; several Deterministic PFs are proposed, as well as several Quality Measures for the Dynamic PF. If the Self-Adaptive approach is used, then a mechanism for the combination of the participation information must be selected (in this work, two strategies are offered: the Arithmetic mean and the Weighted mean).

[Figure 4.4 depicts these alternatives as a tree: Central → {Dynamic PF → {Fitness Average, Diversity, Genealogy, Algorithmic Difficulty → {Negative Slope Coefficient, Fitness Distance Correlation}}, Deterministic PF → {Constant PF, Incremental PF, Alternating PF, Static PF}}; Self-Adaptive → {Arithmetic, Weighted}.]

Figure 4.4: Overview of the hybridization capabilities of MOS

In the following chapters, experiments will be conducted to test the performance of the hybrid algorithms built with the MOS methodology. As the number of possible alternatives depicted in Figure 4.4 is quite large, only a subset of these possibilities will be tested. In particular, both the Central and the Self-Adaptive approaches will be considered. For the Central approach, most of the attention will be paid to the Dynamic Participation Function, although a Constant PF has also been considered as its counterpart, to study the benefits of using a more sophisticated dynamic strategy to adapt the participation. Furthermore, two quality measures, Fitness Average and NSC, have been tested. Regarding the Self-Adaptive approach, both the Arithmetic and the Weighted mean strategies for combining the participation information have been used. This makes a total of five hybridization alternatives to be tested, a number that will grow as several dynamic strategies and different parameters are used.

The experimentation conducted in the following chapters is presented in a cumulative way: the first set of experiments (on permutation problems) is intended to test the different capabilities of the MOS framework (different operators and encodings), while the second set (on continuous functions) also covers the use of different Quality and Participation Functions, as well as the two versions of the MOS algorithm (central and self-adaptive, respectively). The remaining possibilities of the Multiple Offspring Sampling framework are therefore only theoretically considered and are included for completeness. However, as the results presented in this work have been quite satisfactory, as we will see in the next chapters, further experimentation with other Participation Functions and/or Quality Measures is envisaged for the future.


Chapter 5

Application to Permutation Problems

5.1 Introduction

This chapter presents the results of applying the proposed methodology to the resolution of two complex combinatorial problems. The first problem is Supercomputer Scheduling, introduced in [LPRdM08]. In this problem, a set of jobs has to be scheduled for dispatch in a supercomputer with a cluster structure. Each job has specific requirements in terms of number of processors, memory and execution time. Moreover, the supercomputer has its own per-node configuration: number of processors and amount of shared memory. The problem consists of finding the schedule that minimizes the total timespan of the system while satisfying the constraints imposed by both the jobs and the machine. This problem was of special interest to the research group due to its involvement in the Centro de Supercomputación y Visualización de Madrid (CeSViMa). In this computing center, a cluster-like supercomputer manages a rather large queue of waiting jobs that must be scheduled in order to obtain the best performance from the machine. It is thus a good problem to check whether the proposed framework and methodology can be applied to complex real-world scenarios. The second problem is the Traveling Salesman Problem (TSP), a well-known classic combinatorial problem in which N cities must be visited so that the total tour length is minimized. It is a good example of an apparently easy problem (we know how to solve it) that is actually one of the most complex combinatorial problems (it has been proved to be NP-hard). Several instances of different complexity have been used, with sizes ranging from 42 to 120 cities. These are standard instances that can be freely downloaded from the TSPLib homepage [Hei08]. In both problems, several Genetic Algorithms with different encodings for the solutions and different recombination operators have been used to study the possible synergies (if any) among these techniques and how the algorithm deals with such a heterogeneous scenario on these two complex problems. A comparative analysis has been carried out to assess whether there is statistical evidence of the benefits of this combination of multiple techniques.

5.2 Supercomputer Scheduling

Scheduling problems are part of many real-world scenarios, such as logistics, manufacturing and engineering. Although in each of these scenarios the scheduling problem appears with different characteristics and constraints, these problems have traditionally been divided into a classic taxonomy: Flow-Shop Scheduling (FSS), Job-Shop Scheduling (JSS), Multiprocessor Scheduling (MPS), etc. Job scheduling for supercomputers presents particular characteristics and, therefore, should be defined in a different way. Many current supercomputers are large cluster systems, as ranked in the TOP500 Supercomputers Site 1. They are made up of hundreds or thousands of processors interconnected by a high-speed network. These facilities are designed to run parallel programs that are partitioned into a set of concurrent tasks. In general, only one of these tasks should be running on each processor (there should be no competing tasks on the same processor). Task scheduling for these systems consists of partitioning the resources (processors, in most cases) among the sequence of jobs to be run on the system, with the objective of minimizing the total execution time. The tasks of the same job interact with each other by exchanging messages while running. A clear example is MPI parallel programs, common in many scientific (e.g., physics simulations, protein docking) and engineering (e.g., fluid dynamics, finite element calculus) fields. This is a significant difference with respect to classic scheduling (e.g., multiprocessor scheduling), where the set of interdependent sequenced tasks was originally described by means of a Directed Acyclic Graph (DAG). Traditional solutions to schedule jobs in a supercomputer have been taken from batch process scheduling algorithms. These algorithms implement deterministic criteria to order the waiting jobs and to submit them into execution. They are usually implemented in scheduling services, sometimes called resource managers.
The most advanced systems can be configured to use many different algorithms, or even ad hoc variants that are more appropriate to the administrative policies of a given site. State-of-the-art cluster-based supercomputers are equipped with interconnected multiprocessor nodes. Each node has two, four or, in the near future, more processors (or multiple cores) sharing the same main memory (RAM). Under this configuration, scheduling policies should satisfy an additional constraint: the total amount of memory required by the processes running on a specific node must not exceed its available shared memory. Although virtual memory swapping is common in modern operating systems, if there is only one process running on each processor, swapping memory pages in and out significantly penalizes its performance. Considering more than two constraints (requested processors and free shared memory of the available nodes) also increases the complexity of the scheduling problem. Evolutionary Algorithms have been used to solve complex optimization problems and have an extensive literature in the domain of scheduling problems. This chapter presents the formal definition of the Supercomputer Scheduling (SCS) problem for parallel programs. To solve scheduling within this environment, traditional methods have been applied, as well as a hybrid algorithm based on the MOS methodology. For scheduling problems of moderate complexity, simple evolutionary methods provide better results in both resource usage and timespan than classic approaches. But in the case of complex problems, with many waiting jobs and supercomputers with thousands of nodes, simple evolutionary techniques might not find the best scheduling, as it is not easy to select the appropriate encoding and genetic operators among all the available ones. In these cases, combined heuristic methods are a quite interesting alternative. Results show how these combined methods help to find the best representation and genetic operators. Thus, the combined algorithm outperforms both simple traditional and evolutionary methods. The cluster configuration and job descriptions for this experimentation have been taken from the regular operations of the Magerit supercomputer hosted at CeSViMa (Centro de Supercomputación y Visualización de Madrid) 2. This system manages a queue of a hundred waiting jobs (on average), and the experimental datasets have been taken from the log files of previous executions. Section 5.2.1 presents the state of the art on different scheduling problems, whereas Section 5.2.2 proposes the definition of the new Supercomputer Scheduling problem. Section 5.2.3 describes traditional (non-combinatorial) methods. In Section 5.2.4, the experimentation conducted on the Magerit supercomputer system is presented. Finally, Section 5.2.5 concludes this study.

1 http://www.top500.org

5.2.1 State of the art

There are several kinds of scheduling problems defined in the literature. Although none of them fits perfectly our specific scheduling problem, it is important to review them, as we borrow from the scheduling literature the most relevant encodings and operators. Scheduling problems deal with the allocation of resources over time to carry out a set of tasks, and they are characterized by three main components:

• A number of machines and a number of jobs that must be submitted.
• A set of constraints that must be satisfied.
• A target function that must be optimized.

In this section, the FSS, JSS and MPS scheduling problems are reviewed, as they are relevant for the Supercomputer Scheduling problem introduced in this work. There are other variants and subtypes which are not included in this review.

2 http://www.cesvima.upm.es


5.2.1.1 Flow-Shop Scheduling Problem

The general Flow-Shop Scheduling problem, defined in [ACE06], is denoted as n/m/Cmax in the literature. It involves n jobs, each requiring operations on m machines, in the same machine sequence. The processing time of each operation is pij, where i ∈ {1, 2, . . . , n} denotes a job and j ∈ {1, 2, . . . , m} a machine. The problem is to determine the sequence of these n jobs that produces the smallest makespan, assuming no preemption of operations. In the simplest situation, all the jobs are available and ready to start at time zero; in more realistic situations, jobs are released at different times. The scheduling literature contains many solution procedures for the general FSS problem; an excellent review of heuristic approaches can be found in [ACE06]. The best results have been obtained with Tabu Search [NS96, GW04] and Genetic Algorithms [RY98, BB04], which are the most popular methods. Besides these, other methods such as Simulated Annealing [OS90] and Ant Colony Optimization [YL04] have also been applied.
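The makespan minimized in the n/m/Cmax problem follows the classic recurrence C(i, j) = max(C(i−1, j), C(i, j−1)) + p(i, j). A minimal sketch of its evaluation for a given permutation (illustrative, not part of the cited works):

```python
def flowshop_makespan(p, order):
    """Makespan C_max of a job permutation.

    p[i][j] is the processing time of job i on machine j;
    every job visits machines 0..m-1 in the same sequence.
    """
    m = len(p[0])
    c = [0] * m  # c[j]: completion time of the last scheduled job on machine j
    for i in order:
        c[0] += p[i][0]
        for j in range(1, m):
            # a job starts on machine j only when the machine is free
            # and its own operation on machine j-1 has finished
            c[j] = max(c[j], c[j - 1]) + p[i][j]
    return c[-1]

# two jobs, two machines: the permutation changes the makespan
p = [[2, 3], [1, 2]]
print(flowshop_makespan(p, [0, 1]))  # 7
print(flowshop_makespan(p, [1, 0]))  # 6
```

This evaluation is what any permutation-based metaheuristic (Tabu Search, GA, etc.) calls on each candidate sequence.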

5.2.1.2 Job-Shop Scheduling Problem

The n × m minimum-makespan general Job-Shop Scheduling problem can be described by a set of n jobs Ji, i ∈ {1, 2, . . . , n}, which have to be processed on a set of m machines Mj, j ∈ {1, 2, . . . , m}. Each job must be processed on a given sequence of machines. The processing of job Ji on machine Mj is called operation Oij. Operation Oij requires the exclusive use of Mj for an uninterrupted duration pij, its processing time. A schedule is a set of completion times for each operation that satisfies those constraints. Many different approaches have been proposed to solve the Job-Shop Scheduling problem. The best results seem to be reached with Tabu Search [NS96, BV98, SP05]; an explanation of this behavior can be found in [WBHW03]. Genetic Algorithms have also been applied to the Job-Shop Scheduling problem in a number of ways. The first attempt to solve the problem using evolutionary methods was carried out in [Dav85b]. One of the most successful GAs for scheduling is the GA3 algorithm by [Mat96]. Other relevant work can be found in [YN95, Bie95, BM99]. In all cases, it is shown that conventional GAs are limited for this problem, and several improvements over different elements are proposed to produce results comparable to the most competitive methods. Typically, these articles present methods that: (i) include hill-climbers, (ii) take into account the application of problem-specific knowledge, or (iii) use more advanced evolutionary models.
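A common way to evaluate a JSS candidate is to encode it as a permutation with job repetitions (each job appears once per operation) and decode it semi-actively: every operation starts as soon as both its job and its machine are free. A sketch under that encoding assumption (not taken from the cited works):

```python
def jss_makespan(ops, seq):
    """ops[i] = [(machine, time), ...] is job i's machine sequence;
    seq is a permutation with repetitions: the k-th occurrence of
    job i in seq dispatches its k-th operation."""
    next_op = [0] * len(ops)   # next operation index per job
    job_free = [0] * len(ops)  # time at which each job is free again
    mach_free = {}             # time at which each machine is free again
    for i in seq:
        m, t = ops[i][next_op[i]]
        start = max(job_free[i], mach_free.get(m, 0))
        job_free[i] = mach_free[m] = start + t
        next_op[i] += 1
    return max(job_free)

# job 0: M0 then M1; job 1: M1 then M0
ops = [[(0, 3), (1, 2)], [(1, 2), (0, 4)]]
print(jss_makespan(ops, [0, 1, 0, 1]))  # 7
```

The repetition-based encoding has the convenient property that every permutation decodes to a feasible schedule, which is why it is popular in GA approaches to JSS.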

5.2.1.3 Multiprocessor Scheduling Problem

The problem of scheduling a set of dependent or independent tasks to be processed in parallel is a field where some authors, such as [WYJ+04], have made interesting advances. A program can be decomposed into a set of smaller tasks. These tasks can have dependencies or precedence requirements, defined by means of a Directed Acyclic Graph. The goal of the scheduler is to assign tasks to the available processors so that these dependencies are satisfied and the total makespan is minimized. A good review and classification of deterministic and static scheduling algorithms can be found in [KA99]. Genetic Algorithms have been widely applied to the Multiprocessor Scheduling problem, as can be seen in [AFEG99, AGD03, LC03, SM06]. The two main approaches are: (i) methods that use a GA in combination with other scheduling techniques, and (ii) methods that use a GA to evolve the actual assignment and ordering of tasks on processors.
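The greedy list-scheduling family surveyed in this literature can be sketched as follows: repeatedly pick a ready task (all DAG predecessors finished) and place it on the earliest-available processor. The tie-breaking rule below is a simplification; the cited algorithms differ mainly in how they prioritize ready tasks:

```python
def list_schedule(times, deps, procs):
    """Greedy list scheduling on a DAG.

    times[t] is the duration of task t; deps[t] is the set of its
    predecessors in the Directed Acyclic Graph."""
    finish = {}                 # task -> finish time
    free_at = [0] * procs       # next time each processor becomes free
    pending = set(range(len(times)))
    while pending:
        # any task whose predecessors have all been scheduled is ready;
        # break ties by lowest task index (a placeholder priority rule)
        t = min(x for x in pending if deps[x].issubset(finish))
        ready_time = max((finish[d] for d in deps[t]), default=0)
        p = min(range(procs), key=lambda q: free_at[q])
        start = max(free_at[p], ready_time)
        finish[t] = free_at[p] = start + times[t]
        pending.remove(t)
    return max(finish.values())  # overall makespan

# three tasks: task 2 depends on tasks 0 and 1; two processors
print(list_schedule([2, 3, 2], [set(), set(), {0, 1}], 2))  # 5
```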

5.2.1.4 Other Packing and Knapsack Problems

Together with the scheduling problems reviewed above, other combinatorial problems share a similar structure. Packing problems try to minimize the size of a container able to store a certain number of items. Some packing problems are actually puzzles in which the goal is to find the minimal 2D shape that contains a given number of items of other shapes. Besides this type of problem, there are other interesting variants of packing problems, such as: (i) bin packing (N objects of different sizes must be packed into a finite number of bins of capacity V so that the number of used bins is minimized), (ii) multidimensional bin packing (objects have 2 or more dimensions and containers have different sizes in each dimension), and (iii) set packing (several subsets of the same set of elements are provided and the objective is to maximize the number of selected subsets such that all pair-wise intersections between selected subsets are empty). Knapsack problems try to maximize the value of the objects carried in a knapsack; each object has a certain weight and the knapsack has a weight limit. There are different variants of knapsack problems: (i) bounded knapsack (one object may be chosen several times), (ii) multiple-choice knapsack (items are subdivided into k classes and exactly one item must be taken from each class), (iii) subset sum knapsack (for each item, the value and the weight are identical), and (iv) multiple knapsack (there are m knapsacks with capacities Wi), to name a few. A review of several knapsack problems can be found in [Pis95].
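As an example of the bin-packing variant (i), the classic First-Fit Decreasing heuristic can be sketched in a few lines (illustrative only):

```python
def first_fit_decreasing(sizes, capacity):
    """First-Fit Decreasing for one-dimensional bin packing: sort the
    objects by decreasing size and drop each one into the first bin
    with enough remaining capacity, opening a new bin when none fits."""
    bins = []  # remaining capacity of each open bin
    for s in sorted(sizes, reverse=True):
        for i, free in enumerate(bins):
            if s <= free:
                bins[i] -= s
                break
        else:
            bins.append(capacity - s)
    return len(bins)

print(first_fit_decreasing([5, 4, 3, 2, 2], 8))  # 2
```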

5.2.2 Definition of the Supercomputer Scheduling Problem

Supercomputer Scheduling is a new scheduling problem. Real-world problems from supercomputing have shown that none of the traditional scheduling problems matches the requirements of this scenario. A job in a supercomputer is usually a set of tasks that must be executed in parallel: the tasks have no sequential dependencies, but must run concurrently. Execution constraints are based on memory and processor availability. Each job running on the system is defined as Ji = (Ti, Mi, ti), where Ti is the number of parallel tasks, Mi is the amount of memory per task and ti is the total execution time of each and every one of the tasks. A supercomputer, in this domain, is defined as a set of n nodes S = (N1, N2, ..., Nn). Each node is defined as Nj = (Pj, Aj), where Pj is the number of processors available on the node and Aj is the available shared main memory. When a job is running on the supercomputer, its tasks are assigned to a subset of the supercomputer nodes; k = assign(Ji, Nj) means that k tasks of job Ji are running on node Nj. The number of tasks running on each node must not exceed the number of processors of that node, the total number of assigned tasks must equal the total number of tasks of the job, Ti, and all the tasks must run at the same time (in parallel):

Ti = Σ_{j=1}^{n} assign(Ji, Nj)   /   Σ_{i=1}^{n} assign(Ji, Nj) ≤ Pj        (5.1)

In this context, job scheduling is an ordered sequence of jobs to be run on the supercomputer. Jobs are dispatched to the supercomputer if it has available resources to process the first job of this sequence (job queue), depending on two constraints:

① There are enough free processors in the system:

∀j ∈ [1, n] :  Σ_i assign(Ji, Nj) ≤ Pj        (5.2)

② There is enough free memory in each of the nodes:

∀j ∈ [1, n] :  Σ_i Mi × assign(Ji, Nj) ≤ Aj        (5.3)

Supercomputer Scheduling could be considered a particular case of multidimensional packing: time, number of CPUs and memory are the three dimensions to take into account. Although the general structure is similar to that of such packing problems, the memory usage constraints are defined not for the whole system but for each partition of it (the nodes). SCS could thus be defined as a multidimensional packing problem with additional constraints, but it is considered here as a different problem subtype, based on the real-world application that inspired it.
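Constraints (5.2) and (5.3), together with the task-count requirement of Equation 5.1, translate directly into a feasibility test for a tentative task-to-node assignment. The data layout below is an assumption for illustration, not the thesis's implementation:

```python
def feasible(assign, jobs, nodes):
    """Check the per-node constraints for a tentative assignment.

    assign[(i, j)] = number of tasks of job J_i placed on node N_j;
    jobs[i] = (T_i, M_i, t_i)  -> tasks, memory per task, run time;
    nodes[j] = (P_j, A_j)      -> processors, shared memory.
    """
    for j, (P, A) in enumerate(nodes):
        cpus = sum(k for (i, jj), k in assign.items() if jj == j)
        mem = sum(jobs[i][1] * k for (i, jj), k in assign.items() if jj == j)
        if cpus > P or mem > A:  # (5.2) processors, (5.3) memory
            return False
    # all T_i tasks of every job must be placed (Equation 5.1)
    for i, (T, _, _) in enumerate(jobs):
        if sum(k for (ii, _), k in assign.items() if ii == i) != T:
            return False
    return True

jobs = [(3, 1.0, 10.0), (2, 2.0, 5.0)]   # (tasks, GB per task, hours)
nodes = [(2, 4.0), (2, 4.0), (2, 4.0)]   # (CPUs, GB shared RAM) per node
ok = {(0, 0): 2, (0, 1): 1, (1, 1): 1, (1, 2): 1}
print(feasible(ok, jobs, nodes))  # True
```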

5.2.3 Related Work on Cluster and Supercomputer Scheduling

Scheduling is a key policy in the performance of expensive High Performance Computing (HPC) facilities. The goal of this policy is to reduce the execution time required by a parallel job (when only one job is considered) or to maximize processor usage (when multiple jobs are taken into account). Additional criteria could also be considered, such as the minimization of the turnaround time (the time between submission and termination). A scheduling policy should be aware of the requirements of the jobs. These requirements are usually expressed as execution constraints (termination deadline) or, mainly, resource requirements.


Valid scheduling policies must ensure both resource provision and constraint boundaries; but once these basic constraints are satisfied, the policies have significant degrees of freedom to arrange the jobs. Commonly used process schedulers and workload managers use non-combinatorial deterministic strategies, implemented by commercial or free software solutions. An extended overview of non-combinatorial scheduling can be found in [FRS04].

5.2.3.1 Non-Combinatorial Policies

Non-combinatorial policies are implemented in a broad range of scheduling systems for HPC clusters. Combinatorial techniques are those that use permutation-based functions (such as insertion, deletion or swapping), either deterministically (brute force) or stochastically; non-combinatorial techniques use greedy approaches or other direct methods. The First Come First Served (FCFS) policy [SY98] is one of the most popular approaches. In this case, parallel jobs are scheduled in the same order they arrive: the order of the queue of waiting jobs is used to dispatch them. If there are enough resources available in the system, these resources are allocated and the first job in the queue starts its execution. This selection is repeated while the requirements of the next job can be fulfilled. If not, the next job in the queue waits until one of the already running jobs finishes and its resources are released. Backfilling [Lif95] improves the FCFS policy by allowing small jobs to be scheduled before their actual turn when only few resources are available. This policy prevents the waste of idle resources in the short term, but it can lead large jobs to starvation. The Extensible Argonne Scheduling sYstem (EASY), developed by IBM for SP1 clusters, reduces this unbalancing effect by means of a reservation mechanism for the jobs waiting in the queue. This way, if a small job that could be backfilled will not finish before the resources needed by the first job in the queue are released, it is not allowed to execute, avoiding large delays for large jobs. Reservations are computed using the expected time at which the resources required by the first waiting job will be available; this deadline is used to avoid the execution of smaller jobs that would finish after it. The number of reservations may be parametrized.
For example, in Conservative Backfilling [MF01b], reservations are made for all of the waiting jobs in the queue. Another important alternative for job scheduling is the Shortest Job First (SJF) policy [CADV02]. SJF reorders the queue according to the expected execution time. This model can be generalized to order by the required amount of any other resource instead of the execution time: if more than one resource is considered (processors or memory), different ordering criteria for the queue are possible. The opposite scheduling policy, prioritizing the longest jobs instead of the shortest ones, is also possible, and it receives the name of Longest Job First (LJF). Many of these alternatives consider the run-time estimation as an important input to the scheduler. This assumption is quite usual, and scheduling system implementations use different alternatives to deal with it. In many cases, jobs exceeding their expected execution time are directly killed; this policy encourages users to be as accurate as possible in their estimations.
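These queue-ordering policies reduce to sort keys over the job descriptions. A minimal sketch (the job tuple layout is an assumption made for illustration):

```python
# Jobs described as (arrival_order, cpus, expected_time); the policies
# differ only in how they (re)order the waiting queue before dispatch.
queue = [(0, 200, 23), (1, 130, 11), (2, 180, 21), (3, 115, 15)]

fcfs      = sorted(queue, key=lambda j: j[0])   # arrival order
sjf_time  = sorted(queue, key=lambda j: j[2])   # shortest expected time first
ljf_time  = sorted(queue, key=lambda j: -j[2])  # longest expected time first
sjf_procs = sorted(queue, key=lambda j: j[1])   # fewest processors first

print([j[0] for j in ljf_time])  # [0, 2, 3, 1]
```

Backfilling and EASY are dynamic refinements on top of such an ordering, so they need the dispatch state as well, not just a sort key.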

5.2.3.2 Scheduling Tools

The implementation of the theoretical aspects of the different scheduling policies is also an important decision for cluster and supercomputer facilities. Based on the general policies mentioned above, many different toolkits and systems have been developed for job scheduling and workload management. The PBS family of resource managers (PBS, OpenPBS, and Torque) 3 implements a default FCFS scheduler, but also provides mechanisms to implement other simple schedulers. PBS includes a resource manager that acts as an interface for users to access the cluster. It can accept jobs and lets users view the status of the queue. The scheduler reads the state of this queue, makes a scheduling decision, and informs the resource manager of its decision. The Maui Cluster Scheduler offers compatibility with the Torque resource manager. It comes with a wide variety of scheduling policies that try to accommodate different scheduling needs. The Moab Cluster Suite 4 is the successor to Maui. Among other improvements, it provides a large set of graphical tools which help an administrator to monitor the state of the cluster and the queue. These tools are strategy-dependent, allowing administrators to monitor and change information contained in strategy-dependent job parameters. The Simple Linux Utility for Resource Management (SLURM) 5 [YJG03] is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of thousands of nodes. Its components include machine status, partition management, job management, scheduling, and stream copy modules. A well-known commercial system is IBM's LoadLeveler [KRM+01]. LoadLeveler provides basic queue management based on priorities, and allows administrators to configure dynamic priority updates using different policies. LoadLeveler is sometimes used as a lower-level interface to more advanced resource managers, such as Maui. Any higher-level system can interact with LoadLeveler using its Application Programming Interface (API).

5.2.4 Experimentation

In this section, the experimental scenario and a comparison of the results obtained by classic methods (FCFS, Backfilling, Backfilling with reservations, SJF and LJF), individual Genetic Algorithms and MOS are presented. This experimentation tries to optimize the scheduling policy of the Magerit cluster, located at CeSViMa (Centro de Supercomputación y Visualización de Madrid) 6.

3 http://www.openpbs.org/
4 http://www.clusterresources.com/
5 http://www.llnl.gov/linux/slurm/download.html
6 http://www.cesvima.upm.es

This system consisted, in March 2008, of 1080


[Figure: four snapshots of the dispatcher working on the following queue:

Job     # CPUs    Mem       Time
Job1    200       2.0 GB    23 hours
Job2    130       1.5 GB    11 hours
Job3    180       1.8 GB    21 hours
Job4    115       2.5 GB    15 hours

(a) t = t0: Job1 and Job2 are dispatched and running; Job3 and Job4 cannot be executed and stay queued.
(b) t = t0 + 11 hours: Job2 has finished; Job1 is still running; Job3 and Job4 remain queued.
(c) t = t0 + 23 hours: Job1 and Job2 have finished; Job3 can now execute and is running; Job4 still cannot and stays queued.
(d) t = t0 + 44 hours: Job1, Job2 and Job3 have finished; Job4 can now execute and is running.]

Figure 5.1: Scheduler description

eServer BladeCenter JS20 nodes, each with two PowerPC 970 2.2 GHz processors and 4 GB of shared RAM. All the results presented hereafter have been obtained with this machine configuration.

5.2.4.1 Evolutionary Techniques for Supercomputer Scheduling

The solutions generated by the algorithms represent ordered sequences of jobs to be dispatched by the supercomputer. In this sense, these solutions are actually permutations of the jobs. Thus, any of the possible encodings that represent permutations, presented in Section 5.3.1.2, could be applied. Once the supercomputer receives the queue of jobs to be executed, it uses a deterministic policy to process it. If the supercomputer has no available resources to run the next job from the queue, it waits until a running job ends. When a job finishes its execution, all its assigned resources are released, and the waiting job has a new chance to run. The following scenario is played out on a small replica of the Magerit system, consisting of 180 nodes (360 processors in total) with 4 GB of shared memory per node. In this scenario, depicted in Figure 5.1, the first two jobs in the queue are simultaneously dispatched to the supercomputer (Figure 5.1a). Job3 cannot be executed because there are not enough available processors, so it must wait until both jobs have finished (Figure 5.1b). Once the number of processors required by Job3 is available, it is submitted to the supercomputer (Figure 5.1c). At this moment, Job4 cannot be sent to the supercomputer, in this case due to the lack of available memory. This job will run once the previous job has released its resources (Figure 5.1d).

Table 5.1: Experimental scenario

(a) GA configuration
(Overall) Pop. Size    100
Termination            Pop. Convergence
Convergence %          99 %
Selection              Roulette Wheel
Crossover %            90 %
Mutation %             1 %

(b) Parallel configuration
Paradigm               Islands Model
Model                  Asynchronous
Topology               Mesh
Migration Rate         10 gens.
Migration Pop.         Top 20 %
Nodes                  2

5.2.4.2 First Experimental Scenario

The MOS approach has first been tested with three datasets with sizes ranging from 60 to 120 jobs. Each job is described by its required amount of CPUs, memory and execution time. The experiments were conducted on the aforementioned Magerit system, making use of 2 of its 2160 processors. The Evolutionary Algorithm used the configuration depicted in Table 5.1. These parameters were obtained from a preliminary experimentation phase and reported good overall results. A parallel islands model [CP98] has been used in order to reduce the execution time of the EA, which does not change the behavior of the algorithm itself, as previously seen in [Zaf08]. For this experiment, the MOS algorithm uses two different encodings: the Path and Ordinal representations reviewed in Section 5.3.1.2. To overcome the problem of combining individuals with different encodings, the conversion function described in Section 5.3.2.2 has been used. As will be seen in Section 5.3.1.2, different genetic operators have been proposed for permutation problems such as the TSP or the SCS. For the ordinal encoding, the classic one-point crossover and uniform mutation operators [Hol75] have been considered. For the path encoding, the Order Crossover operator [Dav85a] and the Exchange Mutation operator [OSH87] were selected. Another crossover operator was tested for this encoding (the Cycle Crossover operator [OSH87]) but with poorer results, as previously stated in [MCZ00]. The fitness of each individual is computed as the percentage of processor time the system is busy (% of CPU usage), as shown in Equation 5.4:

fitness = total_processor_time / (scheduled_time × cpus)        (5.4)
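Equation 5.4 can be evaluated by simulating the in-order dispatch of a candidate permutation. The following sketch is a deliberately simplified dispatcher (greedy task placement, no backfilling, every job assumed to fit the empty machine), not the actual evaluator used in the thesis:

```python
import heapq

def cpu_usage(order, jobs, nodes):
    """Fitness of Equation 5.4 for a job permutation.

    jobs[i] = (T_i, M_i, t_i); nodes[j] = (P_j, A_j)."""
    free = [[P, A] for (P, A) in nodes]  # free CPUs / free memory per node
    running = []                         # heap of (end_time, released resources)
    now, busy, last_end = 0.0, 0.0, 0.0
    for i in order:
        T, M, t = jobs[i]
        while True:
            # greedily place the T tasks on the nodes' free slots
            place, left = [], T
            for j, (P, A) in enumerate(free):
                k = min(left, P, int(A // M)) if M > 0 else min(left, P)
                if k > 0:
                    place.append((j, k))
                    left -= k
                if left == 0:
                    break
            if left == 0:
                break
            # not enough resources: advance time to the next job completion
            end, released = heapq.heappop(running)
            now = end
            for j, k, m in released:
                free[j][0] += k
                free[j][1] += m
        for j, k in place:
            free[j][0] -= k
            free[j][1] -= k * M
        heapq.heappush(running, (now + t, [(j, k, k * M) for j, k in place]))
        busy += T * t
        last_end = max(last_end, now + t)
    # Equation 5.4: busy processor time over the machine's total capacity
    return busy / (last_end * sum(P for P, _ in nodes))

jobs = [(4, 1.0, 10.0), (2, 1.0, 5.0)]  # (tasks, GB per task, hours)
nodes = [(2, 4.0), (2, 4.0)]            # (CPUs, GB shared RAM) per node
print(cpu_usage([0, 1], jobs, nodes))   # 50 busy CPU-hours / (15 h x 4 CPUs) ~= 0.8333
```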

Finally, in order to offer a comparative report of the performance of the MOS approach, the same datasets were scheduled using some classic non-combinatorial techniques (described in Section 5.2.3.1). These techniques are: ① FCFS, ② Backfilling without reservations, ③ Backfilling with one reservation, ④ two variants of SJF, considering the required number of processors and the expected execution time as the criteria to sort the queue of jobs, and ⑤ two variants of LJF, which are analogous to SJF but in the reverse order, again both in number of processors and in expected execution time.

OBJECTIVES:

• Test the ability of the MOS framework to combine techniques with different encodings in the same hybrid algorithm.
• Check whether the hybrid algorithm obtains increased performance compared to traditional algorithms.

5.2.4.3 Results and Discussion of the First Experiment A summary of the obtained results can be found in Table 5.2. We can see that MOS performs as well as the best other technique for the first dataset and that it clearly outperforms classic approaches solving the two biggest datasets. All the problems have been executed 20 times, and the results are the average of all the executions, except for the deterministic methods (classic scheduling models) that were executed only once. The optimal value for the first dataset (0.5321) has been reached by the hybrid algorithm, as well as by three of the classic techniques. This circumstance does not happen in the most complex instances, where hybrid evolutionary techniques are able to improve the results of the other techniques. Table 5.2: Summary of the results of the first experiment

                   60 Jobs            80 Jobs            120 Jobs
MOS                0.5321 ± 0.0000    0.9821 ± 0.0318    1.0000 ± 0.0000
FCFS               0.3674             0.6499             0.7068
Backfilling        0.5321             0.7029             0.8065
Backfilling Res.   0.3582             0.7398             0.7503
SJF Procs.         0.5321             0.6564             0.7022
LJF Procs.         0.4050             0.6305             0.6957
SJF Time           0.4349             0.6766             0.6483
LJF Time           0.5321             0.7779             0.9047

It is also interesting to note that the standard deviation of the evolutionary techniques, although they are heuristic/stochastic methods, is very low. This makes the approach stable and reliable for real-world applications.

The convergence to solutions with the same fitness does not mean that the very same job order is obtained: many different job combinations can lead to a similar, or even equal, performance (CPU usage). This is explained by the fact that what really matters is keeping together groups of jobs that fit the parallel system with the highest CPU usage. These groups of jobs may be swapped with one another, and the jobs belonging to a group may be swapped within it. Both kinds of reordering are preserved by the two evolutionary techniques combined by MOS: the crossover operating on the path encoding keeps large portions of the parents' orders (preserving job groups), whereas the mutation operator applied to the ordinal representation moves jobs inside the same job group.

Focusing on the classic techniques, we can see that LJF, when sorting by expected execution time, performs best among the non-combinatorial methods. This is due to the fact that the longer the job, the earlier it is scheduled: shorter jobs are submitted at the end, filling the gaps in the last part of the schedule.

It can be noted that the difficulty of the scheduling problem is not directly proportional to the number of jobs. Although there are more possible permutations, there may also be more equivalent solutions with the best performance. The number of best solutions (with different orderings) depends on other characteristics of the problem rather than on the number of jobs. Sometimes, with a reduced number of jobs, the possible combinations are so limited that no job ordering reaches more than 60% CPU usage, similar to the figures obtained by the classic methods.

5.2.4.4 Second Experimental Scenario

For this second experiment, a larger dataset of 248 jobs has been used. It is described in the same terms (#CPUs, memory and execution time) as the previous ones, and the same configuration of the Genetic Algorithm has been used (Table 5.1). In this case, instead of using two different encodings, the attention has been focused on simultaneously using two different mutation operators for the same genetic representation (path encoding). These two mutation operators are the aforementioned Exchange Mutation operator [OSH87] and the Simple Inversion Mutation operator [Hol75]. Again, the performance of the Hybrid Evolutionary Algorithm is compared not only against the non-combinatorial techniques but also against the individual Genetic Algorithms, each of them using a different mutation operator.

OBJECTIVES:
• Test the ability of the MOS framework to combine techniques with different operators in the same hybrid algorithm.
• Check whether the hybrid algorithm obtains increased performance compared to both the traditional algorithms and the individual EAs.


5.2.4.5 Results and Discussion of the Second Experiment

The proposed dataset, although it doubles the size of the largest dataset used in the first experiment, exhibits the same properties as those seen before, i.e., the difficulty of the problem does not grow proportionally to the problem size, and the problem admits multiple different solutions with the same fitness. Nevertheless, we can observe that, even under those circumstances, MOS obtains better results in terms of average fitness and standard deviation than any other technique. The complete results are provided in Table 5.3.

Table 5.3: Summary of the results of the second experiment

                   248 Jobs
MOS                0.9632 ± 0.0002
GA-SIM             0.9237 ± 0.0137
GA-EM              0.9451 ± 0.0015
FCFS               0.7399
Backfilling        0.7819
Backfilling Res.   0.7783
SJF Procs.         0.7861
LJF Procs.         0.7596
SJF Time           0.7529
LJF Time           0.8626

The non-parametric Wilcoxon test was applied to compare MOS against each individual Genetic Algorithm (each using a different mutation operator), testing the null hypothesis that the algorithms follow the same distribution. The p-values obtained were p = 0.0002 and p = 0.02 for GA-SIM and GA-EM, respectively, which lets us reject the null hypothesis in both cases: the differences are statistically significant, and MOS outperforms each individual Genetic Algorithm.
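For reference, the rank-sum flavor of this test can be sketched with the standard library alone (normal approximation, no tie handling; the fitness samples below are hypothetical, and in practice a routine such as scipy.stats.ranksums would be the usual choice):

```python
from math import sqrt
from statistics import NormalDist

def ranksum_p(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.
    Assumes no tied values; a stdlib-only sketch, not a full implementation."""
    pooled = sorted(x + y)
    rank = {v: i + 1 for i, v in enumerate(pooled)}   # ranks 1..n
    n1, n2 = len(x), len(y)
    w = sum(rank[v] for v in x)                       # rank sum of sample x
    mean = n1 * (n1 + n2 + 1) / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))

mos = [0.9632, 0.9630, 0.9635, 0.9629, 0.9631]     # hypothetical fitness samples
gasim = [0.9237, 0.9100, 0.9300, 0.9240, 0.9250]
print(ranksum_p(mos, gasim) < 0.05)   # the difference is significant
```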

5.2.5 Conclusions

The experimentation conducted in this section shows that combined metaheuristic methods are able to outperform traditional approaches. MOS also provides the ability to combine different metaheuristic methods and to select the most appropriate one to optimize the given problem. This abstraction is quite valuable when the behavior of the different techniques is not known in advance, as in this case. Additionally, this combination of heuristics can lead to improved results when compared with the individual ones. This is the case for the Supercomputer Scheduling Problem, in which the combination of two Genetic Algorithms is able to find better solutions than each Genetic Algorithm by itself.

In this experimentation, the ability of the MOS framework to create hybrid algorithms, combining different operators and encodings, with increased performance compared with the individual algorithms has been examined. As the results have been satisfactory, both hybridization models offered by MOS (central and self-adaptive), as well as several Quality and Participation Functions, will be tested in more detail in the next chapter, to conclude the exploration of the hybridization capabilities offered by the MOS framework.


5.3 The Traveling Salesman Problem

The Traveling Salesman Problem (TSP) is a classic combinatorial optimization problem, popularized by the RAND Corporation in the late 1940s [Rob49]. Given a set of cities and the cost of traveling between each pair of them, the TSP consists of finding the cheapest round-trip route that visits each city exactly once and returns to the starting point. The problem is of considerable practical importance beyond the evident transportation and logistics applications, including the scheduling of service visits and maintenance operations. Other application areas are: genomics (integrating local maps into a single radiation hybrid map of a genome), astrophysics (optimizing the sequence of celestial objects to be imaged), semiconductor manufacturing (optimizing scan chains in integrated circuits) and networking (designing optimal communication links through a set of sites organized in a ring). For these reasons, the TSP was selected to test the MOS approach.

Section 5.3.1 briefly reviews the main approaches to solving this problem, surveying both non-evolutionary and evolutionary methods. Then, Section 5.3.2 presents the experimentation conducted and the results obtained, whereas Section 5.3.3 summarizes the conclusions of these experiments.

5.3.1 State of the art

The most direct solution to the TSP would be to try all the possible permutations (ordered combinations) and determine the cheapest one (brute-force search). However, as the number of permutations is n! (the factorial of the number of cities, n), this solution rapidly becomes impractical. The problem has been shown to be NP-hard even for the case of cities on the plane with Euclidean distances, as well as in a number of other restrictive cases. For this reason, more and more attention has been paid to heuristics and metaheuristics capable of solving even large instances of the TSP with a reasonable balance between accuracy and computation time. In this section, different methods are reviewed, starting with non-evolutionary methods and followed by evolutionary ones. Finally, some techniques combining both approaches are presented.
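To make the factorial growth concrete, a brute-force solver fits in a few lines but is only usable for tiny instances (the distance matrix below is a toy example with hypothetical values):

```python
from itertools import permutations
import math

def brute_force_tsp(dist):
    """Exact TSP by enumerating all tours; the search space grows as
    (n-1)! once the starting city is fixed, so this is only viable for tiny n."""
    n = len(dist)
    best_tour, best_len = None, math.inf
    for perm in permutations(range(1, n)):      # fix city 0 as the start
        tour = (0,) + perm
        length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len

D = [[0, 1, 9, 4],
     [1, 0, 2, 9],
     [9, 2, 0, 3],
     [4, 9, 3, 0]]
print(brute_force_tsp(D))   # the 0-1-2-3-0 tour of length 1+2+3+4 = 10
```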

5.3.1.1 Non-Evolutionary Approaches

Local Search Algorithms

Several Local Search Algorithms have been proposed for the TSP. In Local Search Algorithms, a neighborhood structure over tours must be defined. There are two main classes of tour improvement mechanisms: Edge Exchanges (EE) and Chain Exchanges (CE). EE techniques exchange single nodes in the path to iteratively improve the tour cost, whereas CE methods exchange two or more nodes. The most famous EE method is called r-opt. At a given iteration, it removes r edges from the current tour and attempts to find a better reconnection of the r remaining chains. The first 2-opt method was proposed in [Cro58] and generalized in [Lin65].


In [LK73], a dynamic r-opt heuristic, in which the value of r is allowed to vary during the search, was proposed. This is known as the Lin-Kernighan (LK) algorithm, and it decides which r is the most suitable at each iteration step. This makes the algorithm quite complex, and few authors have been able to improve on it. A more in-depth study of the LK algorithm can be found in [Hel00]. One of the best known CE methods, called Or-opt, was proposed in [Or76]. It attempts to improve the current tour by moving a chain of three consecutive vertices to a different location (possibly reversing it) until no further improvement can be obtained. The process is then repeated with chains of two consecutive vertices, and finally with single vertices.
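The 2-opt move at the core of these methods can be sketched as follows (a minimal first-improvement variant; names and the toy distance matrix are illustrative, not taken from the thesis):

```python
def two_opt(tour, dist):
    """2-opt local search: repeatedly remove two edges and reconnect the tour
    with the segment in between reversed, keeping any change that shortens it."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                a, b = tour[i - 1], tour[i]
                c, d = tour[j], tour[(j + 1) % n]
                # gain of replacing edges (a,b) and (c,d) by (a,c) and (b,d)
                if dist[a][c] + dist[b][d] < dist[a][b] + dist[c][d]:
                    tour[i:j + 1] = reversed(tour[i:j + 1])
                    improved = True
    return tour

D = [[0, 1, 9, 4],
     [1, 0, 2, 9],
     [9, 2, 0, 3],
     [4, 9, 3, 0]]
print(two_opt([0, 2, 1, 3], D))   # untangles the crossing edges
```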

Other Metaheuristic Algorithms

A neighborhood-search algorithm searches among the neighbors of a candidate solution to find a better one. By using local search algorithms alone, one can easily get stuck in a local optimum. This problem can be mitigated by using metaheuristic algorithms such as Tabu Search [GL97] or Simulated Annealing [KGV83], both of which have been widely applied to the TSP. A great variety of Tabu Search algorithms have been proposed for the TSP; the first work can be found in [Glo86], and other interesting works are [Kno94, LPQ97, Hal00, LLL06, GS07]. The TSP was also one of the first problems to which Simulated Annealing was applied [Cer85]; more recent efforts using this approach can be found in [Vob96, SSFS02, AMHV06, HV06]. Other heuristic approaches applied to the TSP include Ant Colony Optimization [DG97] and Neural Networks [NT95], as well as exact methods such as Branch-and-Bound [ABCC98].

5.3.1.2 Evolutionary Approaches

Many evolutionary techniques have been used to solve the TSP. In this review, more attention will be paid to Genetic Algorithms, although other evolutionary approaches, such as Estimation of Distribution Algorithms, have also been successfully applied [RdML02]. In other cases, different evolutionary models, such as the usage of an immigration operator, have also been adopted [Yan04]. A very complete and extensive review of the application of Genetic Algorithms to the Traveling Salesman Problem can be found in [LKM+99]. In this review, the authors propose the following taxonomy, based on how the candidate solution is encoded:

• Binary representation.
• Path representation.
• Adjacency representation.
• Ordinal representation.
• Matrix representation.


One of the main problems when Genetic Algorithms are applied to the TSP is that, for some of these encoding alternatives, traditional operators (crossover and mutation) generate malformed individuals (invalid tours). For this reason, many ad-hoc operators have been proposed in the literature as an alternative to a repair algorithm.

Binary Representation

In this representation format, an individual is encoded as a sequence of cities, each city encoded as a binary substring. Each city requires log2 n bits, which means that the encoding of the problem is n log2 n bits long. For example, with 6 cities, 6 × 3 bits are needed. This representation format leads to two different anomalies: (1) some bit combinations do not encode a valid city (in the previous example, 3 bits are required to encode 6 cities, but only 6 out of the 8 combinations are valid) and (2) crossover and mutation operators can generate invalid individuals. An additional problem is that a classic crossover operator has no meaningful interpretation in this solution space. [Lid91] shows preliminary results on solving the TSP with such a binary representation.

Path Representation

Similar in concept to the previous representation, path representation encodes each solution as a tour of visited cities. Instead of using a binary encoding format, cities are represented as integer values within the interval [1, n], n being the number of cities. As with the previous representation, this encoding format can create malformed individuals (invalid tours). There are two possible ways to deal with this problem: (1) remove them from the population or (2) repair them, turning them into valid individuals. The problem of malformed individuals can also be tackled by domain-specific operators, which drive the recombination process only through valid solutions. In the literature related to this representation, many specific crossover and mutation operators have been proposed (as can be seen in Table 5.4).

Table 5.4: Crossover operators for the Path representation

PMX (Partially-Mapped Crossover) [GJ85]: Elements between the two crossover points represent a partial map of the cities.
CX (Cycle Crossover) [OSH87]: Values at each position are taken either from one parent or from the other, at the same position.
OX1 (Order Crossover) [Dav85a]: Elements are a subtour from one parent, preserving the order of the other parent.
OX2 (Order Based Crossover) [Sys91]: Random positions from one parent are selected and replaced by the appropriate values in the order they appear in the second parent.
POS (Position Based Crossover) [Sys91]: Randomly selected elements from one parent are inserted at the same positions, keeping the order of the remaining elements from the other parent.
HX (Heuristic Crossover) [Gre87]: Starting from a given city, builds a probability function based on the cost of each of the four possible connections (two for each parent). Then, one of these connecting edges is selected based on this probability function.
ER (Edge Recombination Crossover) [WSS91]: Constructs an "edge map" that represents city connections in both parents. Tours are constructed following this map and a given selection policy.
SMX (Sorted Match Crossover) [Bra85]: Subpaths from each parent visiting the same cities (possibly in a different order) are taken. The parent whose subpath has the highest cost replaces it with the subpath from the other parent.
MPX (Maximal Preservative Crossover) [MGSK88]: All the elements in a subsequence of one of the parents are removed from the other parent. The offspring is created from this subsequence of the first parent and the remaining elements of the second one.
VR (Voting Recombination Crossover) [Müh89]: A p-sexual crossover operation in which elements are selected if enough parents have the same element at the same position.
AP (Alternating-Position Crossover) [LKPM97]: Alternating selection of elements from each of the parents (omitting repetitions).
EAX (Edge Assembly Crossover) [NK97]: A set of partial tours is selected, choosing edges alternately from each parent, several times. The edges in this set are removed from one of the parents and the edges from the other parent are included.
SPX (Subtour Preservation Crossover) [SA03]: Uses subtour enumeration to generate multiple valid offspring.
DPX (Distance Preserving Crossover) [MF97]: Keeps the edges that are common to both parents. The remaining edges should not be present in any of the parents.
NX (Natural Crossover) [JM02]: In a 2D plot of the paths, and using freely drawn curves, cities are divided into two classes. When edges connect cities in the same class, they are preserved. Otherwise, a greedy method is used to repair the tour.
VQX (Voronoi Quantized Crossover) [SM02]: An extended version of the previous operator. It uses vector quantization based on Voronoi regions; inherited properties are, in fact, delimited by these regions.
KCX (Knot-Cracker Crossover) [CM98]: Tries to preserve tour orientation and to avoid edge crossing. Cities are added from each parent depending on whether these conditions are satisfied or not.
TC-PTL (Two-Cut PTL Crossover) [PTL08]: Initially developed for the no-wait flow-shop scheduling problem, produces distinct individuals even if both parents are equal. A subtour from one parent is selected and moved to the beginning of the permutation, whereas the remaining cities are added in the same order they appear in the other parent.
PBX (Pheromone-Based Crossover) [ZDLY08]: Both heuristic information and pheromone trails on edges are used to construct new solutions. The heuristic information includes edge lengths and adjacency relations in the parents, whereas the pheromone trail is updated as in the Ant Colony Optimization algorithm.

There are improved versions of some of the previous crossover operators, such as an extended EAX operator [CLKT05] or a dynamic multi-parent crossover [ZJ06].

Adjacency Representation

As in the previous cases, solutions are represented as a list of n elements (n being the number of cities). However, in this representation, a value i at position j of the chromosome means that city j is connected to city i. This encoding has the advantage of having only one encoding per valid tour (or two for symmetric problems, counting the reverse tour). On the other hand, not all possible combinations of values are valid tours: isolated cities and multiple partitioned subtours can result from the classic crossover operators. As in the other cases, a repair algorithm would be needed. [GGRG85] proposed different special crossover operators for this encoding format.
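A validity check for this encoding, illustrating the subtour problem, could look like this (a sketch using 0-based city indices; the function name is an assumption):

```python
def is_valid_adjacency(chrom):
    """Check an adjacency-encoded individual: chrom[j] = i means city j is
    followed by city i. A valid tour must be a single cycle visiting all the
    cities; classic crossover can instead produce several disjoint subtours."""
    n = len(chrom)
    if sorted(chrom) != list(range(n)):   # each city needs exactly one predecessor
        return False
    seen, city = set(), 0
    while city not in seen:               # follow the successors from city 0
        seen.add(city)
        city = chrom[city]
    return len(seen) == n                 # one cycle covering every city

print(is_valid_adjacency([1, 2, 3, 0]))  # single 4-cycle -> True
print(is_valid_adjacency([1, 0, 3, 2]))  # two 2-cycles   -> False
```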

Ordinal Representation

This encoding, sometimes called ranking representation, assigns one value (integer or real) to each of the cities. Cities are then sorted according to this value (from lowest to highest), and this ordered list of cities defines the tour. In the case of integer values, a tie-breaking strategy must be defined. In some cases, a reference ordered list is provided, allowing element values to be narrowed to a smaller range. The main advantage of this representation is that traditional crossover and mutation operators can be used. Different approaches to this encoding are found in [GGRG85] and [Mic96]. In [Kok05], a particular crossover operator, called Sum-Product Partition Crossover (SPPX), was proposed for many order- and partition-based problems, including the TSP. However, although this operator performs quite well on problems such as the Graph Coloring Problem, the results reported in [Kok05] conclude that it is not very efficient for the TSP.

Matrix Representation

This representation, originally proposed by [FM90], uses a binary matrix to represent city connections. If the element at position (i, j) is 1, there is an edge between cities i and j. The matrix representation may also generate malformed tours, and thus specific operators must be provided.

5.3.1.3 Memetic Algorithms

Evolutionary Algorithms, by themselves, have great difficulty finding the optimal solution to TSP instances, especially those with a large number of cities. For this reason, much work has been done on Memetic Algorithms (Section 3.2.2.6) [MN92, KN99, KS00, MF01a, HH01, Mer02]. These algorithms are nowadays among the best heuristics for the TSP.

5.3.2 Experimentation and Discussion

This section presents the experimental scenario used to test the ability of MOS to find existing synergies among different encodings and recombination operators. The next subsections provide a brief overview of the datasets used and the execution parameters of the algorithm, followed by an analysis of the results obtained in the two proposed experimental scenarios and a discussion of the most interesting conclusions derived from these results.

5.3.2.1 Datasets and Execution Parameters

The experimentation has been conducted on three datasets with sizes ranging from 42 to 120 cities. These are standard TSP instances from the literature, used to test many different approaches to this problem, and can be downloaded from [Hei08]. The configuration used for these experiments is described in Table 5.5. These parameters reported good results in previous experiments [Zaf08] and were thus chosen for these tests. Moreover, in those previous experiments the performance of the algorithm did not change whether an island model was used or not; for this reason, an island model with 16 nodes was used to speed up the experimentation. Equation 5.5 defines the fitness function used to evaluate the individuals. This measure has the advantage of always lying within the interval [0, 1], which makes it easier to interpret how good the results are regardless of the different optimal tour lengths of each instance.

fitness = length(tour_best) / length(tour)        (5.5)

Table 5.5: Experimental scenario

(a) GA configuration
(Overall) Pop. Size        16000
Termination                Pop. Convergence
Convergence %              99 %
Selection of individuals   Roulette Wheel
Crossover %                90 %
Mutation %                 1 %

(b) Parallel configuration
Paradigm                   Async. Islands Model
Nodes                      16
Topology                   Mesh
Migration period           10 gens.
Migration rate             Top 20 % pop.
Migration policy           Best-Worst
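Equation 5.5 can be sketched as follows (hypothetical helper names and a toy distance matrix, not the thesis implementation):

```python
def tour_length(tour, dist):
    """Total length of a closed tour (returns to the starting city)."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def tour_fitness(tour, dist, best_known):
    """Eq. 5.5: ratio of the best known tour length to this tour's length,
    always within [0, 1] when best_known is the optimum."""
    return best_known / tour_length(tour, dist)

D = [[0, 1, 9, 4],
     [1, 0, 2, 9],
     [9, 2, 0, 3],
     [4, 9, 3, 0]]
print(tour_fitness([0, 1, 2, 3], D, 10))   # optimal tour -> 1.0
print(tour_fitness([0, 2, 1, 3], D, 10))   # longer tour  -> 10/24
```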

The experimentation carried out in this study tries to confirm that the MOS methodology can improve the performance obtained by individual algorithms with different encodings and/or recombination operators.

5.3.2.2 Experiment 1: Exhaustive Approach

In this first experiment, two different genetic encodings have been considered: path and ordinal representation. An algorithm using two different encodings faces the problem of mixing individuals of both types. This issue is overcome by implementing a bidirectional conversion function between the two representations (fp↔o). In the first direction, the function sorts the real-valued vector of the ordinal representation and takes the resulting ranking as the integer values of the path representation (Figure 5.2). In the other direction, the function simply generates a random real number for each gene, within an interval that preserves the relative order of the cities in the tour (Figure 5.3).

Figure 5.2: Conversion from ordinal (real) encoding to path (integer) encoding (in the depicted example, [0.26, 0.45, 0.74, 0.01, 0.93, 0.51, 0.15, 0.76] is converted to the tour [4, 7, 1, 2, 6, 3, 8, 5])

Figure 5.3: Conversion from path (integer) encoding to ordinal (real) encoding (in the depicted example, the tour [2, 1, 5, 3, 7, 8, 4, 6] is converted to [0.18, 0.09, 0.39, 0.83, 0.36, 0.89, 0.61, 0.63])

Regarding recombination operators, many alternatives have been proposed in the literature, as seen in Section 5.3.1.2. For the ordinal representation, the traditional 1-point crossover and the uniform mutation described by [Hol75] have been used. For the path representation, two crossover operators have been considered: Ordered Crossover (OX) [Dav85a] and Cycle Crossover (CX) [OSH87], as well as two mutation operators: Repeated Exchange Mutation (REM) [Ban90] and Simple Inversion Mutation (SIM) [Hol75]. With all these alternatives, the set of techniques presented in Table 5.6 has been constructed.
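A minimal sketch of the two conversion functions, consistent with the examples in Figures 5.2 and 5.3 (function names and the city-indexed key layout are one reading of the figures, not the thesis implementation):

```python
import random

def ordinal_to_path(keys):
    """Ordinal -> path: city j+1 carries the real key keys[j]; sorting the
    cities by key (ascending) yields the visit order (cf. Figure 5.2)."""
    return [c + 1 for c, _ in sorted(enumerate(keys), key=lambda p: p[1])]

def path_to_ordinal(path):
    """Path -> ordinal: the city visited at position p receives a random key
    inside the p-th slice of [0, 1], so re-sorting recovers the same tour
    (cf. Figure 5.3)."""
    n = len(path)
    keys = [0.0] * n
    for pos, city in enumerate(path):
        keys[city - 1] = random.uniform(pos / n, (pos + 1) / n)
    return keys

tour = [2, 1, 5, 3, 7, 8, 4, 6]
assert ordinal_to_path(path_to_ordinal(tour)) == tour   # round-trip consistency
print(ordinal_to_path([0.26, 0.45, 0.74, 0.01, 0.93, 0.51, 0.15, 0.76]))
# -> [4, 7, 1, 2, 6, 3, 8, 5], as in Figure 5.2
```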


Table 5.6: Configuration of the five GA techniques

            t0      t1      t2        t3      t4
Encoding    Path    Path    Ordinal   Path    Path
Crossover   OX      CX      1-point   OX      CX
Mutator     REM     SIM     Uniform   SIM     REM

With this set of techniques, a total of 2^5 − 1 = 31 different combinations can be constructed. As a final remark, it is important to note that the selected encodings and operators are among the most classic in the literature. More specialized operators will be considered in further research.
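The 2^5 − 1 count corresponds to all non-empty subsets of the technique set, which can be enumerated directly:

```python
from itertools import combinations

techniques = ["t0", "t1", "t2", "t3", "t4"]
# every non-empty subset, from single techniques up to all five combined
hybrids = [combo for r in range(1, len(techniques) + 1)
           for combo in combinations(techniques, r)]
print(len(hybrids))   # 2**5 - 1 = 31
```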

OBJECTIVES:
• Test the ability of the MOS framework to combine techniques with different encodings and operators in the same hybrid algorithm.
• Confirm that the hybrid algorithm obtains increased performance compared to the individual algorithms.
• Study how the number of considered techniques influences the performance of the hybrid algorithm.

Results

The complete results obtained in this experimentation are shown in Table B.1. A first impression derived from this table is the poorer performance of the single techniques compared with combinations of more than one technique. The procedure described in Section A.2 has been used to validate the results obtained in this experimentation. The nWins procedure is executed on a per-problem basis, as the number of instances considered is quite low. For the same reason, the Holm procedure (Section A.3) has not been used (the number of instances is much smaller than the number of algorithms to be compared). As the full output of the nWins procedure is too extensive, a condensed version of the information it reports (Tables 5.7 and 5.8) is presented instead. In Table 5.7, the average number of wins obtained by all the single techniques is compared with the average number of wins obtained with 2, 3, 4 and 5 techniques. These figures show that the higher the number of techniques used, the better the average performance becomes. The only exception is the Brazil58 dataset, where the combination of 4 techniques reaches better performance than all 5 techniques together. In all the cases, single techniques perform the worst on average. Table 5.8 presents a comparison of the performance of the different techniques. For each technique, the table shows the average number of wins of all the combinations in which it participates, as well as a corrected version of this number. In this case, each technique only counts the proportional part of the wins, depending on

Table 5.7: Average number of wins compared with the number of techniques

Number of techniques   Swiss42   Brazil58   GR120
1                       -14.6     -12.8     -26
2                        -3.9      -2.5      -0.8
3                         6.3       4         0.4
4                         7.8       8.4      21.8
5                        10         7        25

Table 5.8: Average number of wins for each technique

       Number of wins               Corrected number of wins
       Swiss42  Brazil58  GR120     Swiss42  Brazil58  GR120
t0       4.44     -0.06    5.75       0.74     -3.69    0.35
t1       3.19      7.06    4.94       0.07      6.83   -0.35
t2      -0.69     -0.44    5.37      -2.17     -5.23    0.05
t3      10.56      7.50    5.12       4.02      7.39   -0.27
t4      -2.25     -1.00    5.50      -2.67     -5.23    0.22

Best combination(s) of techniques: Swiss42: t0t2t3; Brazil58: t0t1t2t3; GR120: t0t1t2t4 and t0t1t3t4.
Average p-value against all the other combinations: 0.02 (t0t2t3), 0.06 (t0t1t2t3), 0.06 (t0t1t2t4), 0.06 (t0t1t3t4).

how many techniques take part in the combination. For example, if combination t1t3t4 scores 12 wins, each of the three techniques receives 12/3 wins. The average of these corrected wins is also weighted by the number of techniques in the combination: in the previous example, the number of wins has weight 1/3 when computing the average. The sum of these corrected average numbers of wins is 0. The behavior of the different techniques when combined with others, as measured by the corrected average number of wins, keeps a close relationship with the best combination, but it is important to highlight the presence of individually weaker techniques in the best combinations. For example, in Swiss42, technique t2 obtains poor results on its own, but it seems to deeply improve some combinations of techniques (t0t2t3 is significantly better than t0t3). In Brazil58, where t1 and t3 outperform the other single techniques, the combination of both is further improved when t0 and t2 are also included. In all the cases, the combination of multiple techniques performs better than the best single technique:

• In Swiss42: t3 is significantly worse than t0t2t3.
• In Brazil58: t3 is significantly worse than t0t1t2t3.
• In GR120: t0 is significantly worse than t0t1t3t4.

Another important aspect to study is the relative position reached by the combination of all five techniques, considering the number of wins (in all the cases, there is more than one combination with the same number of wins):

• In Swiss42: t0t1t2t3t4 is ranked 10th out of 31 (wins = 10).
• In Brazil58: t0t1t2t3t4 is ranked 5th out of 31 (wins = 7).
• In GR120: t0t1t2t3t4 is ranked 3rd out of 31 (wins = 25).

In two of the considered instances, GR120 and Brazil58, the simultaneous combination of all the techniques, t0t1t2t3t4, is not statistically worse than the best combination of techniques. In the remaining problem, it is highly ranked in the list of all the combinations of techniques. This means that MOS is able to seamlessly manage several techniques, obtaining better performance than the individual algorithms with only a small penalty compared to the best-performing combination of techniques (which would otherwise have to be determined by an exhaustive test of all the possible combinations). Finally, one of the instances used in the experimentation, GR120, seems to be particularly difficult for all the single techniques. In this case, the combination of four of them, t0t1t3t4, clearly outperforms every single technique. In the other datasets, one technique, t3 (Path Encoding + Order Crossover + Simple Inversion Mutation), happens to be better than the others, although there exist some combinations of multiple techniques with improved performance (with stronger or weaker statistical significance).

5.3.2.3 Experiment 2: Greedy Approach

For the second experiment, three crossover operators for the path encoding were added: Edge Exchange Crossover (EXX), Subtour Exchange Crossover (SXX) and Partially Matched Crossover (PMX). Combined with the two previously introduced mutation operators, SIM and REM, this leads to the six new techniques presented in Table 5.9.

Table 5.9: Configuration of the six new GA techniques

            t5      t6      t7      t8      t9      t10
Encoding    Path    Path    Path    Path    Path    Path
Crossover   EXX     EXX     SXX     SXX     PMX     PMX
Mutator     REM     SIM     REM     SIM     REM     SIM

The total number of possible combinations of techniques (considering t0 to t10) can be computed in the same way as for the previous experiment: 2^11 − 1 = 2047. It is obvious that an exhaustive execution of all the possible combinations of techniques would take too long to be reasonable. To overcome this constraint, a greedy approach was introduced (Algorithm 6). In the first step of this procedure, all the techniques are executed individually, and the best one is selected for the next iteration. This technique is then combined with every other available technique, and the best combination is again selected for the next iteration. The procedure stops when no improvement is achieved by adding a new technique to the best combination from the previous step.

A FRAMEWORK FOR HYBRID DYNAMIC EVOLUTIONARY ALGORITHMS: MULTIPLE OFFSPRING SAMPLING (MOS)

Antonio LaTorre de la Fuente

5.3. THE TRAVELING SALESMAN PROBLEM


Algorithm 6 Greedy Approach

1: Let A be the set of all the available techniques
2: Let T = {}
3: repeat
4:    Execute the algorithm 20 times with all the possible combinations T ∪ {t_j}, ∀ t_j ∈ A, and store their associated average fitness values, f_j
5:    Select the technique t_best with best = argmax_j {f_1, ..., f_n}
6:    Let T = T ∪ {t_best} and A = A \ {t_best}
7: until no fitness improvement is achieved by adding one more technique to T
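The greedy construction of Algorithm 6 can be sketched as follows. This is a minimal illustration: `run_hybrid` is a hypothetical callback that would execute the hybrid algorithm 20 times with the given techniques and return the average fitness (higher is better).

```python
def greedy_selection(all_techniques, run_hybrid):
    """Greedily grow a set of techniques while average fitness improves.

    `run_hybrid(techs)` is assumed to run the hybrid algorithm several
    times with the given techniques and return the average fitness,
    as in Algorithm 6 (hypothetical callback).
    """
    selected = []
    available = list(all_techniques)
    best_fitness = float("-inf")
    while available:
        # Try adding each remaining technique to the current selection
        scores = {t: run_hybrid(selected + [t]) for t in available}
        t_best = max(scores, key=scores.get)
        if scores[t_best] <= best_fitness:
            break  # no improvement: stop
        best_fitness = scores[t_best]
        selected.append(t_best)
        available.remove(t_best)
    return selected, best_fitness
```

With 11 techniques, this evaluates at most 11 + 10 + ... combinations instead of the 2047 required by the exhaustive search.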

OBJECTIVES:

• Check if a greedy algorithm can identify good combinations of techniques while avoiding a brute-force search.
• Study how including a Local Search influences the behavior of the hybrid algorithm.

The left side of Table 5.10 summarizes the execution of this constructive algorithm. Only the best combination of techniques is given for each step. As in the exhaustive approach, we can see that the combination of several techniques leads to increased performance in all the tested datasets. It is also important to note that the EXX crossover operator introduced in this new experiment (associated with techniques t5 and t6) appears in every combination of techniques, which suggests that this operator is somehow monopolizing the production of new individuals and causing the greedy approach to get stuck. To avoid such a situation, a minimum participation ratio could be established, which would increase the diversity of the new solutions.

The combination of Local Search mechanisms with Genetic Algorithms has proven very effective in enhancing the performance of the latter. Nevertheless, the inclusion of a Local Search procedure can hide the real contribution of the Genetic Algorithm itself, and thus a balance must be found between increasing performance by means of these techniques and preserving the Genetic Algorithm as pure as possible. As a trade-off solution, our algorithm applies a 2-opt Local Search to the best individual in the population at each generation. As can be seen in the right part of Table 5.10, this kind of hybridization helps our algorithm to increase its average performance while keeping the main evolutionary behavior unchanged.

5.3.3 Conclusions

From the results discussed in Section 5.3.2, it can be seen that the combination of several reproductive techniques tends to improve the results of single Genetic Algorithms. It can also be observed that the combination of techniques reporting the best performance is not always the same across the different TSP instances: different instances present different exploratory characteristics that can be exploited by different subsets of techniques.


CHAPTER 5. APPLICATION TO PERMUTATION PROBLEMS

Table 5.10: Summary of the results

            Without Local Search                With Local Search
            tech(s)  avg. fitness ± std. dev.   tech(s)  avg. fitness ± std. dev.
  Swiss42
  Step 1    t5       0.996 ± 0.007              t5       1.000 ± 0.000
  Step 2    t5t0     1.000 ± 0.000              t5t0     1.000 ± 0.000
  Brazil58
  Step 1    t6       0.974 ± 0.000              t5       0.999 ± 0.000
  Step 2    t6t3     0.987 ± 0.003              t6t3     0.999 ± 0.000
  Step 3    t6t3t2   0.986 ± 0.006
  GR120
  Step 1    t6       0.874 ± 0.011              t5       0.930 ± 0.003
  Step 2    t6t3     0.914 ± 0.000              t6t3     0.949 ± 0.000
  Step 3    t6t3t5   0.914 ± 0.004

In general, the most suitable individual techniques usually appear in the best combination but, in some cases, the inclusion of an average technique can remarkably improve the overall performance. For this reason, it can be helpful to experiment with all the available techniques at the same time: the algorithm is able to benefit even from techniques performing quite poorly, with minimal overhead and a negligible impact on fitness from those techniques that do not actually contribute to the search.


Chapter 6. Application to Continuous Problems

6.1 Introduction

This chapter summarizes the results of applying the proposed methodology to the resolution of complex continuous optimization functions. For this purpose, two state-of-the-art benchmarks have been considered. The first one is the CEC 2005 Benchmark, proposed for the Special Session on Continuous Optimization [SHL+ 05] of that year's IEEE CEC Conference. This benchmark is composed of 25 functions divided into four groups: unimodal, basic multimodal, expanded and composed functions. The complexity of the functions grows with their ID number. The functions in the last group are built by the composition of up to 10 multimodal functions, which makes them really hard to solve even at low dimensionality (10 and 30 dimensions were considered) due to their ruggedness and massive multimodality. The second benchmark taken into account is the CEC 2008 Benchmark, proposed for the Special Session on Large Scale Global Optimization [TYS+ 07] held at the IEEE CEC 2008 Conference. This benchmark is made up of 6 well-known functions. Despite its reduced size, the difficulty of this benchmark comes from the high dimensionality proposed for the functions (1,000 dimensions). In this kind of function, special attention must be paid to the balance between global and local search, so that the algorithm can explore the solution space appropriately until it finds a promising set of solutions and then emphasize the local search to exploit them. In both benchmarks, a diverse set of techniques has been used, considering several different evolutionary models, operators and parameters, to study how these techniques can complement each other to improve the results they would obtain if used separately.


6.2 CEC 2005 Benchmark

This section reviews the CEC 2005 Benchmark, proposed for the Special Session on Continuous Optimization [SHL+ 05], the experimental scenario defined for this benchmark, and the results obtained on it. This benchmark proposes a total of 25 optimization functions which present some of the typical characteristics of difficult continuous problems: multimodality, non-separability, rotation, shifting of the global optimum, etc. It represents a perfect scenario to test the suitability of hybrid approaches, as these functions present very heterogeneous fitness landscapes in which Evolutionary Algorithms with different search capabilities can obtain better or worse results, depending on the case. For this reason, techniques with very different search characteristics have been selected and combined, trying to get the best from each evolutionary approach. In summary, each function in the benchmark will first be described and analyzed in Section 6.2.1. Then, the experimental scenario proposed for this set of functions will be presented in Section 6.2.2. Finally, the results obtained for the different experimental configurations and functions will be presented and discussed in Section 6.2.3.

OBJECTIVES:

• Carry out a detailed comparison of all the hybridization capabilities of the MOS framework:
  – Use of different evolutionary techniques.
  – Central vs. Self-Adaptive approach.
  – Several Dynamic Adjustment of Participation Strategies (in both the central and the self-adaptive approaches).
  – Fitness Average vs. Negative Slope Coefficient (in the case of the central approach).
• Analyze the differences among all these hybridization alternatives.
• Prove the convenience of using the central approach instead of the more commonly used self-adaptive approach.

6.2.1 Description of the CEC 2005 Benchmark

This section reviews the set of functions proposed for the Special Session on Continuous Optimization held at the 2005 Conference on Evolutionary Computation (CEC 2005). For this special session, a set of 25 difficult continuous optimization functions was proposed. These functions present many of the characteristics that make a continuous function hard to solve: multimodality (many local optima that can mislead the search process), non-separability (dependencies among the variables that make it impossible for the algorithm to


independently optimize each variable), shifting of the global optimum (to prevent algorithms from taking advantage of a global optimum centered at zero), etc. Before formulating and describing each of the proposed functions, it is necessary to introduce the notation used throughout this section. D represents the dimensionality of the problem, i.e., the number of variables to optimize. z is a candidate solution to the problem, whereas o is the optimal solution. Finally, M are linear transformation matrices with an associated condition number (a measure of how numerically well-conditioned a problem is, i.e., how small variations of the input data affect the output value). The functions in this benchmark are classified into four groups:

1. Unimodal functions
2. Basic Multimodal functions
3. Expanded functions
4. Composition functions

The first two groups are simple unimodal and multimodal functions, respectively. The third group is composed of functions constructed in the following way: given a 2-D function F(x, y) as a starting function, the corresponding expanded function EF(x_1, x_2, ..., x_D) is defined as follows:

    EF(x_1, x_2, ..., x_{D-1}, x_D) = F(x_1, x_2) + F(x_2, x_3) + ... + F(x_{D-1}, x_D) + F(x_D, x_1)

Finally, the fourth group is composed of hybrid functions constructed from functions of the second group. Hybrid functions apply each of the simple functions to the candidate solution and compute a weighted average of the resulting values. Additionally, some stretching and compressing factors are applied, as well as shifting of the global optima.
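The expansion rule above is straightforward to express in code (a minimal sketch):

```python
def expand(f2d):
    """Given a 2-D function F(x, y), build the expanded function
    EF(x_1, ..., x_D) = F(x_1, x_2) + ... + F(x_{D-1}, x_D) + F(x_D, x_1)."""
    def ef(xs):
        d = len(xs)
        # The modulo wraps the last pair back to the first variable
        return sum(f2d(xs[i], xs[(i + 1) % d]) for i in range(d))
    return ef
```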

6.2.1.1 Unimodal Functions

F1: Shifted Sphere Function

This is a unimodal, shifted, separable and scalable function. It is defined by Equation 6.1:

    F_1(x) = \sum_{i=1}^{D} z_i^2    (6.1)

where:

    z = x − o,  x = [x_1, x_2, ..., x_D],  x ∈ [−100, 100]^D

F2: Shifted Schwefel's Problem 1.2

This is a unimodal, shifted, non-separable and scalable function. It is defined by Equation 6.2:

    F_2(x) = \sum_{i=1}^{D} \left( \sum_{j=1}^{i} z_j \right)^2    (6.2)

where:

    z = x − o,  x = [x_1, x_2, ..., x_D],  x ∈ [−100, 100]^D

Figure 6.1: 3-D plots of the Sphere function and Schwefel's problem 1.2

F3: Shifted Rotated High Conditioned Elliptic Function

This is a unimodal, shifted, rotated, scalable and non-separable function. It is defined by Equation 6.3:

    F_3(x) = \sum_{i=1}^{D} (10^6)^{\frac{i-1}{D-1}} z_i^2    (6.3)

where:

    z = (x − o) * M,  x = [x_1, x_2, ..., x_D],  x ∈ [−100, 100]^D

F4: Shifted Schwefel's Problem 1.2 with Noise in Fitness

This is a unimodal, shifted, non-separable and scalable function with noise in fitness. It is defined by Equation 6.4:

    F_4(x) = \left( \sum_{i=1}^{D} \left( \sum_{j=1}^{i} z_j \right)^2 \right) * (1 + 0.4|N(0, 1)|)    (6.4)

where:

    z = x − o,  x = [x_1, x_2, ..., x_D],  x ∈ [−100, 100]^D
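The pattern shared by the functions above — shift by o, optionally rotate by M, then apply a base function — can be sketched as follows (illustrative helper names, not the official benchmark code; bias terms vary per function and are omitted):

```python
def shifted_sphere(x, o):
    """F1 without its bias term: sum of squares of z = x - o."""
    return sum((xi - oi) ** 2 for xi, oi in zip(x, o))

def shifted_rotated(x, o, m, base):
    """Generic shifted/rotated wrapper: z = (x - o) * M, then base(z).

    `m` is a D x D linear transformation matrix (row-vector convention).
    """
    z0 = [xi - oi for xi, oi in zip(x, o)]
    z = [sum(z0[j] * m[j][i] for j in range(len(z0)))
         for i in range(len(z0))]
    return base(z)
```

With an identity matrix for M, the rotated wrapper reduces to the plain shifted function, which is a convenient sanity check.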


Figure 6.2: 3-D plots of the High Conditioned Elliptic function and Schwefel's problem 1.2 with noise

F5: Schwefel's Problem 2.6 with Global Optimum on Bounds

This is a unimodal, non-separable and scalable function with the global optimum located at the bounds. It is defined by Equation 6.5:

    F_5(x) = \max_{i} |A_i x − B_i|    (6.5)

In Equation 6.5, A is a D × D matrix of random integers, a_ij ∈ [−500, 500] ∀ i, j ∈ [1, D], with |A| ≠ 0. A_i is the i-th row of A, and B_i = A_i * o, where o is a D × 1 vector of random integers within the interval [−100, 100]. Finally, x ∈ [−100, 100]^D.

Figure 6.3: 3-D plot of Schwefel's problem 2.6

6.2.1.2 Basic Multimodal Functions

F6: Shifted Rosenbrock's Function

This is a multimodal, shifted, non-separable and scalable function with a very narrow valley from the local to the global optimum. It is defined by Equation 6.6. Figure 6.4a plots a 3-D version of this function.

    F_6(x) = \sum_{i=1}^{D-1} \left( 100(z_i^2 − z_{i+1})^2 + (z_i − 1)^2 \right) + 390    (6.6)

where:

    z = x − o + 1,  x = [x_1, x_2, ..., x_D],  x ∈ [−100, 100]^D

F7: Shifted Rotated Griewank's Function without Bounds

This is a multimodal, shifted, non-separable, rotated and scalable function, with no bounds for the variable x. The population is initialized within the interval [0, 600]^D, whereas the global optimum lies outside of the initialization range. This function is defined by Equation 6.7. Figure 6.4b is a 3-D plot of this function.

    F_7(x) = \sum_{i=1}^{D} \frac{z_i^2}{4000} − \prod_{i=1}^{D} \cos\left( \frac{z_i}{\sqrt{i}} \right) + 1 − 180    (6.7)

where:

    z = (x − o) * M,  x = [x_1, x_2, ..., x_D],  x ∈ [0, 600]^D

Figure 6.4: 3-D plots of Rosenbrock's and Griewank's functions

F8: Shifted Rotated Ackley's Function with Global Optimum on Bounds

This is a multimodal, rotated, shifted, non-separable and scalable function. The global optimum is located at the bounds. It is defined by Equation 6.8, and Figure 6.5a is a 3-D plot of this function.

    F_8(x) = −20 \exp\left( −0.2 \sqrt{\frac{1}{D} \sum_{i=1}^{D} z_i^2} \right) − \exp\left( \frac{1}{D} \sum_{i=1}^{D} \cos(2\pi z_i) \right) + 20 + e − 140    (6.8)

where:

    z = (x − o) * M,  x = [x_1, x_2, ..., x_D],  x ∈ [−32, 32]^D

F9: Shifted Rastrigin's Function

This is a multimodal, shifted, separable and scalable function, with a huge number of local optima. It is defined by Equation 6.9:

    F_9(x) = \sum_{i=1}^{D} \left( z_i^2 − 10 \cos(2\pi z_i) + 10 \right) − 330    (6.9)

where:

    z = x − o,  x = [x_1, x_2, ..., x_D],  x ∈ [−5, 5]^D

Figure 6.5: 3-D plots of Ackley's and Rastrigin's functions

F10: Shifted Rotated Rastrigin's Function

This is a multimodal, shifted, rotated, non-separable and scalable function, with a huge number of local optima. It is the same function as the previous one, with the difference that, in this case, z = (x − o) * M. Additionally:

    x = [x_1, x_2, ..., x_D],  x ∈ [−5, 5]^D

Figure 6.6a is a 3-D plot of this function.

F11: Shifted Rotated Weierstrass Function

This is a multimodal, shifted, rotated, non-separable and scalable function. This function has the particularity of being continuous in the whole domain but differentiable only at a particular set of points. It is defined by Equation 6.10, whereas a 3-D plot of this function is provided in Figure 6.6b.

    F_{11}(x) = \sum_{i=1}^{D} \left( \sum_{k=0}^{k_{max}} a^k \cos(2\pi b^k (z_i + 0.5)) \right) − D \sum_{k=0}^{k_{max}} a^k \cos(2\pi b^k \cdot 0.5) + 90    (6.10)

where:

    z = (x − o) * M,  x = [x_1, x_2, ..., x_D],  x ∈ [−0.5, 0.5]^D

Figure 6.6: 3-D plots of Rastrigin's and Weierstrass functions

F12: Schwefel's Problem 2.13

This is a multimodal, shifted, non-separable and scalable function. It is defined by Equation 6.11:

    F_{12}(x) = \sum_{i=1}^{D} (A_i − B_i(x))^2 − 460    (6.11)

where:

    A_i = \sum_{j=1}^{D} (a_{ij} \sin \alpha_j + b_{ij} \cos \alpha_j)

    B_i(x) = \sum_{j=1}^{D} (a_{ij} \sin x_j + b_{ij} \cos x_j),  ∀ i ∈ [1, D]

a and b are two D × D matrices of random integers within the interval [−100, 100]. Additionally:

    α = [α_1, α_2, ..., α_D],  α_j ∈ [−π, π],  ∀ j ∈ [1, D]

Figure 6.7: 3-D plot of Schwefel's problem 2.13

6.2.1.3 Expanded Functions

F13: Shifted Expanded Griewank's plus Rosenbrock's Function

This function is an expanded composition of Griewank's (Equation 6.7) and Rosenbrock's (Equation 6.6) functions. The shifted function is defined by Equation 6.12, where F8 and F2 denote the classical Griewank and Rosenbrock functions, respectively:

    F_{13}(x) = F8(F2(z_1, z_2)) + F8(F2(z_2, z_3)) + ... + F8(F2(z_{D−1}, z_D)) + F8(F2(z_D, z_1))    (6.12)

where:

    z = x − o + 1,  x = [x_1, x_2, ..., x_D],  x ∈ [−3, 1]^D

F14: Shifted Rotated Scaffer's F6 Function

This is an expanded version of Scaffer's F6 function, defined by Equation 6.13:

    F(x, y) = 0.5 + \frac{\sin^2(\sqrt{x^2 + y^2}) − 0.5}{(1 + 0.001(x^2 + y^2))^2}    (6.13)

It is then expanded as shown in Equation 6.14:

    F_{14}(x) = EF(z_1, z_2, ..., z_{D−1}, z_D) = F(z_1, z_2) + F(z_2, z_3) + ... + F(z_{D−1}, z_D) + F(z_D, z_1)    (6.14)

where:

    z = (x − o) * M,  x = [x_1, x_2, ..., x_D],  x ∈ [−100, 100]^D

6.2.1.4 Composition Functions

These functions are compositions of multiple individual functions. Due to their complexity, their equations are not presented in this review. In [SHL+ 05], the pseudocode to carry out such a composition is provided.
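Although the exact composition pseudocode is given in [SHL+ 05], its core idea — a weighted sum of shifted basic functions, with weights decaying with the distance to each component's optimum — can be sketched as follows. This is a simplified illustration that omits the stretching factors and rotation matrices of the full scheme.

```python
import math

def composition(x, components, optima, sigmas, biases):
    """Simplified composition sketch: each basic function f_i is
    evaluated on x - o_i, and the results are combined with weights
    that decay with the distance from x to each optimum o_i."""
    weights = []
    for o, sigma in zip(optima, sigmas):
        d2 = sum((xi - oi) ** 2 for xi, oi in zip(x, o))
        weights.append(math.exp(-d2 / (2 * len(x) * sigma ** 2)))
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    # Weighted sum of (shifted component value + its bias)
    return sum(w * (f([xi - oi for xi, oi in zip(x, o)]) + b)
               for w, f, o, b in zip(weights, components, optima, biases))
```

Near each optimum o_i, the corresponding weight dominates and the composed landscape locally resembles that single component, which is how the hybrid functions mix, e.g., Rastrigin-like and Sphere-like regions.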


Figure 6.8: 3-D plots of F13 and F14 functions

F15: Hybrid Composition Function This is a multimodal, separable near the global optimum and scalable hybrid function with a huge number of local optima where characteristics of different functions such as Rastrigin, Weierstrass, Griewank, Ackley and Sphere are mixed together. The separability near the global optimum is due to Rastrigin’s function, whereas the two flat areas present in the function are due to the Sphere function. This function is defined within the interval [−5, 5]D .

F16: Rotated Version of the Hybrid Composition Function This is a rotated version of the previous function with the same characteristics.

Figure 6.9: 3-D plots of F15 and F16 functions

F17: F16 with Noise in Fitness

This is the same function as the previous one, with the only difference that Gaussian noise is introduced in the fitness function (Equation 6.15):

    F_{17}(x) = F_{16}(x) * (1 + 0.2|N(0, 1)|)    (6.15)

F18: Rotated Hybrid Composition Function

This is a multimodal, rotated, non-separable and scalable hybrid function with a huge number of local optima, in which properties of different functions such as Ackley, Rastrigin, Sphere, Weierstrass and Griewank are mixed together. Two flat areas are present in the function due to the Sphere function, and a local optimum is placed at the origin. This function is defined within the interval [−5, 5]^D.

Figure 6.10: 3-D plots of F17 and F18 functions

F19: Rotated Hybrid Composition Function with Narrow Basin Global Optimum

This is the same function as the previous one, but with different weights for the composing functions, which turn the global optimum into a narrow basin.

F20: Rotated Hybrid Composition Function with Global Optimum on the Bounds

This is the same function as F18, with the global optimum shifted to the bounds.

Figure 6.11: 3-D plots of F19 and F20 functions


F21: Rotated Hybrid Composition Function

This is a multimodal, rotated, non-separable and scalable hybrid function with a huge number of local optima, in which properties of functions such as F13, Rastrigin, F14, Weierstrass and Griewank are mixed together. This function is defined within the interval [−5, 5]^D.

F22: Rotated Hybrid Composition Function with High Condition Number Matrix

This is the same function as the previous one, but with higher condition numbers for the linear transformation matrices. This makes the fitness landscape rougher and more difficult to search.

Figure 6.12: 3-D plots of F21 and F22 functions

F23: Non-Continuous Rotated Hybrid Composition Function

This is a non-continuous version of the F21 function with the global optimum at the bounds.

F24: Rotated Hybrid Composition Function

This is a multimodal, rotated, non-separable and scalable hybrid function with a huge number of local optima, in which properties of different functions such as Weierstrass, F13, F14, Ackley, Rastrigin, Griewank, non-continuous versions of F14 and Rastrigin, the High Conditioned Elliptic function and the Sphere function with noise in fitness are mixed together. It presents several flat areas due to the unimodal functions. This function is defined within the interval [−5, 5]^D.

F25: Rotated Hybrid Composition Function without Bounds

This is the same function as F24, except for the open search range set for it. The population should be initialized within the interval [2, 5]^D.

6.2.2 Experimentation in the CEC 2005 Benchmark

This experimentation has been conducted under the same conditions established for the aforementioned CEC 2005 Special Session [SHL+ 05]: the same maximum number of Fitness Evaluations, the same initialization interval, etc. The hybrid algorithms tested on this benchmark are made up of four different techniques. Two of them are Genetic Algorithms, and their configuration can be found in Table 6.1a. These techniques have been constructed with different objectives in mind. For the BCGM technique, a more exploitative profile was sought; for this reason, the BLX-α crossover and the Gaussian mutator have been used. On the other hand, the UCUM technique was intended to contribute more exploratory characteristics and, for this purpose, the Uniform Crossover and Mutator were used. The other two techniques taking part in the hybrid algorithms are based on Differential Evolution. Here, two different crossover operators have been used, the Binomial and the Exponential Crossovers, as well as different values for the F parameter. Again, both techniques have been selected for their different exploratory characteristics (the Binomial Crossover tends to create more diverse solutions, at least in the early stages of the search process). In Table 6.1b, the complete description of the configuration used for these two techniques can be found.

Figure 6.13: 3-D plots of F23 and F24 functions

Table 6.1: Configuration of the different evolutionary techniques used

(a) Configuration of the GA techniques

                    BCGM             UCUM
  Selector          Tournament (1)   Tournament (1)
  Initializer       Uniform          Uniform
  Crossover         BLX-α (2)        Uniform
  Mutator           Gaussian         Uniform
  Crossover Rate    90%              90%
  Mutation Rate     10%              10%

  (1) tournament size 2   (2) with α = 0.5

(b) Configuration of the DE techniques

                    DE Binomial      DE Exponential
  Selector          Tournament (1)   Tournament (1)
  Initializer       Uniform          Uniform
  Crossover         Binomial         Exponential
  F                 0.9              0.5
  CR                0.5              0.5

  (1) tournament size 2
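The two DE crossover operators in Table 6.1b differ in how genes are inherited from the mutant vector: binomial crossover makes an independent decision per gene, while exponential crossover copies a contiguous block. A sketch of the standard operators (generic DE textbook definitions, not MOS-specific code):

```python
import random

def binomial_crossover(target, mutant, cr, rng=random):
    """DE binomial crossover: each gene comes from the mutant with
    probability CR; one random gene (j_rand) always comes from it."""
    d = len(target)
    j_rand = rng.randrange(d)
    return [mutant[j] if (rng.random() < cr or j == j_rand) else target[j]
            for j in range(d)]

def exponential_crossover(target, mutant, cr, rng=random):
    """DE exponential crossover: copy a contiguous block of mutant
    genes, starting at a random position, while draws stay below CR."""
    d = len(target)
    trial = list(target)
    j = rng.randrange(d)
    copied = 0
    while copied < d:
        trial[j] = mutant[j]
        j = (j + 1) % d
        copied += 1
        if rng.random() >= cr:
            break
    return trial
```

The per-gene independence of the binomial variant is what tends to produce more diverse trial vectors early in the search, as noted above.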

In Tables 6.2a and 6.2b, the common configuration for the hybrid algorithms in 10 and 30 dimensions, respectively, is presented. There are two main differences between both configurations. First, the maximum number of Fitness Evaluations (FEs): 100,000 for the 10-dimensional functions and 300,000 for the 30-dimensional ones. Second, the degree of elitism: 100% for the 10-dimensional functions and 50% for the 30-dimensional ones. This different level of elitism is due to the need for more evolutionary pressure on the smallest problems (10-dimensional functions) to converge quickly to an optimum, and for a more relaxed configuration on the more complex problems (30 dimensions) to cover a larger area of the solution space.

Table 6.2: Common configuration for 10 and 30 dimensional functions

(a) Common configuration for 10 dimensional functions

  Executions             25
  Problem Dimensions     10
  Population Size        100
  Convergence Criterion  100,000 FEs
  Elitism                100%
  Min. Participation     0%

(b) Common configuration for 30 dimensional functions

  Executions             25
  Problem Dimensions     30
  Population Size        100
  Convergence Criterion  300,000 FEs
  Elitism                50%
  Min. Participation     0%

Finally, seven hybrid configurations will be tested. Each hybrid algorithm combines the four techniques described in this section (Tables 6.1a and 6.1b) with the configuration presented in Tables 6.2a and 6.2b, and uses a different strategy for adjusting the participation of the four considered techniques, as described in Table 6.3. Both the Central and the Self-Adaptive approaches have been considered, with different Quality Measures and Adjustment Strategies in the case of the Central approach, and with two alternative mechanisms to combine the participation vectors in the Self-Adaptive approach. The Dynamic PF1 Participation Function refers to the Dynamic Adjustment Strategy with constant population size, whereas the Dynamic PF2 Participation Function refers to the Dynamic Adjustment Strategy with variable population size (Section 4.2.1.1.5).

Table 6.3: Configuration of the hybrid algorithms

  Algorithm Configuration   Quality Measure              Adjustment Strategy
  fAvg + Dynamic PF1        Fitness Average              Dynamic, constant population size
  fAvg + Dynamic PF2        Fitness Average              Dynamic, variable population size
  NSC + Dynamic PF1         Negative Slope Coefficient   Dynamic, constant population size
  NSC + Dynamic PF2         Negative Slope Coefficient   Dynamic, variable population size
  Constant PF               −                            Constant
  Self-Adaptive             −                            Self-Adaptive
  Weighted Self-Adaptive    −                            Weighted Self-Adaptive

6.2.3 Results in the CEC 2005 Benchmark

Table 6.4 presents the summarized results of the experimentation carried out on the 10-dimensional functions. The information presented in this table describes the average behavior of each algorithm (the seven hybrid configurations and the four single techniques). For each of them, the average ranking in the 25 functions of the benchmark (second column), the results of the nWins Procedure (third column) and the results of the Holm Procedure (fourth column) are provided. For this last procedure, the NSC + Dynamic PF2 configuration was chosen as the reference algorithm, as it reported the best average ranking. A "Yes" in the last column means that the Holm Procedure found statistically significant differences between that algorithm and the reference algorithm, whereas a "No" means that there is no statistical evidence that the reference algorithm is better than that particular algorithm. Finally, the detailed list of average errors for each function and algorithm can be found in Table B.2.

Table 6.4: Results in the CEC 2005 Benchmark for 10 dimensional functions

  Algorithm Configuration   Average Ranking   Wins   Holm Procedure
  NSC + Dynamic PF2         2.94               8     −
  NSC + Dynamic PF1         3.22               6     No
  fAvg + Dynamic PF1        3.26               6     No
  DE Exponential            5.44               1     Yes
  Constant PF               6.38               0     Yes
  DE Binomial               6.48               0     Yes
  Self-Adaptive             6.72              −1     Yes
  Weighted Self-Adaptive    7.00              −2     Yes
  BCGM                      7.40              −5     Yes
  fAvg + Dynamic PF2        8.56              −6     Yes
  UCUM                      8.60              −7     Yes
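The nWins scores reported in the table can be computed as sketched below, using a pure-Python Wilcoxon signed-rank test with the normal approximation (a simplification; the thesis uses the procedure described in Appendix A):

```python
import math

def wilcoxon_p(x, y):
    """Two-sided Wilcoxon signed-rank p-value (normal approximation)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank the absolute differences, averaging ranks over ties
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def nwins(errors, alpha=0.05):
    """nWins score: +1 per significantly better pairwise comparison
    (lower error), -1 per significantly worse one.

    `errors` maps algorithm -> list of per-function average errors."""
    score = {a: 0 for a in errors}
    algs = list(errors)
    for i, a in enumerate(algs):
        for b in algs[i + 1:]:
            if wilcoxon_p(errors[a], errors[b]) < alpha:
                better = a if sum(errors[a]) < sum(errors[b]) else b
                worse = b if better == a else a
                score[better] += 1
                score[worse] -= 1
    return score
```

An algorithm's score therefore ranges from −(k−1) to +(k−1) for k compared algorithms, matching the −7 to +8 range in Table 6.4.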

In the 10-dimensional functions, the configuration with the best average ranking is the hybrid algorithm with the NSC quality measure and the dynamic participation function with variable population size. This configuration is followed by two other hybrid configurations: NSC + Dynamic PF1 and fAvg + Dynamic PF1. The first individual algorithm, the DE Exponential configuration, appears in fourth place (according to its average ranking). This is in accordance with the analysis of the participation of the hybrid configurations that will be carried out later in this chapter. On the other hand, the fAvg + Dynamic PF2 configuration was the worst hybrid configuration. This is odd if we compare the performance of this configuration with that of the hybrid approach with the same quality measure but a different Dynamic PF. A conjecture as to why this configuration behaves this way is presented in Section 6.2.3.1. Both hybrid algorithms with Self-Adaptive adjustment of participation also perform quite poorly, probably due to the quick selection of the GA techniques as the best performing algorithms.

The nWins procedure (Appendix A) was applied to carry out a global comparative analysis of all the proposed configurations. This procedure carries out a pair-wise statistical comparison of all the available configurations by means of the Wilcoxon signed-rank test with a confidence level of 0.05. In Table 6.4, it can be observed that the NSC + Dynamic PF2 configuration obtains the highest number of wins, 8, which means that it obtained statistically better results, with a p-value < 0.05, in eight out of the ten possible comparisons. Both the NSC + Dynamic PF1 and fAvg + Dynamic PF1 configurations obtained 6 wins. Finally, the best individual technique, the DE Exponential configuration, obtained only 1 win, a great difference compared to the best hybrid configurations.

The Holm procedure (Appendix A) was also considered for a global statistical analysis. This procedure carries out the comparison of multiple algorithms taking the Family-Wise Error into account (Appendix A). It reports whether there are statistical differences between the reference algorithm, normally the one with the best average ranking, and the remaining algorithms. In this case, it found significant differences between the reference algorithm (NSC + Dynamic PF2) and all the remaining configurations except the NSC + Dynamic PF1 and fAvg + Dynamic PF1 configurations. It is important to note that all the individual techniques obtained significantly worse results than the best hybrid algorithm.

Table 6.5 presents the summarized results of the experimentation carried out on the 30-dimensional functions. The same average information is presented in this table. The detailed list of average errors for each function and algorithm can be found in Table B.3.

Table 6.5: Results in the CEC 2005 Benchmark for 30 dimensional functions

  Algorithm Configuration   Average Ranking   Wins   Holm Procedure
  NSC + Dynamic PF1         2.34              10     −
  Constant PF               4.24               5     Yes
  fAvg + Dynamic PF1        4.96               3     Yes
  DE Exponential            4.98               1     Yes
  Self-Adaptive             5.42               2     Yes
  NSC + Dynamic PF2         5.44               2     Yes
  UCUM                      5.80               1     Yes
  Weighted Self-Adaptive    5.94               0     Yes
  BCGM                      8.16              −6     Yes
  DE Binomial               8.36              −9     Yes
  fAvg + Dynamic PF2        10.36             −9     Yes

A similar analysis can be carried out on the 30-dimensional functions. In this case, the best performing configuration, the one with the best average ranking, is NSC + Dynamic PF1, followed by two other hybrid configurations: the hybrid algorithm with constant participation ratios and fAvg + Dynamic PF1. These three configurations have in common that they maintain a reasonable level of participation (especially in the case of the hybrid approach with constant participation ratios), at least in the first generations, in which the most critical phase of the search takes place. As regards the individual algorithms, the best performance is again obtained by the DE Exponential algorithm. On the other hand, the DE Binomial and BCGM algorithms experience a significant decrease in performance, although this fall is more pronounced for the former. Consequently, the UCUM technique improves its relative performance, which suggests that this configuration of recombination operators is more appropriate for global search (as is the case with the 30-dimensional functions).


The same global statistical analysis as in the case of the 10 dimensional functions has been conducted. First, the nWins procedure was applied, and its results can be found in Table 6.5. This time, the difference between the best and the second best configurations is much larger, as the NSC + Dynamic PF1 obtains twice as many wins as the Constant configuration. The third highest number of wins is obtained by another hybrid approach, the fAvg + Dynamic PF1 configuration. The Holm procedure was then applied and, in this case, it found significant differences between the NSC + Dynamic PF1 configuration and all the others, which confirms the superior performance of this configuration.
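The Holm step-down procedure used for these global comparisons can be sketched as follows. This is the standard textbook formulation applied to precomputed pairwise p-values; the p-values in the example are illustrative, not the thesis results:

```python
def holm(p_values, alpha=0.05):
    """Holm step-down procedure.

    Given the p-values of the pairwise comparisons between the reference
    algorithm (the one with the best average ranking) and the k remaining
    algorithms, return a list of booleans telling which null hypotheses are
    rejected, i.e. which algorithms differ significantly from the reference.
    """
    k = len(p_values)
    # Visit the p-values in ascending order, remembering original positions.
    order = sorted(range(k), key=lambda i: p_values[i])
    rejected = [False] * k
    for step, i in enumerate(order):
        # Compare the (step+1)-th smallest p-value against alpha / (k - step).
        if p_values[i] <= alpha / (k - step):
            rejected[i] = True
        else:
            break  # once a hypothesis is retained, all larger p-values are retained too
    return rejected

# Illustrative p-values: only the first two comparisons survive the
# step-down correction at alpha = 0.05.
print(holm([0.001, 0.008, 0.030, 0.200]))  # → [True, True, False, False]
```

The step-down thresholds (alpha/k, alpha/(k−1), ...) are what keeps the Family-Wise Error controlled when many pairwise comparisons are made against the same reference algorithm.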

6.2.3.1 Analysis of the Participation Adjustment

In the next paragraphs, a conjecture explaining the different performance of the hybrid algorithms is given. For each set of functions (10 and 30 dimensions), the participation adjustment of the hybrid approaches is compared on several functions. For every hybrid configuration, the ranking obtained by that configuration on the particular function is provided.

In Figures 6.14 and 6.15, the participation adjustment of the six dynamic hybrid algorithms in functions F4 and F10 with 10 dimensions is compared. Each configuration behaves similarly in both functions (this can be extrapolated to the whole set of functions, with small differences depending on the particularities of each function), but the configurations differ considerably from one another. At first sight, it can be seen that both techniques with a Self-Adaptive approach behave completely differently from those with a Central approach. These configurations quickly select both Genetic Algorithms as the techniques that will conduct the search process. However, this selection is wrong, as the Differential Evolution techniques obtain better final results in both functions. The problem of this approach is that, in some configurations, one or several techniques present better exploratory capabilities during the first generations of the search. With the Self-Adaptive approach, the population will be filled with a great number of individuals produced by these techniques, which makes it really hard for the other techniques to acquire the level of participation necessary to produce good solutions.

The other four hybrid configurations favored, to a greater or lesser extent, the DE Exponential technique as the leading algorithm for most of the search process. Nevertheless, some differences can be appreciated. First, the Dynamic PF1 function produces a smoother, more curved adjustment than the Dynamic PF2 function. Second, the NSC quality measure produces more rugged curves than the Fitness Average quality measure. In this case, the best combination is made up of the Dynamic PF2 function and the NSC quality measure. This makes sense for this set of functions, as the performance of the two DE techniques is considerably better than that of the GA techniques. As the number of available Fitness Evaluations is quite small, a quick selection of the best techniques is necessary (this is provided by the Dynamic PF2 function). On the other hand, some diversity in the population is needed to avoid premature convergence. For this reason, the more rugged behavior of the NSC quality measure is desirable, as it maintains this diversity better than the Fitness Average measure: continuous rises and drops in the participation of both DE techniques let the algorithm escape from the potential local optima it may fall into.
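The participation-adjustment loop discussed above can be sketched as follows. This is a simplified illustration, not the exact MOS Participation Function: it redistributes participation ratios proportionally to a generic quality score (larger is better) while enforcing the minimum participation ratio that keeps every technique alive. The function name and data layout are illustrative:

```python
def adjust_participation(ratios, qualities, min_ratio=0.05):
    """Redistribute participation ratios proportionally to technique quality.

    `ratios` and `qualities` are dicts indexed by technique name; a larger
    quality value means a better-performing technique. The minimum
    participation ratio keeps every technique alive, as in the experiments
    above (5%). Simplified sketch, not the exact MOS update rule.
    """
    total_quality = sum(qualities.values())
    if total_quality == 0:
        return dict(ratios)  # no signal: keep the current ratios
    # Raw proportional share, then clamp to the minimum participation.
    raw = {t: q / total_quality for t, q in qualities.items()}
    clamped = {t: max(r, min_ratio) for t, r in raw.items()}
    # Renormalize so the ratios sum to one again.
    norm = sum(clamped.values())
    return {t: r / norm for t, r in clamped.items()}

ratios = {"UCUM": 0.25, "BCGM": 0.25, "DE Exp": 0.25, "DE Bin": 0.25}
qualities = {"UCUM": 0.1, "BCGM": 0.1, "DE Exp": 5.0, "DE Bin": 2.0}
print(adjust_participation(ratios, qualities))
```

With these illustrative quality values, DE Exp receives the largest share while the two GA techniques are kept near the 5% floor, mirroring the behavior described for the Central approaches above.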


[Figure 6.14: Participation adjustment of the six hybrid algorithms in the F4 function with 10 dimensions. Each panel plots the participation of the UCUM, BCGM, DE Exp and DE Bin techniques against the generation number: (a) fAvg + Dynamic PF1 (rank = 4); (b) fAvg + Dynamic PF2 (rank = 11); (c) NSC + Dynamic PF1 (rank = 3); (d) NSC + Dynamic PF2 (rank = 1); (e) Self-Adaptive (rank = 8); (f) Weighted Self-Adaptive (rank = 9).]

The 30 dimensional set of functions presents a slightly different scenario (Figures 6.16 and 6.17). On the one hand, the Self-Adaptive approaches no longer remove the DE techniques in the very first generations. Now, even if the participation of these techniques is still low at the beginning of the search, it is high enough

[Figure 6.15: Participation adjustment of the six hybrid algorithms in the F10 function with 10 dimensions. Each panel plots the participation of the UCUM, BCGM, DE Exp and DE Bin techniques against the generation number: (a) fAvg + Dynamic PF1 (rank = 1); (b) fAvg + Dynamic PF2 (rank = 9); (c) NSC + Dynamic PF1 (rank = 3); (d) NSC + Dynamic PF2 (rank = 2); (e) Self-Adaptive (rank = 5); (f) Weighted Self-Adaptive (rank = 6).]

for them to be able to surpass the GA techniques roughly halfway through the search process. This is probably due to the increased complexity of the problems, which reduces the differences in the performance of the four techniques in the first steps of the search. On the other hand, regarding the other four hybrid algorithms,

[Figure 6.16: Participation adjustment of the six hybrid algorithms in the F7 function with 30 dimensions. Each panel plots the participation of the UCUM, BCGM, DE Exp and DE Bin techniques against the generation number: (a) fAvg + Dynamic PF1 (rank = 3); (b) fAvg + Dynamic PF2 (rank = 11); (c) NSC + Dynamic PF1 (rank = 1); (d) NSC + Dynamic PF2 (rank = 7); (e) Self-Adaptive (rank = 5); (f) Weighted Self-Adaptive (rank = 6).]

the best behavior is now offered by the NSC + Dynamic PF1 configuration. This configuration seems to be able to adapt its behavior to the particular characteristics of the function being solved better than any of the other hybrid approaches. For example, in Figure 6.16c it can be observed how the DE technique is quickly

[Figure 6.17: Participation adjustment of the six hybrid algorithms in the F17 function with 30 dimensions. Each panel plots the participation of the UCUM, BCGM, DE Exp and DE Bin techniques against the generation number: (a) fAvg + Dynamic PF1 (rank = 3); (b) fAvg + Dynamic PF2 (rank = 11); (c) NSC + Dynamic PF1 (rank = 1); (d) NSC + Dynamic PF2 (rank = 4); (e) Self-Adaptive (rank = 2); (f) Weighted Self-Adaptive (rank = 5).]

identified as the leading technique for the search process and obtains the largest participation ratio. However, the remaining techniques do not become extinct, as happens, for example, with Fitness Average + Dynamic PF1 (Figure 6.16a), and continuous fluctuations increase and decrease the participation of the DE Binomial

technique. It is important to note the poor performance of both configurations using the Dynamic PF2 function, compared to their performance in the 10 dimensional functions. This is probably due to the shrinkage of the population size that this Participation Function implicitly carries out, which prevents the algorithms from appropriately exploring the solution space and makes them converge to worse solutions.

A few conclusions can be derived from this study:

• A hybrid approach always obtains statistically better results.

• A dynamic hybrid algorithm always obtains significantly better results than the hybrid algorithm with constant participation ratios.

• The NSC Quality Measure seems to adapt better to the particularities of each function.

• The Dynamic PF1 Participation Function seems to be more appropriate for problems in which a more diverse population is needed (more complex problems), whereas the Dynamic PF2 Participation Function should be preferred when the available number of Fitness Evaluations is quite small and the algorithm should focus on a particular region of the solution space. The Dynamic PF2 Participation Function has the ability to reduce the population size, which allows the algorithm to execute for a larger number of generations on a less diverse population and converge to an optimum more easily.

• The Self-Adaptive approach can be deeply influenced by the different search capabilities of the available techniques in the early stages of the search and select the wrong subset of techniques to guide the search process.

Finally, the same comparative analysis conducted between the proposed hybrid configurations and the individual algorithms will be carried out to compare MOS with the algorithms from the original special session that were selected in [GMLH08].
In this comparison, the best configurations of the MOS algorithm in 10 and 30 dimensions (MOS_NSC+dynPF2 and MOS_NSC+dynPF1, respectively) will be used, as well as a configuration made up of the best results obtained by any of the hybrid approaches in each function (MOS_best). This configuration should be considered equivalent to those of some of the algorithms participating in the original session, which adjusted their parameters for each particular function [BSCG05, RKP05]. In this case, the only parameters to be adjusted are the most suitable Participation Function and Quality Measure for each function. This way we will be able to see how far a hybrid algorithm built with the MOS framework can go.

Table 6.6 summarizes the results obtained in the 10 dimensional set of functions. The algorithm which achieved the best performance was the G-CMA-ES, with the best average ranking and the highest number of wins, and thus it was selected as the reference algorithm for the Holm procedure. The second best algorithm according to its average ranking and number of wins was the MOS_best configuration. From these results, it can be seen that the MOS_best configuration proposed in this work outperforms 10 out of the 11 algorithms presented at the aforementioned special session and competition held at the IEEE CEC 2005 Conference. The only algorithm with better performance than MOS_best was the G-CMA-ES algorithm, although the Holm procedure


could not find significant differences between the performance of both algorithms. The MOS_NSC+dynPF2 configuration is ranked in an average position, but it should be noted that the Holm procedure could not find significant differences between it and the G-CMA-ES algorithm either.

Table 6.6: Comparison with the algorithms of the CEC 2005 Special Session for the 10 dimensional functions.

Algorithm Configuration   Avg. Ranking   Wins   Holm (significant?)
G-CMA-ES                      4.02         10        − (ref.)
MOS_best                      5.08          5        no
L-SaDE                        5.80          3        no
DMS-L-PSO                     6.14          3        no
BLX-GL50                      6.38          2        no
DE                            6.74          1        no
MOS_NSC+dynPF2                6.84          0        no
L-CMA-ES                      7.52         −1        yes
SPC-PNX                       7.76         −1        yes
EDA                           7.98         −2        yes
KPCX                          8.14         −4        yes
BLX-MA                        8.54         −6        yes
CoEVO                        10.06        −10        yes

In Table 6.7, the summary of the results in the 30 dimensional functions can be found. In this case, the best performing algorithm is again the G-CMA-ES, followed by the BLX-GL50 algorithm and, in third place, the MOS_best configuration. However, in this set of functions the differences in performance have been dramatically reduced in both the average ranking and the number of wins. Furthermore, the Holm procedure could only find significant differences between G-CMA-ES and two out of the nine other algorithms, a much smaller number than in the previous experiment.

Table 6.7: Comparison with the algorithms of the CEC 2005 Special Session for the 30 dimensional functions.

Algorithm Configuration   Avg. Ranking   Wins   Holm (significant?)
G-CMA-ES                      3.70          4        − (ref.)
BLX-GL50                      4.18          2        no
MOS_best                      4.46          3        no
L-CMA-ES                      4.82          0        no
BLX-MA                        4.84          2        no
KPCX                          5.08          2        no
SPC-PNX                       5.34          1        no
MOS_NSC+dynPF1                5.38          1        no
EDA                           8.56         −7        yes
CoEVO                         8.64         −8        yes


It is important to note that the number of algorithms considered in the 30 dimensional comparison is not the same as in the 10 dimensional case. The reason is that some of the algorithms did not report any results for the 30 dimensional functions. Regarding the MOS_NSC+dynPF1 configuration, it is ranked in a lower position but, as has been said before, the performance differences in this dimensionality are dramatically smaller than in the 10 dimensional functions. Moreover, the L-CMA-ES algorithm is ranked in a much higher position according to its average ranking, whereas it obtains zero wins (against one win for the MOS_NSC+dynPF1 configuration). The same situation happens with BLX-GL50 and MOS_best: the former is ranked in second position according to its average ranking, whereas the latter obtains one more win in the statistical pair-wise comparison. We can conclude this study by remarking on the robust behavior of the different hybrid configurations of MOS in comparison with the single algorithms on their own, and on the good results obtained in the comparison against the algorithms proposed for the original special session: MOS_best is, together with G-CMA-ES, the only algorithm among the three best in both the 10 and the 30 dimensional sets of functions.

6.3 CEC 2008 Benchmark

6.3.1 Description of the CEC 2008 Benchmark

This section describes the CEC 2008 Benchmark set of functions. These continuous optimization functions were proposed for the CEC’08 Special Session and Competition on Large Scale Global Optimization [TYS+ 07] and represent good examples of complex optimization functions with different characteristics (multimodality, non-separability, etc.).

OBJECTIVES:

• Test the ability of the MOS framework to deal with a relatively large number of techniques.

• Test the performance obtained by a hybrid algorithm built with MOS in large scale problems.

In the definition of each function, D represents the dimension of the problem, x = [x_1, x_2, ..., x_D] a particular solution, o = [o_1, o_2, ..., o_D] the optimal solution and z = x − o the shifted vector.

6.3.1.1 Shifted Sphere Function

The first function is a unimodal, shifted, separable and scalable minimization function defined as:

$$F_1(x) = \sum_{i=1}^{D} z_i^2$$


with $x \in [-100, 100]^D$

6.3.1.2 Schwefel's Problem 2.21

The second function is a unimodal, shifted, non-separable and scalable minimization function defined as:

$$F_2(x) = \max_{i} |z_i|, \quad 1 \leq i \leq D$$

with $x \in [-100, 100]^D$

[Figure 6.18: 3-D plots of the Shifted Sphere function (a) and Schwefel's problem 2.21 (b).]

6.3.1.3 Shifted Rosenbrock's Function

The third function is a multimodal, shifted, non-separable and scalable minimization function defined as:

$$F_3(x) = \sum_{i=1}^{D-1} \left( 100 (z_i^2 - z_{i+1})^2 + (z_i - 1)^2 \right)$$

with $x \in [-100, 100]^D$

The main particularity of this function is that it has a very narrow valley from the local to the global optimum.


6.3.1.4 Shifted Rastrigin's Function

The fourth function is a multimodal, shifted, separable and scalable minimization function defined as:

$$F_4(x) = \sum_{i=1}^{D} \left( z_i^2 - 10 \cos(2 \pi z_i) + 10 \right)$$

with $x \in [-5, 5]^D$

This function presents a huge number of local optima.

[Figure 6.19: 3-D plots of Shifted Rosenbrock's (a) and Rastrigin's (b) functions.]

6.3.1.5 Shifted Griewank's Function

The fifth function is a multimodal, shifted, non-separable and scalable minimization function defined as:

$$F_5(x) = \sum_{i=1}^{D} \frac{z_i^2}{4000} - \prod_{i=1}^{D} \cos\left( \frac{z_i}{\sqrt{i}} \right) + 1$$

with $x \in [-600, 600]^D$

6.3.1.6 Shifted Ackley's Function

The sixth function is a multimodal, shifted, separable and scalable minimization function defined as:

$$F_6(x) = -20 \exp\left( -0.2 \sqrt{\frac{1}{D} \sum_{i=1}^{D} z_i^2} \right) - \exp\left( \frac{1}{D} \sum_{i=1}^{D} \cos(2 \pi z_i) \right) + 20 + e$$

with $x \in [-32, 32]^D$
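The six benchmark functions can be implemented directly from the definitions above. The following plain-Python sketch uses the z = x − o notation introduced at the beginning of this section; the helper `shifted` and the function names are illustrative, not part of the benchmark code:

```python
import math

def shifted(f):
    """Wrap a function of z = x - o into a shifted benchmark F(x, o)."""
    def F(x, o):
        z = [xi - oi for xi, oi in zip(x, o)]
        return f(z)
    return F

@shifted
def sphere(z):                       # F1: unimodal, separable
    return sum(zi ** 2 for zi in z)

@shifted
def schwefel_2_21(z):                # F2: unimodal, non-separable
    return max(abs(zi) for zi in z)

@shifted
def rosenbrock(z):                   # F3: narrow valley to the optimum
    return sum(100 * (z[i] ** 2 - z[i + 1]) ** 2 + (z[i] - 1) ** 2
               for i in range(len(z) - 1))

@shifted
def rastrigin(z):                    # F4: huge number of local optima
    return sum(zi ** 2 - 10 * math.cos(2 * math.pi * zi) + 10 for zi in z)

@shifted
def griewank(z):                     # F5: multimodal, non-separable
    s = sum(zi ** 2 for zi in z) / 4000
    p = math.prod(math.cos(zi / math.sqrt(i + 1)) for i, zi in enumerate(z))
    return s - p + 1

@shifted
def ackley(z):                       # F6: multimodal, separable
    d = len(z)
    return (-20 * math.exp(-0.2 * math.sqrt(sum(zi ** 2 for zi in z) / d))
            - math.exp(sum(math.cos(2 * math.pi * zi) for zi in z) / d)
            + 20 + math.e)

# At x = o, F1, F2, F4, F5 and F6 evaluate to zero; Rosenbrock's
# optimum sits at z_i = 1, i.e. at x = o + 1.
o = [1.0, -2.0, 3.0]
print(sphere(o, o), rastrigin(o, o))  # → 0.0 0.0
```

Note that F3 is the only function of the set whose optimum is not at z = 0, which is why its narrow valley "from the local to the global optimum" makes it noticeably harder than its formula suggests.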

[Figure 6.20: 3-D plots of Shifted Griewank's (a) and Ackley's (b) functions.]

6.3.2 Experimentation in the CEC 2008 Benchmark

This experimentation has been conducted under the same conditions established for the aforementioned CEC 2008 Special Session [TYS+07]: same maximum number of Fitness Evaluations, same initialization interval, etc. The hybrid algorithms tested on this benchmark are made up of eight different techniques. Four of them are Genetic Algorithms, and their configuration can be found in Table 6.8. These techniques have been constructed from a set of two crossover and two mutation operators in order to provide a set of techniques with sufficiently different search capabilities. Table 6.8 provides a detailed description of the parameters selected for the four GAs. Two other techniques are based on Differential Evolution. In this case, two different crossover operators have been used, the Binomial and the Exponential Crossovers, as well as different values for the F parameter. Again, both techniques have been selected for their different exploratory characteristics (the Binomial Crossover tends to create more diverse solutions, at least in the early stages of the search process). In Table 6.9a, the complete description of the configuration used for these two techniques can be found.
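The difference between the two DE crossover operators mentioned above can be sketched as follows. This is the standard textbook formulation of binomial and exponential DE crossover; the function names and the use of Python's `random` module are illustrative:

```python
import random

def binomial_crossover(target, mutant, CR, rng=random):
    """Each gene is taken from the mutant independently with probability CR
    (one randomly chosen gene is always inherited from the mutant), which
    tends to produce more diverse trial vectors."""
    D = len(target)
    j_rand = rng.randrange(D)  # guarantees at least one mutant gene
    return [mutant[j] if (rng.random() < CR or j == j_rand) else target[j]
            for j in range(D)]

def exponential_crossover(target, mutant, CR, rng=random):
    """A contiguous block of genes, starting at a random position, is copied
    from the mutant; the block length follows a geometric law driven by CR."""
    D = len(target)
    trial = list(target)
    j = rng.randrange(D)
    L = 0
    while True:
        trial[j] = mutant[j]
        j = (j + 1) % D
        L += 1
        if L >= D or rng.random() >= CR:
            break
    return trial
```

With CR = 0.9 both operators copy many mutant genes, but the exponential variant always copies a single contiguous block, which is the behavioral difference exploited by the hybrid configurations studied here.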


Finally, the last two techniques are two Evolution Strategies. Again, two different crossover operators have been used to guarantee that different exploratory characteristics are present in the hybrid algorithm. Table 6.9b offers a full description of the configuration used for these techniques.

Table 6.8: Configuration of the GA techniques.

                   BCUM       UCUM       BCGM       UCGM
Selector           Roulette Wheel (all techniques)
Initializer        Uniform (all techniques)
Crossover          BLX-α¹     Uniform    BLX-α¹     Uniform
Mutator            Uniform    Uniform    Gaussian   Gaussian
Crossover Rate     90% (all techniques)
Mutation Rate      1% (all techniques)

¹ BLX-α with α = 0.5

Table 6.9: Configuration of the DE and ES techniques.

(a) Configuration of DE techniques

              DE Binomial    DE Exponential
Selector      DE Selection   DE Selection
Initializer   Uniform        Uniform
Crossover     Binomial       Exponential
F             0.9            0.5
CR            0.9            0.9

(b) Configuration of ES techniques

              Discrete ES    Intermediate ES
Selector      Uniform        Uniform
Initializer   Uniform        Uniform
Crossover     Discrete       Intermediate
Mutator       Isotropic      Isotropic
ρ¹            2              2
τ²            1/√(2N)        1/√(2N)

¹ Mixing number
² Learning rate
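The isotropic mutation of the two ES techniques can be sketched as follows. This is a generic self-adaptive formulation, assuming the learning rate τ = 1/√(2N) listed in Table 6.9b; it is not the exact implementation used in the thesis:

```python
import math
import random

def isotropic_mutation(x, sigma, rng=random):
    """Self-adaptive isotropic ES mutation: the step size sigma is mutated
    first, using the learning rate tau = 1 / sqrt(2 * N), and every
    component of the individual is then perturbed with the new step size."""
    N = len(x)
    tau = 1.0 / math.sqrt(2.0 * N)
    # Log-normal self-adaptation of the (single, isotropic) step size.
    new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
    # Same step size on every coordinate: the mutation is isotropic.
    child = [xi + new_sigma * rng.gauss(0.0, 1.0) for xi in x]
    return child, new_sigma
```

Because the step size is inherited and mutated along with the solution, successful step sizes propagate through the population, which is what lets the two ES techniques adapt their search scale without any external control.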

In Table 6.10, the common configuration for the hybrid algorithms is presented. All the parameters, except the population size, the elitism type and the minimum participation ratio, were fixed according to the guidelines provided in [TYS+07]. The population size and elitism type were fixed based upon previous experimental experience, whereas the minimum participation ratio was established to guarantee a certain diversity in the offspring populations.

Table 6.10: Common configuration for the hybrid algorithms.

Executions               25
Problem Dimensions       1,000
Population Size          200
Convergence Criterion    5,000,000 FEs
Elitism                  100% Elitism
Minimum Participation    5%

Finally, the same seven hybrid configurations reported in Table 6.3 will be tested. Each hybrid algorithm will combine the techniques detailed in Tables 6.8, 6.9a and 6.9b, with the configuration detailed in Table 6.10.

6.3.3 Results in the CEC 2008 Benchmark

Table B.4 summarizes the results of the experimentation carried out on the CEC 2008 benchmark with 1,000 dimensions. The information presented in this table represents the average behavior of each algorithm (the seven hybrid configurations and the eight single techniques). This benchmark is made up of only six functions, which makes it difficult for the statistical tests to find significant differences. In [GMLH08], a general rule is given for the minimum number of instances of a distribution (average fitness values in this case) needed for the statistical test to be effective:

$$N = a \cdot k$$

where N is the number of functions (instances), k is the number of algorithms to be compared and a ≥ 2. It is obvious that our experimental scenario does not fulfill this requirement, as the number of functions is 6 and the number of algorithms is 15 (8 single and 7 hybrid algorithms). For this reason, both the single and the hybrid algorithms have been grouped into a single aggregate configuration each, called Single Best and MOS Best, respectively. Table 6.11 shows these aggregate results. It can be seen that the hybrid algorithms obtain a lower average error in 5 out of the 6 functions of the benchmark, and the same performance in the remaining one. The best results of hybridization are obtained on Rastrigin's, Griewank's and Ackley's functions, in which the average error has been reduced by at least one order of magnitude. On Schwefel's problem and Rosenbrock's function the improvements are somewhat smaller, whereas the Sphere function is too easy to solve even at 1,000 dimensions.
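The rule above is easy to check for this experimental setup (the numbers are the ones given in the text; the function name is illustrative):

```python
def enough_instances(n_functions, n_algorithms, a=2):
    """Rule of thumb from [GMLH08]: at least a * k instances (functions)
    are needed to compare k algorithms reliably, with a >= 2."""
    return n_functions >= a * n_algorithms

# CEC 2008 scenario: 6 functions, 15 algorithms (8 single + 7 hybrid).
print(enough_instances(6, 15))  # → False: the tests lack power
# After grouping into Single Best vs MOS Best, only 2 algorithms remain.
print(enough_instances(6, 2))   # → True
```

This is exactly why the per-algorithm comparison is replaced by the aggregated Single Best vs MOS Best comparison of Table 6.11.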

Table 6.11: Summary of the results in the CEC 2008 Benchmark with 1,000 dimensions.

              Sphere      Schwefel    Rosenbrock   Rastrigin   Griewank    Ackley
Single Best   0.00E+00    6.84E+01    1.13E+03     1.13E+03    1.37E-01    1.05E+01
MOS Best      0.00E+00    5.61E+01    1.11E+03     7.08E+02    3.83E-09    4.67E+00

In spite of the limitations of statistical tests when only a limited number of instances is available, a Wilcoxon signed-rank test has been executed to give a general idea of how significant these results are. The result of this test was a p-value < 0.05, which means that, despite the limited power of this kind of test for detecting significant differences between distributions, there is evidence of a better performance of the hybrid algorithms.

A few conclusions can be derived from this study:

• A hybrid approach always obtains significantly better results than a single algorithm.

• There is no clearly best hybrid configuration. Some combinations of participation function and quality measure are better than others on different problems. Furthermore, constant participation ratios seem to perform well on some functions.


A comparative analysis has also been carried out with the algorithms that participated in the CEC 2008 Special Session in which this benchmark of large scale functions was proposed. Table 6.12 summarizes the results of this comparison. Again, the values reported by the statistical tests should be interpreted with care, as the number of functions is quite small. However, it seems clear that the MTS algorithm is the best performer on this kind of function, as it obtained an average ranking and an nWins value much better than those of the remaining algorithms. MOS occupies the fourth place in this ranking (out of ten algorithms), with an average ranking slightly worse than those of LSEDA-gl and jDEdynNP-F but only one win fewer than these two algorithms.

Table 6.12: Average ranking and results of the nWins and the Holm procedures in the CEC 2008 Benchmark for 1,000 dimensional functions.

Algorithm Configuration   Average Ranking   Wins   Holm (significant?)
MTS                            1.75           7        − (ref.)
LSEDA-gl                       3.33           3        no
jDEdynNP-F                     3.33           3        no
MOS                            4.50           2        no
MLCC                           4.83           1        no
DMS-PSO                        5.75          −2        no
DEwSAcc                        7.00          −3        yes
UEP                            8.00          −3        yes
EPUS-PSO                       8.00          −4        yes
ALPSEA                         8.50          −4        yes

To conclude this study, it should be remarked that, again, the hybrid approaches obtained better results than the single algorithms on most of the functions. Moreover, the results of the hybrid algorithms are quite competitive compared to those of the algorithms proposed for the original Special Session. However, further work is needed on some functions, as the results obtained by MOS are still far from those of the MTS algorithm. Some of the future lines described in Chapter 10, such as the addition of restart mechanisms or initialization heuristics, could probably improve results that are nevertheless quite competitive for an algorithm that needs no special parameterization.

6.4 Conclusions

In this chapter, the proposed framework for the development of hybrid EAs with dynamic adjustment of participation is tested on two standard benchmarks for continuous optimization. The experimental results show that the combination of different search strategies increases the performance of the algorithms. Furthermore, these results have been shown to be competitive with the state-of-the-art algorithms for this kind of problem. Different sets of techniques have been used with the two benchmarks. For the CEC 2005 benchmark, a set of four techniques was used, whereas for the CEC 2008 benchmark this number was increased up to a


total of eight techniques. The difference in the number of techniques used is due to the limited number of Fitness Evaluations imposed by both benchmarks. As this number is quite small for the first one, the number of techniques selected is also smaller in this case. A larger number of techniques would have required a larger population, which would have reduced the available number of generations and probably diminished the performance of the hybrid algorithm. For this reason, it is important to find a balance between the number of techniques to be combined and the available number of FEs, so that the algorithm does not waste too many evaluations in determining the most suitable way of combining these techniques. Other evolutionary approaches, such as EDAs or PSO, or other quality functions could have been used for this experimentation. However, this study was intended to find out whether the combination of different EAs could improve the performance of the algorithm and how this combination could be carried out. These questions have been satisfactorily answered, which opens the way for further experimentation and analysis with other evolutionary techniques and quality measures not limited to those described in this work.


Chapter 7

Behavioral and Computational Analysis

In this chapter, a detailed analysis of two important aspects of the hybridization of Evolutionary Algorithms will be conducted. As the information derived from this study would be too extensive to be reported in this work, only a subset of the problems described in the previous chapters will be considered. The first of these studies will try to graphically depict how the synergies among different evolutionary techniques can appear, paying special attention to how a change in the participation of these techniques can affect the behavior of the hybrid algorithm. In particular, two well-known continuous functions will be considered: Rastrigin’s and Griewank’s functions. The second study will analyze the overhead of the MOS framework in terms of execution time when compared with the individual algorithms which are being combined by means of this framework. In this case, the full CEC 2005 benchmark has been used, as it contains a representative number of functions of different complexity.

7.1 Behavioral Analysis

A behavioral study has been carried out on two well-known continuous optimization functions: Rastrigin's (Section 6.3.1.4) and Griewank's (Section 6.3.1.5) functions, both belonging to the previously introduced CEC 2008 Benchmark. For this behavioral study, two quality measures were considered, the Negative Slope Coefficient (NSC) and the Fitness Average, together with one Dynamic Participation Function with constant overall population size.


Four genetic techniques, made up of the combination of two crossover and two mutation operators, were used for this experiment. The complete description of the proposed techniques can be found in Table 7.1. For this experimentation, these four techniques represent a reasonably diverse set of search approaches, which provide different search characteristics to the hybrid algorithm.

Table 7.1: Set of GA techniques for Rastrigin's and Griewank's functions.

                   UCUM       BCUM       UCGM       BCGM
Selector           Roulette Wheel (all techniques)
Initializer        Uniform (all techniques)
Crossover          Uniform    BLX-α      Uniform    BLX-α
Mutator            Uniform    Uniform    Gaussian   Gaussian
Crossover Rate     90% (all techniques)
Mutation Rate      1% (all techniques)

Each standalone algorithm has been tested on the two considered functions, as well as three hybrid configurations: one with constant participation ratios (the participation is uniformly distributed at the beginning of the execution and does not change throughout the process) and two with dynamic participation adjustment (one per quality measure). For each configuration and function, 20 independent runs of the algorithm were executed with the parameters shown in Table 7.2. Most of the parameters are classic in the literature, whereas the remaining ones (the minimum participation ratio, for example) have been adjusted according to previous experimentation.

Table 7.2: Common configuration for all the problems (Rastrigin and Griewank)

    Executions               20
    Problem Dimensions       150
    Population Size          100
    Convergence Criterion    Pop. convergence at 99%
    Elitism                  100%
    Min. Participation       5%

All the fitness values have been linearized within the interval [0, 1] to ease the comparison of the results and their graphical representation. A fitness value of one means that the algorithm found the optimal value.
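The linearization step can be illustrated with a small sketch. The exact mapping used in the thesis is not spelled out here; the min-max rescaling below, and the reference worst value, are assumptions for illustration:

```python
def linearize(raw_fitness, worst, optimum):
    """Map a raw fitness value into [0, 1], where 1.0 means the optimum
    was reached. For minimization problems worst > optimum; the formula
    handles both orientations as long as worst != optimum."""
    value = (worst - raw_fitness) / (worst - optimum)
    # Clamp to [0, 1] in case raw_fitness falls outside the reference range.
    return max(0.0, min(1.0, value))

# Rastrigin is a minimization problem with optimum 0; suppose the reference
# worst value observed across runs is 80.0 (a hypothetical figure).
print(linearize(0.0, worst=80.0, optimum=0.0))   # optimum -> 1.0
print(linearize(20.0, worst=80.0, optimum=0.0))  # -> 0.75
```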

OBJECTIVES:
• Study the differences in the adjustment of the participation of two different hybridization strategies (Quality Functions in this case).
• Analyze how synergies among multiple techniques arise.


7.1.1 Results in Rastrigin's Function

The results in Rastrigin's function are summarized in Table 7.3. In this problem, the NSC measure obtains the best results, whereas the Fitness Average measure presents a reasonably good performance, considerably better than the best single technique but a bit worse than the NSC. The configuration with constant participation ratios obtains good results compared with the standalone algorithms, but worse than those of the adaptive approaches.

Table 7.3: Comparison of the results in Rastrigin's function

    Algorithm                      Mean       Std. Dev.
    UCUM                           0.733150   0.041998
    BCUM                           0.003281   0.002615
    UCGM                           0.056696   0.013096
    BCGM                           0.002410   0.000521
    MOS + Constant Participation   0.850862   0.184398
    MOS + Fitness Average          0.908572   0.053230
    MOS + NSC                      0.975401   0.009846

Figures 7.1a and 7.1b plot the participation adjustment carried out by the fAvg and the NSC Quality Measures, respectively, in Rastrigin's function. If we compare both figures, we can find some interesting similarities. Techniques UCUM and UCGM dominate during the first part of the execution. At some point, UCGM stops producing good solutions, and BCUM then takes over and increases its participation. However, there are also some differences between both figures. The changes in the trend of the dynamic adjustment of the participation are much more abrupt with the Fitness Average measure than with the NSC measure, and there is also a more important contribution from minority techniques when using the NSC. Both facts make the NSC more suitable for this problem. It is important to note that, with both quality measures, at least one technique that had reached the minimum participation ratio gets its participation increased in a later phase of the algorithm. Had a minimum participation ratio not been established, this would not have been possible. For this reason, a further analysis of the influence of this minimum participation ratio should be carried out in order to determine reasonable values for common problems.
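The role of the minimum participation ratio can be illustrated with a simplified sketch of a participation-adjustment step. The update rule, quality values and step size below are illustrative assumptions, not the exact MOS formulas:

```python
def adjust_participation(ratios, qualities, min_ratio=0.05, step=0.05):
    """One participation-adjustment step: shift 'step' of the participation
    from the worst-quality technique to the best one, never letting any
    technique fall below 'min_ratio' (so it can recover later)."""
    best = max(range(len(ratios)), key=lambda i: qualities[i])
    worst = min(range(len(ratios)), key=lambda i: qualities[i])
    transfer = min(step, ratios[worst] - min_ratio)
    new_ratios = list(ratios)
    new_ratios[worst] -= transfer
    new_ratios[best] += transfer
    return new_ratios

# Four techniques with uniform participation; technique 0 performs best.
ratios = [0.25, 0.25, 0.25, 0.25]
for _ in range(10):
    ratios = adjust_participation(ratios, qualities=[0.9, 0.4, 0.3, 0.1])
print(ratios)  # technique 3 is clamped near 0.05, not extinguished
```

Because of the clamp, the losing technique keeps a residual 5% quota and can regain participation if its quality improves later, which is the behavior observed in Figures 7.1a and 7.1b.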

7.1.2 Results in Griewank's Function

Table 7.4 presents the results obtained on Griewank's function. In this case, the best-performing quality measure is, by far, the Fitness Average, obtaining results twice as accurate as those of the NSC and the best individual technique. On the other hand, the NSC quality measure slightly improves the results of the best individual technique, but remains quite far from the global optimum. Between both hybrid approaches, the configuration of MOS with constant participation obtains results slightly better than those of the NSC, but much worse than those obtained by the configuration using the Fitness Average quality measure.


Figure 7.1: Dynamic adjustment of the Participation for Rastrigin's function and both Quality Measures: (a) fAvg Quality Measure; (b) NSC Quality Measure

Table 7.4: Comparison of the results in Griewank's function

    Algorithm               Mean       Std. Dev.
    UCUM                    0.350935   0.138844
    BCUM                    0.109647   0.025171
    UCGM                    0.025335   0.009565
    BCGM                    0.440243   0.012195
    MOS + Constant Part.    0.494592   0.006945
    MOS + Fitness Average   0.972402   0.004569
    MOS + NSC               0.466626   0.010867

Figures 7.3a and 7.3b plot the participation adjustment carried out by the fAvg and the NSC Quality Measures, respectively, in Griewank's function, whereas Figure 7.2 depicts the evolution of the average fitness value for this function. This figure has been included because it is a good example of how the combination of different evolutionary approaches can lead to better results than their individual use. In it, we can clearly distinguish two different phases in the search process that, as Figure 7.3a shows, are guided by two different techniques. The first phase is dominated by the technique BCGM and takes place from generation 0 to around generation 500. Between generations 500 and 600 there is an adjustment phase in which the participation of BCGM decreases whilst that of UCUM starts to increase. From generation 600 onwards, UCUM acquires more participation and, even though it does not surpass BCGM, this gain in participation clearly has a deep influence on the relaunch of the search process. On the other hand, the NSC measure is not able to deal with the particularities of this function in the same way the Fitness Average does (Figure 7.3b). From the quality values reported by the NSC measure, none of the techniques seems to have any advantage in solving this function, and thus the NSC is unable to decide which one should lead the search. There are continuous fluctuations in the participation ratios, but they remain


mainly constant around the initial values. However, the final fitness value is slightly better than that of the best individual technique, even though this technique generates only 25% of the overall population. This can be explained by the fact that the individual techniques suffer from premature convergence: the total number of evaluations carried out by this technique on its own is similar to the number it carries out when hybridized within MOS, as the hybrid algorithm takes longer to converge.

Figure 7.2: Evolution of the fitness value for Griewank's function and the Fitness Average Quality Measure

Figure 7.3: Dynamic adjustment of the Participation for Griewank's function and both Quality Measures: (a) fAvg Quality Measure; (b) NSC Quality Measure

7.1.3 Validation

To validate these results, a non-parametric Wilcoxon test has been carried out on every pair of algorithms for each problem. Table 7.5 summarizes these results. For each problem, the algorithms have been arranged in levels. Algorithms at a higher level are statistically better than algorithms at lower levels with a significance level of α = 0.05. For the algorithms within the same level, the Wilcoxon test could not determine which one


is better (they are not sorted in any particular way). It is important to note that the three hybrid approaches are always at the top level for both functions and that, on each problem, one of the dynamically adjusted algorithms obtains the best results, with a significant difference compared with the constant approach. This means that, in complex scenarios, not only the combination of Evolutionary Algorithms is important, but also how the participation of each algorithm is adjusted.

Table 7.5: Results of the Wilcoxon test with a significance level α = 0.05 (algorithms listed from higher to lower levels)

    Rastrigin   Griewank
    NSC         fAvg
    fAvg        constant
    constant    NSC
    UCUM        BCGM
    UCGM        UCUM
    BCUM        BCUM
    BCGM        UCGM
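The pairwise validation above can be sketched in pure Python with a rank-sum test using a normal approximation. This is a simplified illustration on hypothetical run data; the exact test variant and implementation used in the thesis may differ:

```python
import math

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney) test with a normal
    approximation; returns the p-value. Suitable for samples of ~20 runs.
    Ties receive midranks (no tie correction of the variance)."""
    combined = sorted((v, 0 if i < len(a) else 1)
                      for i, v in enumerate(list(a) + list(b)))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, grp) in zip(ranks, combined) if grp == 0)
    u = r1 - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

# Example: 20 hypothetical fitness values per algorithm.
nsc = [0.97 + 0.001 * i for i in range(20)]
ucgm = [0.05 + 0.001 * i for i in range(20)]
print(rank_sum_test(nsc, ucgm) < 0.05)  # clearly separated -> True
```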

7.2 Computational Analysis

In this section, a comparative analysis of the performance of the hybrid algorithms developed with the MOS framework will be conducted. For this study, the CEC 2005 Benchmark for continuous optimization will be used, as it is made up of 25 functions of different computational complexity, which allows a complete analysis of the impact of the hybridization in different situations. The same configuration and techniques described in Section 6.2.2 will be used. The execution time of each individual algorithm has been recorded and can be found in Table 7.6.

Table 7.6: Average execution time of the four individual algorithms

              10 dimensions   30 dimensions
    BCGM      30.23 sec       288.95 sec
    UCUM      29.12 sec       287.26 sec
    DE Bin    30.30 sec       290.02 sec
    DE Exp    30.31 sec       288.80 sec

The same seven hybrid configurations considered in the previous experimentation have also been used (Table 6.3). For each of these configurations, the execution time for each function has been recorded and averaged. Table 7.7 presents a condensed version of this information. For each hybrid configuration, the average execution time over the 25 functions that make up the benchmark is reported, as well as the overhead of these hybrid configurations compared with the best and the worst individual algorithms in terms of execution time (UCUM and DE Bin, respectively).


Table 7.7: Average execution time of the hybrid algorithms compared with the best and worst individual algorithms

                         -------- 10 dimensions --------    -------- 30 dimensions --------
                         Avg. time   vs. Best   vs. Worst   Avg. time    vs. Best   vs. Worst
    fAvg + Dyn PF1       30.29 sec   +4.02%     −0.08%      288.53 sec   +0.44%     −0.52%
    fAvg + Dyn PF2       120.05 sec  +312.27%   +296.03%    1152.21 sec  +301.11%   +297.28%
    NSC + Dyn PF1        33.91 sec   +16.43%    +11.84%     300.48 sec   +4.60%     +3.61%
    NSC + Dyn PF2        140.22 sec  +381.52%   +362.55%    1208.30 sec  +320.63%   +316.62%
    Constant PF          32.70 sec   +12.30%    +7.88%      294.33 sec   +2.46%     +1.48%
    Self-Adaptive        29.55 sec   +1.46%     −2.54%      287.78 sec   +0.18%     −0.77%
    Weighted Self-Adapt. 29.50 sec   +1.32%     −2.67%      287.98 sec   +0.25%     −0.71%
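The overhead columns are plain relative differences against a reference execution time. For instance, using the rounded 10-dimension averages from Tables 7.6 and 7.7 (small discrepancies with the table values arise because the reported percentages were computed from unrounded times):

```python
def overhead(hybrid_time, reference_time):
    """Relative execution-time overhead, in percent, of a hybrid
    configuration against a reference individual algorithm."""
    return 100.0 * (hybrid_time - reference_time) / reference_time

# fAvg + Dyn PF1 in 10 dimensions vs. the fastest (UCUM, 29.12 s) and the
# slowest (DE Bin, 30.30 s) individual algorithms.
print(round(overhead(30.29, 29.12), 2))  # ≈ +4.02, matching Table 7.7
print(round(overhead(30.29, 30.30), 2))  # slightly negative
```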

From this table, the following conclusions can be extracted:

• The Dynamic PF1 Participation Function introduces a minimal overhead. For the Fitness Average Quality Measure, the penalty was 4% in the worst case (compared with the quickest individual algorithm in 10 dimensions). For the remaining scenarios (the slowest individual algorithm in 10 dimensions and both individual algorithms in 30 dimensions), the overhead was negligible.

• The NSC Quality Measure has a slightly greater impact on the performance of the algorithm. In 10 dimensions, the overhead can reach 16% (with the Dynamic PF1), whereas it remains around 4-5% in 30 dimensions, a much tighter value. This behavior seems reasonable, as the impact is larger when the complexity of the function is smaller (low dimensionality) and shrinks as the problem size increases: the cost of creating and managing the structures needed to compute the NSC value is more perceptible when the evaluation of the fitness function is cheaper.

• Regarding the Dynamic PF2 Participation Function, the overhead is around 300-400%, which means that the hybrid algorithm runs roughly four times more slowly than its individual counterparts. This might seem a great performance penalty but it can be easily justified. As seen in Section 4.2.1.1.5, the Dynamic PF2 strategy is able to reduce the overall population size. For this benchmark, after a few generations at the beginning of the algorithm, the worst-performing techniques get their participation reduced (and even extinguished) and the overall population size is adjusted to 25-35% of its original size, which makes the algorithm run 3-4 times longer.

• The Constant Participation Function obtains an average performance between those of the fAvg and the NSC Quality Measures.
• Both Self-Adaptive configurations present a similar performance, very close to that exhibited by the individual techniques. This is expected, as there are no extra regulatory mechanisms which can introduce a great penalty to the algorithm (just the participation recombination function, which computes the average of a reduced set of values).

To conclude this study, Tables 7.8 and 7.9 present a detailed breakdown of the best and worst hybrid configurations (in terms of execution time), which allows a fine-grained analysis of this information.

Table 7.8: Detailed comparison of the fAvg + Dyn PF1 configuration with the best and worst individual algorithms

            ------- 10 dimensions -------    ------- 30 dimensions -------
            Avg. time   vs. Best  vs. Worst  Avg. time    vs. Best  vs. Worst
    F1      0.54 sec    +4.26%    +0.27%     5.04 sec     −5.13%    −6.17%
    F2      0.60 sec    +18.49%   +6.38%     5.44 sec     −2.10%    −2.03%
    F3      0.76 sec    +16.67%   +6.56%     6.81 sec     +3.33%    −27.33%
    F4      0.62 sec    +10.65%   +6.23%     5.58 sec     +10.74%   +1.79%
    F5      0.46 sec    +5.74%    +2.40%     3.16 sec     +20.12%   +2.66%
    F6      0.59 sec    +33.57%   +8.26%     5.22 sec     +10.59%   +0.52%
    F7      0.67 sec    +25.50%   +4.70%     5.63 sec     +0.64%    −7.58%
    F8      0.70 sec    +32.87%   +6.73%     6.04 sec     +11.04%   +0.36%
    F9      0.57 sec    +19.86%   −5.15%     5.30 sec     +5.56%    −9.38%
    F10     0.64 sec    +17.29%   +0.54%     5.70 sec     +5.95%    −3.70%
    F11     17.41 sec   +4.08%    −0.37%     155.79 sec   +1.18%    −0.01%
    F12     2.15 sec    +12.04%   −10.43%    48.17 sec    +6.76%    −19.80%
    F13     0.67 sec    +31.41%   +3.90%     5.90 sec     +8.67%    −3.32%
    F14     0.71 sec    +31.87%   +7.32%     6.23 sec     +10.74%   +0.99%
    F15     73.10 sec   +3.86%    −0.17%     692.29 sec   +0.58%    −0.37%
    F16     71.68 sec   +3.57%    −0.12%     677.75 sec   +0.43%    −0.59%
    F17     71.79 sec   +3.65%    −0.05%     679.06 sec   +0.45%    −0.33%
    F18     72.10 sec   +3.78%    −0.04%     681.48 sec   +0.53%    −0.30%
    F19     72.16 sec   +3.85%    −0.02%     682.88 sec   +0.57%    −0.21%
    F20     72.21 sec   +4.00%    −0.14%     681.99 sec   +0.31%    −0.22%
    F21     72.04 sec   +3.93%    −0.20%     683.55 sec   +0.38%    −0.15%
    F22     73.08 sec   +3.97%    −0.30%     693.29 sec   +0.30%    −0.36%
    F23     72.09 sec   +4.03%    −0.06%     682.54 sec   +0.23%    −0.35%
    F24     39.79 sec   +4.37%    +0.31%     393.31 sec   +0.32%    −0.48%
    F25     40.14 sec   +3.31%    +0.88%     395.10 sec   −0.96%    +0.09%
    Avg.    30.29 sec   +4.02%    −0.05%     288.53 sec   +0.44%    −0.52%

Despite the different orders of magnitude of the values reported for each configuration, a similar behavior can be observed in both cases. On the one hand, the first 14 functions present a greater overhead than the remaining 11 functions (F15 to F25). This division coincides perfectly with the groups into which the benchmark is divided: the first 14 functions correspond to the unimodal and basic multimodal functions, whereas the remaining ones correspond to the expanded and composition groups. This confirms one of the aforementioned conclusions: the impact of the hybridization can be more clearly appreciated when the complexity of the problem is low. On the other hand, it can also be observed how this impact gets reduced as the dimensionality of the functions is increased. However, the division into two groups remains the same, which seems reasonable as the number of dimensions has been increased for both groups of functions.

Table 7.9: Detailed comparison of the NSC + Dyn PF2 configuration with the best and worst individual algorithms

            -------- 10 dimensions --------     -------- 30 dimensions --------
            Avg. time    vs. Best   vs. Worst   Avg. time    vs. Best   vs. Worst
    F1      3.32 sec     +546.74%   +522.00%    24.72 sec    +365.03%   +359.94%
    F2      3.43 sec     +580.49%   +510.95%    26.46 sec    +376.21%   +376.55%
    F3      4.28 sec     +553.56%   +496.92%    32.50 sec    +393.00%   +246.74%
    F4      3.52 sec     +530.06%   +504.85%    26.82 sec    +431.85%   +388.88%
    F5      2.89 sec     +564.19%   +543.20%    16.25 sec    +517.72%   +427.95%
    F6      14.12 sec    +3096.95%  +2491.21%   25.58 sec    +441.74%   +392.42%
    F7      11.25 sec    +2018.77%  +1667.61%   27.47 sec    +391.30%   +351.15%
    F8      12.01 sec    +2183.32%  +1734.11%   28.30 sec    +420.41%   +370.34%
    F9      11.27 sec    +2277.73%  +1781.58%   26.71 sec    +431.72%   +356.42%
    F10     11.65 sec    +2031.67%  +1727.18%   28.63 sec    +431.83%   +383.41%
    F11     82.16 sec    +391.20%   +370.23%    634.13 sec   +311.85%   +307.02%
    F12     21.03 sec    +993.84%   +774.43%    216.33 sec   +379.48%   +260.17%
    F13     14.27 sec    +2681.10%  +2098.74%   28.73 sec    +429.50%   +371.08%
    F14     17.67 sec    +3185.66%  +2573.95%   30.42 sec    +441.10%   +393.42%
    F15     344.46 sec   +389.39%   +370.39%    2905.49 sec  +322.13%   +318.14%
    F16     328.83 sec   +375.11%   +358.23%    2805.45 sec  +315.72%   +311.51%
    F17     326.95 sec   +372.07%   +355.21%    2764.45 sec  +308.95%   +305.75%
    F18     336.55 sec   +384.42%   +366.61%    2873.06 sec  +323.83%   +320.33%
    F19     332.70 sec   +378.79%   +360.93%    2931.96 sec  +331.80%   +328.43%
    F20     335.64 sec   +383.43%   +364.16%    2928.12 sec  +330.67%   +328.41%
    F21     361.88 sec   +422.04%   +401.33%    2903.29 sec  +326.37%   +324.09%
    F22     298.40 sec   +324.55%   +307.12%    2861.79 sec  +314.04%   +311.29%
    F23     296.00 sec   +327.17%   +310.35%    2829.77 sec  +315.54%   +313.15%
    F24     167.05 sec   +338.24%   +321.17%    1608.98 sec  +310.38%   +307.12%
    F25     164.14 sec   +322.44%   +312.49%    1622.11 sec  +306.62%   +310.95%
    Avg.    140.22 sec   +381.52%   +362.71%    1208.30 sec  +320.63%   +316.62%


Chapter 8

Learning Hybridization Strategies

8.1 Introduction

In previous chapters, a framework to design Hybrid Evolutionary Algorithms, Multiple Offspring Sampling, has been proposed and successfully tested. With MOS, the combined algorithms have their participation ratios adjusted on a per-generation basis. However, all the executions are independent in the sense that the participation-adjustment knowledge acquired by the algorithm is not reused in successive executions. This information could be very valuable in scenarios in which a Hybrid EA is continuously executed to solve the same problem with slightly different data.

This chapter proposes a different approach to the assignment of the offspring sampling quota within MOS. This approach considers the adjustment of this quota as a decision problem that can be solved using a long-term strategy. The rationale behind this idea is the objective of maximizing not the quality (fitness) of the best individual in a given generation, but the quality (fitness) of the best individual in the last generation. Under this approach, the reward obtained in this long-term situation should be propagated back to the decisions taken at earlier points of the evolutionary process. For this purpose, Reinforcement Learning (RL) mechanisms have been added to the MOS algorithm in order to provide hybrid algorithms with the tools necessary to learn optimal long-term strategies for combining the different available techniques.

Reinforcement Learning is a powerful Artificial Intelligence (AI) technique to construct best-response strategies for an agent interacting with an environment. As a Machine Learning topic, RL deals with the identification of the sequence of actions an agent has to carry out in order to maximize a reward function. RL has been successfully applied to different domains, from classic games to control systems. It should be taken into account that the parametrization of a heuristic algorithm is a complex scenario for RL because the actions decided by the strategies perform stochastically. This stochastic behavior comes from both the randomness in the generation of single individuals and the survival of the fittest individuals, which depends on the selection schema (and on the fitness of the remaining individuals in the population). Consequently, this study requires RL techniques able to deal with stochastic reward functions, like those used for stochastic games, such as Win or Learn Fast (WoLF) [BV01b], Policy Hill Climbing (PHC) [BV01b] or Tentative Exploration by Restricted Stochastic Quota (TERSQ) [PLPO09].

The goal of this proposal is two-fold: (i) improving the results obtained by MOS by deciding the sequence of actions that performs best with respect to the long-term objective, and (ii) validating whether RL techniques are able to learn the best-response mixed strategy for a complex scenario in which the reward function is stochastically determined.

8.2 Related Work on Reinforcement Learning

In [KLM96], the authors define Reinforcement Learning (RL) as “the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment”. In the standard Reinforcement Learning model, the agent is connected to its environment in a bidirectional way. On the one hand, the agent receives an input, via perception, and some information about the state of the environment. On the other hand, the agent interacts with its environment by carrying out an action that can potentially change the state of the environment. A reinforcement signal is associated to each action, and the agent should choose actions that maximize the long-term sum of the reinforcement signal. This behavior can be learnt if a systematic trial-and-error search is guided by the appropriate algorithm. Formally, the Reinforcement Learning model is made up of:

• A discrete set of states, S.
• A discrete set of actions, A.
• A set of reinforcement signals (typically 0, 1, or real numbers).

An important issue in Reinforcement Learning is how the agent takes the future into account. There are basically three models that try to optimize the reward at different moments. The finite-horizon model tries to optimize the reward in the following h steps. The infinite-horizon model discounts long-term rewards as if they were subject to an interest rate. Finally, the average-reward model takes into account the long-term average reward.

Problems with delayed reinforcement, such as those addressed by Reinforcement Learning, are well modeled as Markov Decision Processes (MDPs). An MDP is defined as a tuple (S, A, T, R), where S is the set of states, A is the action set of the agent and T is the transition function. As MDPs have non-deterministic state transitions, T is defined as S × A × S → [0, 1]. This function determines the probability of carrying out a state transition from one state to another after executing a particular action. R is the reward function of the agent, defined as a probability distribution: S × A → PD(R).

There are two alternatives for obtaining an optimal policy for an MDP. First, a controller can be learnt without learning a model (model-free approach). Second, a model can be learnt and a controller derived from it (model-based approach). In this contribution, more attention will be paid to the model-free approach and, especially, to a family of algorithms called Q-learning. In Q-learning, a matrix of Q-values is maintained. These values are the expected discounted reinforcement of taking action a in state s. This matrix is initialized to zero and its values are updated by means of the typical Q-learning rule:

Q(s, a) = Q(s, a) + α [ r + γ max_{a′} Q(s′, a′) − Q(s, a) ]

where (s, a, r, s′) is an experience tuple summarizing a single transition in the environment, α is the learning rate and γ is the discount factor. In this tuple, s represents the state of the agent before the transition, a the action carried out, r the instantaneous reward received and s′ the resulting state. This algorithm was proposed by Watkins in [Wat89] and proved to converge to the optimal policy Q* with probability 1 if each action is executed in each state an infinite number of times on infinite runs and if α is decayed appropriately [JJS94].

Some variants of these algorithms have been successfully applied to MDPs. PHC and WoLF [BV02] are extensions to the Q-learning algorithm particularly designed to deal with stochastic scenarios. Both approaches maintain a learning rate in the form of a selection probability for each action-state pair. The main difference is that, in PHC, the learning rate is constant, whereas WoLF changes this value depending on whether it is winning or losing, following the idea of learning fast (higher rates) when losing and at a slower rate when winning. In [BV01a], the authors extend the WoLF algorithm to incorporate the concept of Infinitesimal Gradient Ascent (IGA), presented in [SKM00], to define the “winning” situations required to update the learning rate in WoLF. GIGA-WoLF [Bow05] is an extension of the latter considering the concept of Generalized IGA [Zin03]. BL-WoLF [CS03] is an enhanced version of WoLF that provides a bounded loss, where the cost of learning is measured by the losses suffered by the learning agent (rather than by the number of rounds). Another variant is Hyper-Q [Tes03], in which values of mixed strategies rather than base actions are learnt, and in which other agents' strategies are estimated from observed actions via Bayesian inference. Weighted Policy Learner (WPL) [AL08] is a newer RL algorithm which does not assume any knowledge of the underlying game structure.
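The tabular Q-learning update described above can be sketched as follows (the dictionary-based table and the toy states and actions are illustrative assumptions):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a dict mapping (state, action) -> value, defaulting to 0."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

Q = {}
actions = ["tech1", "tech2"]
q_update(Q, "s0", "tech1", r=1.0, s_next="s1", actions=actions)
print(Q[("s0", "tech1")])  # 0 + 0.1 * (1 + 0 - 0) = 0.1
```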

where {s, a, r, s0 } is an experience tuple summarizing a single transition in the environment, α is the learning rate and γ is the discount factor. In this tuple, s represents the state of the agent before the transition, a the action carried out, r the instantaneous reward it receives and s0 the resulting state. This algorithm was proposed by Watkins in [Wat89] and proved to converge to the optimal policy Q∗ with probability 1 if each action is executed in each state an infinite number of times in infinite runs and if α is decayed appropriately [JJS94]. Some variants of these algorithms have been successfully applied to MDPs. PHC and WoLF [BV02] are extensions to the Q-learning algorithm particularly designed to deal with stochastic scenarios. Both approaches maintain a learning rate in the form of a selection probability for each action-state pair. The main difference is that, in PHC, the learning rate is constant whereas WoLF changes this value whether it is winning or losing, following the idea of learning fast (higher rates) when losing and learning at a slower rate when winning. In [BV01a], the authors extend the WoLF algorithm to incorporate the concept of Infinitesimal Gradient Ascent (IGA) presented by [SKM00] to define the “winning” situations required to update the learning rate in WoLF. GIGA-WoLF [Bow05] is an extension of the latter considering the concept of Generalized IGA [Zin03]. BL-WoLF [CS03] is an enhanced version of WoLF that provides a bounded-loss where the cost of learning is measured by the losses suffered by the learning agent (rather than the number of rounds). Another variant is Hyper-Q [Tes03], in which values of mixed strategies rather than base actions are learnt, and in which other agents’ strategies are estimated from observed actions via Bayesian inference. Weighted Policy Learner (WPL) [AL08] is a new RL algorithm which does not assume any knowledge of the underlying game structure. 
A new Q-learning variant, called TERSQ, was presented in [PLPO09]. The main idea underlying this algorithm is the use of an overall stochastic quota, σ, in order to select the action to be executed. The action with the best Q-value will be selected with a probability σ, whereas the remaining actions are stochastically selected with a probability of 1 − σ according to their Q-value ranking. It is important to mention that few works have used RL as a regulatory mechanism for metaheuristics. In [Nar04], the authors propose a hyper-heuristic in which the basic heuristics are selected by a procedure inspired Antonio LaTorre de la Fuente

A FRAMEWORK FOR HYBRID DYNAMIC EVOLUTIONARY ALGORITHMS: MULTIPLE OFFSPRING SAMPLING (MOS)

142

CHAPTER 8. LEARNING HYBRIDIZATION STRATEGIES

Algorithm 7 Multiple Offspring Sampling with RL Algorithm 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12:

Initialize Matrix of Q-values (Q(s, a) = 0) Create initial overall population of candidate solutions P0 Evaluate initial population P0 while termination criterion not reached do if maximum number of generations per state reached then Move from state s to s0 end if while |Oi | < |Pi | do Apply Learning Policy to create a new individual and add it to the offspring population Oi end while Combine populations Oi and Pi according to a pre-established criterion to generate Pi+1 end while

by Reinforcement Learning. In [EHKS07], the authors used RL techniques for the online control of some of the parameters of a Steady State Genetic Algorithm, which slightly improves the performance of the standard algorithm. However, none of these papers have explored the ability of RL to control the behavior of Hybrid Evolutionary Algorithms, which is the goal of this work.

8.3 Reinforcement Learning to Control MOS Strategies

In the previous sections, the possibility of using RL to control the behavior of MOS has been envisaged. In this section, the MOS algorithm will be extended to make use of the specific RL mechanisms reviewed in Section 8.2. Each action in the MOSRL algorithm (MOS with RL extensions) is the creation of new offspring by means of one of the available reproductive techniques. The set of available states has been established by discretizing the participation of each reproductive technique into eleven possible values ({0.0, 0.1, . . . , 1.0}), with the only constraint that the participations of all the techniques must sum to 1. In addition, to favor exploration, a maximum number of generations is allowed for each state: every N generations, a state transition is automatically carried out to a new state in which the discrete participation ratios remain unchanged and the generation information is updated.

State = (Φ × G)    (8.1)

Φ = (ϕ_j^(1), . . . , ϕ_j^(n)) ∈ {0.0, 0.1, . . . , 0.9, 1.0}^n  /  Σ_{i=1}^{n} ϕ_j^(i) = 1.0    (8.2)

G ∈ {[k · N, (k + 1) · N) : k = 0 . . . M}    (8.3)
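Under this discretization, the set of legal participation vectors can be enumerated directly by counting in tenths; with n techniques there are C(10 + n − 1, n − 1) such vectors. An illustrative sketch:

```python
from itertools import product

def participation_states(n_techniques):
    """Enumerate all discretized participation vectors: n values taken from
    {0.0, 0.1, ..., 1.0} whose sum equals 1.0 (counted in tenths to avoid
    floating-point issues)."""
    return [tuple(t / 10 for t in combo)
            for combo in product(range(11), repeat=n_techniques)
            if sum(combo) == 10]

# With two techniques there are 11 such vectors, (1.0, 0.0) ... (0.0, 1.0).
print(len(participation_states(2)))  # -> 11
# With four techniques: C(13, 3) = 286 vectors.
print(len(participation_states(4)))  # -> 286
```

The full state space is the Cartesian product of these vectors with the M + 1 generation intervals of Equation 8.3.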

For example, if a hybrid algorithm with four techniques is being run and its current state is represented by the tuple {parts = {0.1, 0.3, 0.4, 0.2}, gen ∈ [100, 200)}, then, when the algorithm reaches generation 200, a transition is carried out to the state {parts = {0.1, 0.3, 0.4, 0.2}, gen ∈ [200, 300)}.


Figure 8.1 shows an example of a matrix of states for a configuration in which two techniques are used simultaneously. The participation of the techniques has been discretized in steps of 10%, and a state transition is forced every 100 generations.

Figure 8.1: Example of a matrix of states with two techniques (participation tuples from (1.0, 0.0) to (0.0, 1.0) against generation intervals [1, 100), [100, 200), . . . , [400, 500))

Let (n, T, P, O) be a MOS system, as seen in Section 4.1.2, and let (S, A, T, R) be the tuple that describes an MDP. We say that this MDP is a MOS control strategy where:

S = {s_i : ∃ j ∈ [0, m] / s_i = sp(j, P_j)}    (8.4)

A = {a_i : i ∈ [1, n]}    (8.5)

T = {(s_i, a_j, s_k, π_{i,j,k})}    (8.6)

R = {(s_i, a_j, r_{i,j})}    (8.7)

In the previous equations, sp is a state projection function that maps all the possible population configurations into a set of MDP states (Equations 8.8, 8.9 and 8.10), and ndval is a function that approximates a real participation value by its nearest discrete value in the set {0.0, 0.1, . . . , 1.0}.

sp(j, P_j) = (Φ, G)    (8.8)

Φ = (ϕ_j^(1), . . . , ϕ_j^(n))  /  ϕ_j^(i) = ndval( |P_j^(i)| / Σ_{i=1}^{n} |P_j^(i)| )    (8.9)

G = [k · N, (k + 1) · N)  /  k = ⌊ j / N ⌋    (8.10)
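The projection functions ndval and sp of Equations 8.8-8.10 can be sketched as follows (the subpopulation sizes and the value N = 100 are illustrative assumptions):

```python
def ndval(x):
    """Round a real participation value to its nearest discrete value
    in {0.0, 0.1, ..., 1.0}."""
    return round(x * 10) / 10

def sp(j, subpop_sizes, N=100):
    """State projection: map generation j and the current subpopulation
    sizes |P_j^(i)| to the MDP state (Phi, G)."""
    total = sum(subpop_sizes)
    phi = tuple(ndval(size / total) for size in subpop_sizes)
    k = j // N
    G = (k * N, (k + 1) * N)  # the half-open interval [k*N, (k+1)*N)
    return phi, G

# The example from the text: participation {0.1, 0.3, 0.4, 0.2} at
# generation 150 falls in the generation interval [100, 200).
print(sp(150, [10, 30, 40, 20]))  # -> ((0.1, 0.3, 0.4, 0.2), (100, 200))
```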


In this context, each action a_i represents the action “create a new individual using the offspring mechanisms of technique i”. State transitions (s_i, a_j, s_k, π_{i,j,k}) express that there is a probability π_{i,j,k} of changing from state s_i to state s_k once a new individual has been created with technique j, i.e., once action a_j has been executed. This probability depends on (i) the quality of the new individual, (ii) the selection pressure, also based on the quality of the other individuals, and (iii) the state projection function (multiple populations are mapped to the same states). On the other hand, the elements of the reward function R, (s_i, a_j, r_{i,j}), can be seen as “the immediate reward obtained from creating a new individual with technique j in state s_i”. This reward value r_{i,j} can be expressed as the improvement in the quality of the existing population derived from the contribution of the newly created individual.

Finally, for this work, three learning policies have been considered (PHC, WoLF and TERSQ), which are variations of the state-of-the-art Q-learning algorithm that can work in stochastic domains. The first two policies are classic in the literature of stochastic games, whereas the third one was developed specifically for this work (although it has also been successfully applied to other problems, as seen in [PLPO09]). Algorithm 7 provides a detailed description of the common parts of the proposed algorithm, whereas a more thorough description of each of the three policies used in this experimentation is offered in Sections 8.3.1, 8.3.2 and 8.3.3, respectively.

8.3.1 PHC Learning Policy

Policy Hill Climbing (PHC) was proposed in [BV01b] as an extension to the standard Q-learning algorithm. The Q-values are maintained as usual but, in addition, the algorithm also maintains the current mixed policy π(s, a). This policy controls the probability of selecting a given action during the learning phase. It is updated after the execution of an action by increasing the probability of selecting the best performing action according to a learning rate δ, which takes a constant value within the interval (0, 1], and by decreasing the probability of the executed action if it is not the best performing one. A more detailed description of this policy is provided by Algorithm 8.

8.3.2 WoLF Learning Policy

The WoLF policy was proposed in [BV01b] as an extension of the PHC policy reviewed in the previous section. The basic idea is to dynamically modify the learning rate to encourage convergence without sacrificing rationality. Intuitively, the algorithm tries to learn quickly when it is losing and more slowly when it is winning. To determine whether the algorithm is winning or losing, the current policy's payoff is compared with that of the average policy over time. For this purpose, the algorithm requires two learning parameters: δl, used when the algorithm is losing, and δw, used when it is winning, with δl > δw. Algorithm 9 provides a more detailed description of this policy.
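The winning/losing test at the heart of WoLF can be sketched as follows; δw = 0.005 and δl = 0.02 follow the values later listed in Table 8.3, and the policy and Q-value dictionaries are illustrative:

```python
def wolf_delta(pi, pi_avg, q, state, actions, delta_w=0.005, delta_l=0.02):
    """Return the WoLF learning rate for `state`: the small delta_w when
    the current policy outperforms the average policy (winning), the
    larger delta_l otherwise (losing)."""
    current = sum(pi[(state, a)] * q[(state, a)] for a in actions)
    average = sum(pi_avg[(state, a)] * q[(state, a)] for a in actions)
    return delta_w if current > average else delta_l

# Hypothetical two-action case: the current policy favors the
# higher-valued action more strongly than the average policy -> winning.
q = {(0, "a"): 1.0, (0, "b"): 0.0}
pi = {(0, "a"): 0.9, (0, "b"): 0.1}
pi_avg = {(0, "a"): 0.5, (0, "b"): 0.5}
print(wolf_delta(pi, pi_avg, q, 0, ["a", "b"]))  # 0.005 (winning)
```

A winning policy thus moves slowly (it is already doing well), while a losing one adapts fast; this asymmetry is what encourages convergence.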


Algorithm 8 PHC Learning Policy
1: Let α and δ be learning rates. Initialize Q(s, a) ← 0, π(s, a) ← 1/|A|
2: while current generation not finished do
3:    From state s select action a with probability π(s, a)
4:    Update the Q-values observing reward r and next state s':
         Q(s, a) ← Q(s, a) + α [ r + γ max_{a'} Q(s', a') − Q(s, a) ]
5:    Update π(s, a) and constrain it to a legal probability distribution:
         π(s, a) ← π(s, a) + δ             if a = argmax_{a'} Q(s, a')
         π(s, a) ← π(s, a) − δ/(|A| − 1)   otherwise
6: end while
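The policy update of Algorithm 8 can be sketched as follows; δ = 0.01 as in Table 8.3, and the clipping plus re-normalization shown here is one common way to "constrain to a legal probability distribution", not necessarily the exact mechanism used in the thesis:

```python
def phc_update(pi, q, state, actions, delta=0.01):
    """PHC mixed-policy update: move probability delta toward the greedy
    action, taking delta/(|A|-1) from every other action, then clip and
    re-normalize to keep a legal probability distribution."""
    best = max(actions, key=lambda a: q[(state, a)])
    n = len(actions)
    for a in actions:
        step = delta if a == best else -delta / (n - 1)
        pi[(state, a)] = max(0.0, pi[(state, a)] + step)
    total = sum(pi[(state, a)] for a in actions)
    for a in actions:
        pi[(state, a)] /= total
    return pi

pi = {(0, "BCUM"): 0.5, (0, "UCUM"): 0.5}
q = {(0, "BCUM"): 0.2, (0, "UCUM"): 0.1}
phc_update(pi, q, 0, ["BCUM", "UCUM"])
print(pi)  # BCUM gains delta (0.51), UCUM loses it (0.49)
```

Iterating this update drives the probability of the best action toward one, which explains the drastic technique switching discussed in Section 8.5.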

8.3.3 TERSQ Learning Policy

In [PLPO09] the TERSQ algorithm was introduced. The main idea of this algorithm is to use an overall stochastic quota, σ, to select the action to be executed. A binomial decision process is carried out in such a way that the action with the best Q-value is selected with probability σ, whereas the remaining actions are stochastically selected with probability 1 − σ according to their Q-value ranking. The σ value is selected for each round based on three different criteria, from which three phases can be established: (1) the Tentative Phase, in which the algorithm tries all the possible σ values (from a finite set of values, named Γ) to obtain an initial estimation of the performance of every possible σ value; (2) the σ Adjustment Phase, in which σ values are chosen proportionally to their average performance τ(σ) (updated at the end of each round); and (3) the Optimal σ Phase, in which the σ value with the highest average performance is selected for the remainder of the learning process. The usual Q-learning technique is applied throughout the process. A detailed description of this policy can be found in Algorithm 10.
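The rank-based probabilities and the σ quota of TERSQ (Equations 8.11–8.14 in Algorithm 10) can be sketched as follows; the state/action encoding is illustrative:

```python
def tersq_probabilities(q, state, actions, sigma):
    """TERSQ action probabilities: rank actions by increasing Q-value and
    assign rank-proportional base probabilities (Eqs. 8.11-8.12), then
    fold in the sigma quota for the best action (Eqs. 8.13-8.14)."""
    ranked = sorted(actions, key=lambda a: q[(state, a)])  # increasing Q
    pi0 = 1.0 / sum(range(1, len(ranked) + 1))             # normalization
    pi = {a: (i + 1) * pi0 for i, a in enumerate(ranked)}
    best = ranked[-1]                                      # A_max
    for a in pi:                                           # Eq. 8.13
        pi[a] *= (1.0 - sigma)
    pi[best] += sigma                                      # Eq. 8.14
    return pi

q = {(0, "a"): 0.1, (0, "b"): 0.3, (0, "c"): 0.2}
pi = tersq_probabilities(q, 0, ["a", "b", "c"], sigma=0.5)
print(pi)  # "b" (best Q-value) receives sigma plus its scaled rank share
```

With σ = 0 the selection is purely rank-proportional; with σ = 1 it is purely greedy, so the three phases of TERSQ amount to learning where between these two extremes the best trade-off lies.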

8.4 Experimental Scenario

For the experiments carried out in this work, the benchmark proposed for the CEC'08 Special Session and Competition on Large Scale Global Optimization [TYS+07] has been considered. This benchmark is made up of six scalable continuous functions with some of the characteristics that make this type of function hard to solve: multimodality, non-separability, a shifted global optimum, etc. Specifically, the six considered functions are: Ackley, Griewank, Rastrigin, Rosenbrock, Schwefel and Sphere. The first part of this section details the configuration used for the different tested approaches (single GAs,


Algorithm 9 WoLF Learning Policy
1: Let α and δl > δw be learning rates. Initialize Q(s, a) ← 0, π(s, a) ← 1/|A|, C(s) ← 0
2: while current generation not finished do
3:    From state s select action a with probability π(s, a)
4:    Update the Q-values observing reward r and next state s':
         Q(s, a) ← Q(s, a) + α [ r + γ max_{a'} Q(s', a') − Q(s, a) ]
5:    Update the estimate of the average policy π̄:
         C(s) ← C(s) + 1
         ∀ a' ∈ A:  π̄(s, a') ← π̄(s, a') + (1/C(s)) (π(s, a') − π̄(s, a'))
6:    Update π(s, a) and constrain it to a legal probability distribution:
         π(s, a) ← π(s, a) + δ             if a = argmax_{a'} Q(s, a')
         π(s, a) ← π(s, a) − δ/(|A| − 1)   otherwise
      where δ = δw if Σ_a π(s, a) Q(s, a) > Σ_a π̄(s, a) Q(s, a), and δ = δl otherwise
7: end while

standard MOS algorithm and MOS algorithm combined with RL techniques), whereas the second part of this section presents the experimental procedure carried out.

8.4.1 Algorithms

Table 8.1 presents the configuration used by each of the evolutionary approaches considered in this experimentation (single GAs, standard MOS and MOS with RL techniques). The number of executions has been set so that the RL algorithms have enough information to learn optimal hybridization strategies (111 runs and 2.5M evaluations per execution). The convergence criterion has been fixed to a maximum number of evaluations, as proposed in [TYS+07]. No minimum participation ratio has been imposed on the MOS algorithm because, as we will see in Section 8.5, two of the proposed techniques perform much worse than the other two, which makes it undesirable to waste computational effort in maintaining a marginal participation of these algorithms. The dimensionality of the considered functions has been set to 500, a considerably large size which makes these functions quite difficult to solve. Table 8.2 presents the set of techniques used by the hybrid configurations. This set of techniques has been constructed by combining two crossover and two mutation operators classic in the literature for continuous


Algorithm 10 TERSQ Learning Policy
1: Let A be the set of possible actions for state s, a ∈ A one action for this state, α, γ the learning parameters, σ ∈ Γ = {0.0, 0.1, . . . , 1.0} the overall quota used to select Amax, and τ(σ) the average performance of σ
2: Let σ be selected from Γ following the specific criterion of the current phase
3: Initialize Q(s, a) ← 0
4: while current generation not finished do
5:    for each action a on each state s do
6:       Compute π(s, a), a base probability obtained by a ranking process where actions are sorted according to their Q-values in increasing order:
            {A'_i} = sort({a})    (8.11)
            ∀ i = 1..n:  π(s, A'_i) = i × π_0,  with  Σ_i π(s, A'_i) = 1    (8.12)
7:    end for
8:    These probabilities are adjusted by the σ quota as follows:
            π(s, a) ← π(s, a) × (1 − σ),  a ≠ Amax    (8.13)
         and, for Amax (the action with the best current Q-value):
            π(s, Amax) ← π(s, Amax) × (1 − σ) + σ    (8.14)
9:    Select action a with probability π(s, a)
10:   Update the Q-values observing reward r and next state s':
            Q(s, a) ← Q(s, a) + α [ r + γ max_{a'} Q(s', a') − Q(s, a) ]
11: end while
12: Update τ(σ) according to the evaluation of the round

Table 8.1: Algorithm configuration

                          GAs    Standard MOS          MOSRL
Executions                       111
Problem Dimensions               500 dimensions
Population Size                  50
Convergence Criterion            2,500,000 FEs¹
Elitism                          Full Elitism
Minimum Participation     -      0%                    0%
Participation Function    -      Constant / Dynamic²   -
Quality Measure           -      Fitness Average       -

¹ Fitness Evaluations, as specified in [TYS+07]
² Dynamic PF1, with constant population size

optimization. Many researchers have considered the combination of the BLX-α Crossover and the Gaussian Mutator but, as we will see in Section 8.5, the combination of the BLX-α Crossover with the Uniform Mutator


reports better results.

Table 8.2: Set of techniques for the hybrid evolutionary algorithm

                  BCUM       UCUM       BCGM       UCGM
Selector                  Roulette-Wheel
Initializer               Uniform
Crossover         BLX-α¹    Uniform    BLX-α¹     Uniform
Mutator           Uniform   Uniform    Gaussian   Gaussian
Crossover Rate            90%
Mutation Rate             1%

¹ BLX-α with α = 0.5
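The BLX-α crossover and uniform mutation combined in the table above are standard operators for real-coded GAs; a minimal sketch follows (the parent values and gene bounds are illustrative, not taken from the benchmark):

```python
import random

def blx_alpha(parent1, parent2, alpha=0.5):
    """BLX-alpha crossover: each child gene is drawn uniformly from the
    parents' interval extended by alpha times its length on both sides."""
    child = []
    for x, y in zip(parent1, parent2):
        lo, hi = min(x, y), max(x, y)
        ext = alpha * (hi - lo)
        child.append(random.uniform(lo - ext, hi + ext))
    return child

def uniform_mutation(chromosome, bounds, rate=0.01):
    """Uniform mutation: each gene is replaced, with probability `rate`,
    by a uniform sample from its domain."""
    return [random.uniform(*bounds) if random.random() < rate else g
            for g in chromosome]

random.seed(1)
child = blx_alpha([0.0, 1.0], [1.0, 3.0])           # genes in [-0.5, 1.5] and [0, 4]
child = uniform_mutation(child, bounds=(-5.0, 5.0))
```

The α = 0.5 extension lets BLX-α explore slightly beyond the parents' hyper-rectangle, which is what distinguishes it from a plain blend crossover.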

Table 8.3 shows the values of the parameters needed by the different RL policies. The learning rate (α) and the discount factor (γ), common to the three Q-learning based policies, have been selected following the guidelines provided in [EDM04]. For the learning rates specific to PHC and WoLF (δ, δl and δw), the recommended values from [BV01b] have been used. The σ values needed by the TERSQ policy are adjusted by the policy itself, as explained in Algorithm 10, and do not need to be fixed.

Table 8.3: Parameters of the RL policies

         PHC      WoLF     TERSQ
α               0.7
γ               0.6
δ        0.01     -        -
δl       -        0.02     -
δw       -        0.005    -

8.4.2 Procedure

For each of the proposed problems, the following experimental procedure is carried out:

• Each evolutionary technique is executed individually.

• The four proposed evolutionary techniques are combined within MOS with both constant and dynamic participation functions and the configuration presented in Table 8.1.

• The four proposed evolutionary techniques are combined within the RL version of MOS with the configuration presented in Table 8.1.

• For the TERSQ algorithm, 11 rounds are carried out in the Tentative Phase, 50 rounds in the σ Adjustment Phase and 50 rounds in the Optimal σ Phase. For the other two algorithms, PHC and WoLF, 111 rounds are executed.


• For the non-RL algorithms (single GAs and standard MOS configurations), 111 independent runs have been executed.

• The average error with respect to the global optimum is reported for each problem and configuration.

• A ranking analysis based on the fitness of each algorithm is carried out.

• The results obtained by each algorithm are pair-wise compared against the others using the non-parametric Wilcoxon test.

• The nWins procedure described in Section A.2 has been used to validate the results.

• The evolution of the participation of the different techniques in the hybrid algorithms (both standard MOS and MOSRL) is analyzed to study how different search strategies can boost the performance of individual techniques and how this participation evolves in the case of the RL algorithms.
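The Wins scores reported later in Tables 8.5 and 8.7 aggregate the pairwise statistical comparisons; a simplified sketch of such an nWins-style score follows (the outcome matrix below is illustrative, whereas in the thesis the outcomes come from the pairwise Wilcoxon tests and the procedure of Section A.2):

```python
def n_wins(outcomes):
    """Compute a Wins score per algorithm from pairwise outcomes.

    outcomes[(a, b)] is +1 if `a` is significantly better than `b`,
    -1 if significantly worse, and 0 if no significant difference.
    """
    algos = {a for pair in outcomes for a in pair}
    return {a: sum(outcomes.get((a, b), 0) for b in algos if b != a)
            for a in algos}

# Hypothetical three-algorithm outcome matrix: A beats both, B beats C.
outcomes = {("A", "B"): 1, ("B", "A"): -1,
            ("A", "C"): 1, ("C", "A"): -1,
            ("B", "C"): 1, ("C", "B"): -1}
print(n_wins(outcomes))  # A: +2, B: 0, C: -2
```

A score of zero thus means an algorithm wins as many significant comparisons as it loses, which is how the negative values in the ranking tables should be read.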

Objectives:

• Test the ability of MOS with Reinforcement Learning mechanisms to learn optimal hybridization strategies.

• Compare the three learning strategies on different scenarios.

• Compare the MOS algorithm with RL against the traditional version of MOS (with both constant and dynamic PFs).

8.5 Results and Discussion

Table 8.4 presents the average error obtained by each of the configurations in the six proposed problems. For each column, the smallest error is shown in bold. From these data, it can be observed that one of the RL policies, PHC, obtains the best results in 4 out of the 6 considered functions. In the other two functions, one of the single GAs, the BCUM configuration (BLX-α Crossover + Uniform Mutation), reports the lowest average error. The difference in performance between PHC and the best single GA ranges from 4% to 40% in the four functions in which PHC obtains the lower average error, and from 1% to 8% in the other two functions. The other RL policies (WoLF and TERSQ) report worse average errors as a result of their more conservative behavior, as will be seen when the participation plots are analyzed. It is interesting to note that at least two of the single GAs report average errors of the same order of magnitude in the four functions where the PHC policy obtains the best results. In the other two functions, one of the single GAs obtains significantly better results than the others, with an average error at least one order of magnitude lower than the remaining GAs. This can explain why, even if the hybrid approaches are able to


Table 8.4: Average error in the six proposed functions when the four reproductive techniques are considered.

                   Ackley      Griewank    Rastrigin   Rosenbrock  Schwefel    Sphere
BCGM               8.80E+00    2.26E+02    2.19E+03    1.46E+09    4.82E+01    2.65E+04
BCUM               3.17E+00    8.84E+00    1.24E+03    7.65E+06    3.61E+01    9.14E+02
UCGM               8.08E+00    1.78E+02    5.67E+02    5.84E+08    3.82E+01    2.02E+04
UCUM               3.67E+00    1.58E+01    2.29E+02    4.18E+06    3.21E+01    1.68E+03
MOS Const          5.31E+00    3.57E+01    6.52E+02    4.04E+07    4.54E+01    4.08E+03
MOS Dyn            4.41E+00    1.94E+01    4.89E+02    7.92E+06    3.78E+01    2.22E+03
PHC                3.04E+00    9.57E+00    2.22E+02    2.51E+06    3.18E+01    9.56E+02
WoLF               3.98E+00    1.85E+01    3.13E+02    1.27E+07    3.67E+01    2.11E+03
TERSQ Tentative    4.80E+00    3.17E+01    5.28E+02    1.78E+07    3.76E+01    3.52E+03
TERSQ Adjustment   4.63E+00    2.89E+01    5.21E+02    1.84E+07    3.80E+01    3.33E+03
TERSQ Optimal      4.31E+00    2.82E+01    5.01E+02    1.74E+07    3.77E+01    3.21E+03

detect this difference in performance, they still waste some valuable fitness evaluations on the worse performing techniques, especially at the early stages of the search process.

Figures 8.2 and 8.3 present a comparative view of the participation adjustment for two of the six proposed functions. These problems have been selected as representative of the remaining functions: the problems for which plots are not provided exhibit behaviors similar to those observed in the Sphere function (in the case of the Ackley, Griewank and Rosenbrock functions) and in Rastrigin's function (in the case of Schwefel's problem).

Figure 8.2 presents the participation adjustment of the four hybrid approaches in the Sphere function. For each of the RL policies, two snapshots of the learning procedure are provided: after 61 executions and after 111 executions. The first snapshot coincides with the end of the Adjustment Phase of the TERSQ policy, whereas the second one is taken at the end of the last execution. There are both similarities and differences among the learning policies. The most noticeable difference is how PHC carries out participation transitions compared to the other three hybrid approaches. This policy, which obtained the best results in this problem among the hybrid approaches, systematically swaps between the BCUM and UCUM techniques, with participation ratios close to 1 for one of the techniques in most cases. On the other hand, the other two RL policies are more conservative in this respect, especially the TERSQ policy which, in the last executions, settles on an almost uniform distribution of the participation. From the results reported in Table 8.4 it seems that the more conservative the algorithm, the worse the average error it obtains.

Figure 8.3 depicts the participation adjustment of the four hybrid algorithms in Rastrigin's function.
In this problem, the behavior of the PHC policy is different from the one it exhibited in the previous function. In this case, the systematic exchange of algorithms lasts only until generations 10,000–15,000. From this point on, the UCUM technique executes most of the time, with occasional collaboration of the UCGM technique. WoLF presents a similar behavior but, as we have seen in the Sphere function, with a more conservative strategy. The fluctuations are, in this case, more remarkable than in the previous problem, but not as much as with the PHC policy. The other two hybrid approaches present a more conservative behavior, as in the previous problem. It


is important to note that these two algorithms assign more participation to UCGM than to UCUM, unlike the PHC and WoLF policies. This seems to be directly correlated with the average error reported by each algorithm, as the performance of PHC and WoLF in this problem is comparable, whereas the TERSQ and MOS Dynamic approaches obtain similar average errors.

Table 8.5 presents the results of both ranking analyses. The first column, Average Ranking, orders the algorithms based on the average error in the six proposed functions. The second column, Wins, reports the number of wins for each algorithm in the pair-wise statistical comparison carried out. Both rankings provide more or less the same information, except for a small difference in the ranking of the WoLF and BCUM algorithms (WoLF obtains a worse ranking according to its average error but a better one according to its statistical results). These results confirm that there is statistical evidence to establish that the PHC policy for learning the hybridization strategy obtains the best results in this benchmark. However, as the WoLF and TERSQ strategies, as well as the MOS Dynamic approach, obtained, in general, worse results than the UCUM and BCUM individual techniques, a second experimental phase was conducted, considering only these two techniques for the hybridization.

Table 8.5: Ranking and statistical test results for both the single and the hybrid algorithms for the first experimental configuration.

             Average Ranking    Wins
PHC          1.33               7
UCUM         2.50               5
BCUM         3.00               1
WoLF         4.00               2
MOS Dyn      5.00               1
TERSQ        5.50               0
MOS Const    7.17               -4
UCGM         7.50               -4
BCGM         9.00               -8

Table 8.6 shows the results obtained when only the BCUM and UCUM techniques are taken into account by the hybrid approaches. The first thing that can be observed from these results is that, with this configuration of techniques, there is always an RL algorithm that obtains better results than the individual algorithms. The second remark is that, in this second experiment, the differences between the PHC policy and the other two hybrid algorithms have been considerably reduced. Furthermore, the TERSQ algorithm now obtains the lowest average error in as many functions as PHC does. The WoLF policy continues to obtain worse results than the other two RL policies; however, the gap among them is now smaller.

Table 8.7 presents the results of both the average error ranking and the statistical ranking analysis. This analysis confirms the impression derived from the results collected in Table 8.6. With this configuration, it is the TERSQ policy, and not the PHC policy, which obtains the best ranking, considering both the average error and the statistical tests. Moreover, the four dynamic hybrid algorithms (the three RL policies and the hybrid algorithm with dynamic adjustment of the participation) obtain better rankings than any of the individual


techniques. Only the hybrid algorithm with constant participation ratios obtains worse results than the UCUM and BCUM single techniques.

Table 8.6: Average error in the six proposed functions when only the BCUM and UCUM techniques are considered.

                   Ackley      Griewank    Rastrigin   Rosenbrock  Schwefel    Sphere
BCUM               3.17E+00    8.84E+00    1.24E+03    7.65E+06    3.61E+01    9.14E+02
UCUM               3.67E+00    1.58E+01    2.29E+02    4.18E+06    3.21E+01    1.68E+03
MOS Const          3.15E+00    9.55E+00    3.87E+02    3.95E+06    3.66E+01    1.01E+03
MOS Dyn            3.14E+00    9.47E+00    2.42E+02    3.75E+06    3.50E+01    1.01E+03
PHC                3.01E+00    9.10E+00    2.07E+02    2.38E+06    3.12E+01    9.43E+02
WoLF               2.95E+00    8.47E+00    2.63E+02    2.42E+06    3.44E+01    8.85E+02
TERSQ Tentative    2.93E+00    8.37E+00    2.82E+02    2.35E+06    3.43E+01    8.76E+02
TERSQ Adjustment   2.94E+00    8.55E+00    2.73E+02    2.40E+06    3.43E+01    8.76E+02
TERSQ Optimal      2.93E+00    8.48E+00    2.39E+02    2.38E+06    3.41E+01    8.79E+02

Table 8.7: Ranking and statistical test results for both the single and the hybrid algorithms for the second experimental configuration.

             Average Ranking    Wins
TERSQ        1.92               6
PHC          2.42               5
WoLF         2.83               3
MOS Dyn      4.58               1
UCUM         5.17               1
BCUM         5.50               -1
MOS Const    5.75               -2
UCGM         7.83               -5
BCGM         9.00               -8

Figure 8.4 depicts the evolution of the participation of each of the techniques using the four dynamic approaches. In this case, the lowest average error was obtained by the TERSQ policy, followed by the WoLF policy. These two RL algorithms have in common that their adjustment of the participation of the two available techniques is not very significant, being practically static from generations 3,000–4,000 onwards. If we compare the results of these two policies in this function with those of the hybrid approach with a constant participation ratio, we can see that, even though the participation is mostly static from generations 3,000–4,000 on, the results obtained by these two algorithms are up to 15% better. This means that the hybrid strategy learnt during these first generations, in which the UCUM technique initially receives a slightly higher participation ratio, is responsible for improving the final results compared to a completely constant participation assignment.

Figure 8.5 presents the evolution of the participation for Rastrigin's function. In this problem, the behavior of every algorithm is completely different from that exhibited in the previous function: all the algorithms, to a greater or lesser extent, carry out a much more abrupt participation adjustment. Again, the algorithms with a more conservative behavior obtain worse results.

Comparing the participation plots of the Sphere function (Figures 8.2 and 8.4), it can be observed that,


apart from the absence of the BCGM and UCGM techniques in the second plot, the behavior of the four hybrid algorithms is quite similar with four and with two techniques. This is not the case for Rastrigin's function (Figures 8.3 and 8.5). In this function, the configurations with four techniques boosted the participation of the UCUM and UCGM techniques. For the second experiment, the UCGM technique was not used (only the two techniques with the best average individual performance were considered). Despite this, the results obtained by the hybrid approaches with only two techniques are statistically better (p-value < 0.05) than those obtained with four techniques, which again suggests that the real contribution of the techniques other than the one leading the search takes place at the very beginning of the execution or, eventually, in brief phases in which these techniques take over for a short period of time to create individuals that increase the diversity of the population. For this purpose, either UCGM or BCUM seems to add enough diversity to the population to let the leading technique converge to better final solutions.

8.6 Conclusions

The experimental results show statistical evidence that the regulation of the participation with RL techniques can boost the performance of the MOSRL algorithm compared to individual Genetic Algorithms and to the standard version of the MOS algorithm. The three proposed policies have been able to learn, to different degrees, the most efficient strategy for the combination of the available reproductive techniques. However, these three policies exhibit some differences: PHC seems to be more drastic in its decisions, whereas WoLF and TERSQ show a more conservative behavior. If we observe how the probability of selecting an action is updated within the PHC policy (step 5 of Algorithm 8), we can see that, after a few executions, the probability values for the best action and for the remaining actions quickly converge to one and zero, respectively. This can explain the drastic behavior of the PHC policy, and thus further research into the selection of the learning rates should be considered. Taking these considerations into account, PHC seems more appropriate in contexts in which more techniques are available and a quick detection of the best technique(s) is crucial for the performance of the algorithm. On the other hand, the other two policies seem more suitable when the performance of the available techniques is similar and small differences should be taken into account. It would be interesting to extend this research to a larger number of functions of different dimensionality, as well as to carry out tests with larger and more diverse sets of techniques (including other evolutionary approaches such as Estimation of Distribution Algorithms, Differential Evolution, etc.).
Finally, even though the performance of MOS has been improved by using RL policies, it could be argued that a performance comparison between traditional EAs and MOSRL is not fair, as the latter takes advantage of the results obtained in previous executions to learn the best strategy for obtaining the final result. However, the interest of this research is not only to state whether a hybrid evolutionary algorithm guided by RL techniques can obtain better results than classic algorithms, but also whether it is able to learn the


best hybrid strategy for a given problem. This could be usefully applied to real-world problems that have to be solved hundreds of times with slightly different input data, such as planning or scheduling problems.


[Figure 8.2: Participation adjustment of the hybrid algorithms in the Sphere function. Each panel plots the participation ratio of the BCUM, UCUM, UCGM and BCGM techniques over 50,000 generations. Panels: (a) PHC after 61 executions (rank = 2); (b) PHC after 111 executions (rank = 2); (c) WoLF after 61 executions (rank = 4); (d) WoLF after 111 executions (rank = 4); (e) TERSQ after 61 executions (rank = 6); (f) TERSQ after 111 executions (rank = 6); (g) MOS Dynamic after 111 executions (rank = 5).]


[Figure 8.3: Participation adjustment of the hybrid algorithms in Rastrigin's function. Each panel plots the participation ratio of the BCUM, UCUM, UCGM and BCGM techniques over 50,000 generations. Panels: (a) PHC after 61 executions (rank = 1); (b) PHC after 111 executions (rank = 1); (c) WoLF after 61 executions (rank = 3); (d) WoLF after 111 executions (rank = 3); (e) TERSQ after 61 executions (rank = 5); (f) TERSQ after 111 executions (rank = 5); (g) MOS Dynamic after 111 executions (rank = 4).]


[Figure 8.4: Participation adjustment of the hybrid algorithms in the Sphere function (BCUM and UCUM techniques only). Each panel plots the participation ratio of the two techniques over 50,000 generations. Panels: (a) PHC after 61 executions (rank = 4); (b) PHC after 111 executions (rank = 4); (c) WoLF after 61 executions (rank = 2); (d) WoLF after 111 executions (rank = 2); (e) TERSQ after 61 executions (rank = 1); (f) TERSQ after 111 executions (rank = 1); (g) MOS Dynamic after 111 executions (rank = 5).]


[Figure 8.5: Participation adjustment of the hybrid algorithms in Rastrigin's function (BCUM and UCUM techniques only). Each panel plots the participation ratio of the two techniques over 50,000 generations. Panels: (a) PHC after 61 executions (rank = 1); (b) PHC after 111 executions (rank = 1); (c) WoLF after 61 executions (rank = 5); (d) WoLF after 111 executions (rank = 5); (e) TERSQ after 61 executions (rank = 3); (f) TERSQ after 111 executions (rank = 3); (g) MOS Dynamic after 111 executions (rank = 4).]


Part IV

CONCLUSIONS AND FUTURE WORK

Chapter 9

Conclusions

The work presented in this PhD Thesis has fulfilled the objectives defined at the beginning of the research to a very satisfactory level. In this chapter, the most relevant conclusions and achievements of this work will be reviewed.

9.1 General Methodology for the Combination of Evolutionary Algorithms

A general methodology for the hybridization of EAs has been proposed and validated on a significantly large testbed. This methodology allows the hybridization of several EAs and permits an EC practitioner to design a hybrid algorithm without any prior knowledge of, or assumptions about, the performance of a particular EA model on a given problem. Furthermore, the results will, in most cases, be better than those of any single technique by itself and, in the worst case, very close to those of the best individual algorithm.

The reproductive mechanisms of the different evolutionary approaches are abstracted from the algorithm they belong to. This means that all the logic of the evolutionary process is managed by MOS and that the addition of a new technique only requires the implementation of these mechanisms according to the interface defined by MOS.

Moreover, MOS can not only combine several EAs working with a shared population of candidate solutions but also adjust how many of these individuals will be generated by each algorithm. This adjustment can be a predefined assignment based on previous knowledge of the problem being solved and the algorithms being used or, more interestingly, a dynamic adjustment according to a quality measure which determines how good the solutions produced by each evolutionary approach are. The concept of quality depends on each particular case but, in general, it is desirable that the reproductive techniques create new solutions with the highest possible fitness value. However, in some contexts it may be necessary to favor those techniques generating more diverse solutions to avoid premature convergence to a local optimum on very rugged fitness landscapes. MOS allows the designer of the algorithm to select any of these measures and even combine them for better adaptation to the particularities of the problem.
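As an illustration of this quality-driven dynamic adjustment, the following sketch shifts offspring shares toward the technique with the best quality value while keeping a minimum participation for every technique. The function name, the transfer rule and its parameters are illustrative assumptions, not the actual MOS implementation:

```python
def adjust_participation(shares, qualities, trade_off=0.05, min_share=0.05):
    """Shift offspring shares from low-quality to high-quality techniques.

    `shares` and `qualities` are dicts keyed by technique name. A fraction
    (`trade_off`) of each technique's share, scaled by its relative quality
    gap with the best technique, is transferred to the best technique,
    while `min_share` keeps every technique alive so it can recover later.
    """
    best = max(qualities, key=qualities.get)
    best_q = qualities[best]
    for t in shares:
        if t == best or best_q == 0:
            continue
        # Penalty proportional to the relative quality gap of technique t
        penalty = trade_off * (best_q - qualities[t]) / best_q * shares[t]
        penalty = min(penalty, shares[t] - min_share)  # never drop below floor
        if penalty > 0:
            shares[t] -= penalty
            shares[best] += penalty
    return shares
```

A Quality Function favoring diversity instead of raw fitness could be plugged in simply by changing what `qualities` measures.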

9.2

Application to Complex Optimization Problems

The methodology described in this work has been applied to solve several complex optimization problems. These problems can be classified into two groups: combinatorial and continuous problems. From the first group, two problems have been selected. The first problem in the combinatorial group was the Supercomputer Scheduling problem, in which the makespan of a cluster-like supercomputer must be minimized. For this experimentation, a Hybrid Evolutionary Algorithm composed of two Genetic Algorithms was compared against several classic scheduling policies and the standalone GAs. The improvements obtained in this problem were statistically significant according to the Wilcoxon signed-rank test that was carried out [LPRdM08]. The second one is a classic problem in the literature: the Traveling Salesman Problem. For this problem, several hybrid configurations were designed from five Genetic Algorithms with different recombination operators and encodings for the solutions (both an integer and a real encoding were used). This experimentation reported a statistically significant superior performance of the hybrid algorithms in comparison with the individual GAs [LPRM08]. Furthermore, the study also concluded that the higher the number of techniques, the better the performance of the hybrid algorithm. From the second group of problems, two well-known state-of-the-art benchmarks were considered. Both benchmarks were proposed for special sessions on continuous optimization held at IEEE CEC 2005 and IEEE CEC 2008, respectively. The first benchmark is composed of 25 hard optimization functions. A set of seven hybrid configurations with different strategies for the adjustment of the participation and different quality measures was considered. The results showed that most of the hybrid approaches were significantly better than the individual algorithms [LPFM09].
Furthermore, a comparison of the hybrid algorithms with those proposed at the original session reported very competitive results for MOS, which was only beaten by the G-CMA-ES algorithm, although with no significant differences. The second benchmark is made up of 6 functions whose difficulty comes mainly from the large dimensionality considered (1,000 dimensions). In these functions, the largest set of techniques of this work was used: a total of eight techniques, combining several GAs, DEs and ESs. The same seven hybrid configurations were tested on this benchmark. However, the results in this benchmark could not be properly validated statistically, as the size of the dataset was not large enough. Nevertheless, the results of the statistical tests, although they should be interpreted with care, reported an outstanding performance of the hybrid configurations compared to the individual algorithms [LPMZ09]. The comparative analysis carried out with the algorithms proposed for the original session showed that MOS occupied the 4th position out of 10 algorithms and that the differences compared to the best algorithm in this session were not statistically significant.

9.3

Central vs. Self-Adaptive Approach

From the experimentation conducted for this work it can be concluded that the Central approach performs better than the Self-Adaptive approach in most cases. This is important, as much of the work reviewed in Chapter 3 uses the Self-Adaptive or similar approaches. The reason for this lower performance is the quick diffusion of the participation information through the population of individuals that the Self-Adaptive approach exhibits. This characteristic can be beneficial if one of the techniques is actually much better than the others: in that case, few evaluations are wasted on solutions generated by techniques with a low performance. However, this is not always the case, and it is here where this approach fails. In many cases, the performance of the techniques changes throughout the search process. If the strategy used to adjust the participation of each technique is not able to detect these variations, the hybrid algorithm will continue using the same techniques it selected from the first information it had. Furthermore, in some cases it could be interesting to favor those techniques generating individuals with some specific characteristic, such as diversity. This can be easily modelled by the Central approach by means of a Quality Function, but there is no easy way to include this in the Self-Adaptive approach. This flexibility, along with its ability to better adjust the participation of the techniques in different phases of the search, makes the Central approach a much more powerful tool for the combination of Evolutionary Algorithms than the Self-Adaptive approach.

9.4

Learning Optimal Hybrid Strategies

Reinforcement Learning mechanisms have been used to study their ability to identify good hybridization strategies through several runs of a Hybrid Evolutionary Algorithm. In particular, three Q-learning policies have been tested with satisfactory results. The CEC 2008 benchmark was considered for the validation of the proposal. The experimental results proved that the hybrid algorithms with RL mechanisms are able to learn hybridization patterns that outperform the results obtained by the individual algorithms and also by the hybrid algorithms with dynamic adjustment of participation previously introduced [PLPO09, LPMF10]. From the experimental results, it can also be seen that some of the policies (TERSQ and WoLF) are more conservative than the other one (PHC) and, for this reason, are more suitable for scenarios where the techniques that compose the hybrid algorithm have a similar performance. On the other hand, when the quality of the considered techniques is quite uneven, the PHC policy seems to manage this situation more efficiently, as it is able to discard the worst techniques more quickly (with less learning) than the others.


9.5

Final Remarks

To summarize, the following main conclusions can be extracted from this study:

• Hybrid algorithms behave, in general, better than individual algorithms. In some cases, one hybrid strategy turns out to be better than any of the others. In other cases, different hybrid strategies are better on different problems. Nevertheless, in most cases there is always a hybrid strategy that outperforms any of the individual algorithms. The problem of selecting the best algorithm has thus been translated into selecting the best hybridization strategy; in compensation, a better performance is obtained.

• Dynamic adjustment strategies are, in most cases, better than constant assignments of participation.

• The Self-Adaptive approach implemented for comparison purposes often suffers from premature selection of a subset of techniques that, in many cases, is not the best one from a long-term point of view. This means that, in the early stages of the search process, the hybrid algorithm can select a subset of techniques (increasing their participation) and spread this information through all the individuals in the population. However, at a later stage other techniques could be able to create better solutions, but it would be very difficult for them to create new individuals, as their participation was dramatically decreased at the beginning of the search and this information is encoded in most of the individuals in the population.

• Reinforcement Learning mechanisms are capable of learning good hybrid strategies. This is very interesting in contexts in which the same algorithm is going to be executed many times, with differences in the input data that do not change the profile of the problem.

• The implementation of the MOS framework has been carried out within the GAEDALib library [Día05], which is based on GALib [Wal96], a library originally developed by Matthew Wall at the Massachusetts Institute of Technology (MIT). The GAEDALib library provides implementations of different Genetic Algorithms (Simple, Steady State, etc.), Estimation of Distribution Algorithms, Differential Evolution, Particle Swarm Optimization, Evolution Strategies and the whole MOS framework, as well as an MPI interface to allow the deployment of parallel algorithms. This library allows an easy implementation of new algorithms, application problems, initialization strategies, etc., thanks to its modular and extensible design. Finally, as it has been implemented in C++, it has an outstanding computational performance compared to other EA implementations written in other programming languages, such as Java.

9.6

Selected Publications

To conclude this chapter, the most relevant publications derived from this research are referenced here. The reader should note that this is not an exhaustive list but a summary of the publications most closely related to this PhD Thesis.


JCR Journals:

• A. LaTorre, J.M. Peña, S. Muelas, and A.A. Freitas. Learning hybridization strategies in evolutionary algorithms. Intelligent Data Analysis, 14(3), 2010.

Book Chapters:

• A. LaTorre, J.M. Peña, V. Robles, and P. de Miguel. Supercomputer scheduling with combined evolutionary techniques. In F. Xhafa and A. Abraham, editors, Meta-heuristics for Scheduling: Distributed Computing Environments, volume 146 of Studies in Computational Intelligence, pages 95–120. Springer Verlag, Germany, 2008.

International Conferences:

• A. LaTorre, J.M. Peña, S. González, V. Robles, and F. Famili. Breast cancer biomarker selection using Multiple Offspring Sampling. In Proceedings of the ECML/PKDD 2007 Workshop on Data Mining in Functional Genomics and Proteomics: Current Trends and Future Directions, Warsaw, Poland, September 2007. Springer Verlag.

• A. LaTorre, J.M. Peña, V. Robles, and S. Muelas. Using Multiple Offspring Sampling to guide genetic algorithms to solve permutation problems. In M. Keijzer, editor, Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, GECCO 2008, pages 1119–1120, New York, NY, USA, July 2008. ACM Press.

• A. LaTorre, F. Clautiaux, E-G. Talbi, and J.M. Peña. VRP-extended: When confidence and fleet size are also important. In E-G. Talbi, editor, Proceedings of the 2nd International Conference on Metaheuristics and Nature Inspired Computing, META 2008, 2008.

• A. LaTorre, J.M. Peña, J. Fernández, and S. Muelas. MOS como herramienta para la hibridación de algoritmos evolutivos. In Proceedings del VI Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados, MAEB 2009, pages 457–464, 2009.

• A. LaTorre, J.M. Peña, S. Muelas, and C. Pascual. Quality measures to adapt the participation in MOS. In Proceedings of the 11th IEEE Congress on Evolutionary Computation, CEC 2009, pages 888–895. IEEE Press, 2009.

• L. Peña, A. LaTorre, J.M. Peña, and S. Ossowski. Tentative exploration on reinforcement learning algorithms for stochastic rewards. In E. Corchado, editor, Proceedings of the 4th International Conference on Hybrid Artificial Intelligent Systems, HAIS 2009, volume 5572 of Lecture Notes in Artificial Intelligence, pages 336–343, Berlin, June 2009. Springer-Verlag GmbH.

• A. LaTorre, J.M. Peña, S. Muelas, and M. Zaforas. Hybrid evolutionary algorithms for large scale continuous problems. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO 2009, pages 1863–1865. ACM Press, 2009.


Chapter 10

Future Work

The work presented in this PhD Thesis constitutes a thorough study of the field of Hybrid Evolutionary Algorithms. Several strategies for the combination of different evolutionary approaches have been proposed and tested. However, a series of further improvements and new experiments and analyses could be considered in the future. In this chapter, some of the most relevant future lines and open issues at the end of this work will be enumerated. Some of them deal with the addition of new techniques, whereas others are related to the inclusion of new structural variations in the MOS framework. The application of the proposed methodology to new problems is always a good idea, as is the implementation of new quality and analysis measures based on the phylogenetic information that can be stored by the algorithm. Finally, regarding the inclusion of new evolutionary techniques, it should be remarked that, for this work, simple versions of the evolutionary algorithms have been considered to ease the analysis of the contribution of the hybridization mechanisms proposed in this PhD Thesis. Nevertheless, even with these simple configurations, the experimental results have been quite satisfactory and competitive with state-of-the-art approaches, which suggests that more sophisticated techniques would further increase the already good performance of the hybrid algorithms built with the MOS framework.

10.1

Variable Sets of Techniques

The experimentation conducted in Chapters 5 and 6 has considered sets of up to eight techniques applied simultaneously. It could be interesting to increase the number of techniques being applied at the same time. However, a maximum number should be carefully established, as a minimum ratio between the number of techniques and the overall population size should be guaranteed. For this reason, an intelligent mechanism for selecting and exchanging the currently used techniques among a wider set of techniques would be of great value, making it possible to test multiple different combinations of techniques in the same run. This could be done by means of trial-and-error metaheuristics or by deriving decision rules of the type: “IF quality of DE is poor THEN introduce a more exploratory GA technique”.
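A decision rule of that type could be sketched as follows; `active`, `reserve`, the quality threshold and the swap policy are hypothetical names introduced only for illustration:

```python
def exchange_techniques(active, reserve, qualities, threshold=0.2):
    """Swap out techniques whose quality falls below `threshold`,
    replacing them with candidates from a wider reserve set.

    `active` and `reserve` are lists of technique names; `qualities` maps
    each active technique to its current quality value.
    """
    for tech in list(active):
        if qualities.get(tech, 0.0) < threshold and reserve:
            active.remove(tech)
            active.append(reserve.pop(0))  # bring in an unused technique
            reserve.append(tech)           # keep it available for later stages
    return active, reserve
```

Swapped-out techniques go to the back of the reserve, so a technique that performed poorly in one phase of the search can still be reintroduced later.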

10.2

Combination with Non-Evolutionary Techniques

At this moment, the MOS framework is limited to the combination of Evolutionary Algorithms. Other non-evolutionary techniques, such as the Local Search Algorithm described in Section 5.3.2.3, can still be used by embedding them into the main algorithm or into the techniques offered by MOS. However, the same level of abstraction achieved for the evolutionary approaches, for which the sampling methods have been extracted and encapsulated within the concept of technique, would be desirable. This would allow the LS mechanisms to take advantage of the dynamic mechanisms offered by MOS, as they would be treated as just another offspring technique and their participation would be adapted in the same way. As a result, if the LS mechanism is only necessary during a certain part of the search process, the Dynamic Participation Functions proposed by MOS could detect such a situation and adapt the participation of the LS accordingly. Furthermore, the MOS framework should allow the use of these new techniques not only as Initialization or Local Search mechanisms, but also as autonomous techniques that could be combined with other methods.

10.3

Implementation of Restart Mechanisms

Some of the benchmarks used in this work did not have a limited number of Fitness Evaluations. In other cases, this limit was high enough for the algorithm to converge before the maximum number of FEs was reached. In these cases, a restart mechanism for both the population and the hybridization mechanisms (participation ratios, for example) would be interesting. The diversification of the current population by means of such a restart mechanism would make it easier for the algorithm to escape from local optima. On the other hand, a restart of the hybridization control information (participation ratios, quality values, etc.) could ease the use of the most suitable set of techniques in different phases of the search, as the participation of a technique after the restart would not be burdened with its previous performance. Furthermore, even the set of techniques in use could be reset to allow the participation of other techniques and thus increase the diversity of the offspring mechanisms available for this new stage. This feature should be integrated in the MOS framework as a dynamic mechanism controlling some parameters and carrying out the appropriate actions to avoid falling into unwanted situations. For example, the diversity of the population could be periodically evaluated, and the population reinitialized if this measure falls below an acceptable lower bound.
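Such a diversity-triggered restart could look like the following sketch for a real-coded population. The diversity measure (mean per-gene standard deviation), the bounds and the uniform reset of the participation ratios are all illustrative assumptions, not the MOS implementation:

```python
import random
import statistics

def gene_diversity(population):
    """Average per-gene standard deviation over a list of real-coded individuals."""
    return statistics.mean(
        statistics.pstdev(ind[g] for ind in population)
        for g in range(len(population[0]))
    )

def maybe_restart(population, participation, threshold, bounds, rng=random):
    """If diversity falls below `threshold`, reinitialize the population at
    random within `bounds` and reset the participation ratios to uniform,
    so no technique is burdened with its pre-restart performance."""
    if gene_diversity(population) >= threshold:
        return population, participation  # diversity still acceptable
    lo, hi = bounds
    population = [[rng.uniform(lo, hi) for _ in ind] for ind in population]
    n = len(participation)
    participation = {t: 1.0 / n for t in participation}
    return population, participation
```

A finer-grained variant could reset only the participation ratios (keeping the population) or also refresh the set of active techniques, as discussed above.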


10.4


Post-Execution Analysis Techniques

The future lines presented so far are intended to provide the MOS framework with increasingly sophisticated tools for the combination of evolutionary (and non-evolutionary, as proposed in Section 10.2) algorithms. In this section, a post-execution analysis technique for the behavior of the hybrid algorithm is proposed. For this purpose, the phylogenetic information stored through the execution of the EA could be used. Within the MOS framework, it is possible to store information on the connections between parent and child individuals and the techniques used to create each of them. The idea is to use this information to study whether the best individuals found by the algorithm come from a pure lineage of elite solutions created by a particular technique or whether, on the contrary, they are the product of small variations introduced by the different available techniques. One possibility would be to compute the entropy of the solutions according to the reproductive techniques used to create them and their ancestors. This measure would provide a numeric value for the degree of participation of the different techniques in the evolution of the best solutions. A low entropy would mean that little collaboration beyond a single technique was needed to evolve the best individual, whereas a high entropy would mean that all the techniques contributed, to a greater or lesser extent, to finding the best overall solution.
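The entropy measure suggested above could be computed, for instance, as the Shannon entropy of the technique labels appearing in a solution's lineage. The input format (a flat list of labels for the solution and its ancestors) is a hypothetical simplification of the stored phylogenetic information:

```python
import math
from collections import Counter

def lineage_entropy(lineage_techniques):
    """Shannon entropy (in bits) of the techniques used along the lineage
    of a solution. 0 means a single technique produced the whole lineage;
    log2(T) means all T techniques contributed equally."""
    counts = Counter(lineage_techniques)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For example, a lineage created entirely by one GA yields entropy 0, while a lineage alternating evenly between two techniques yields entropy 1 bit.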

10.5

New Quality Measures and Participation Functions

The use of Quality Measures other than those considered in this work could be worth testing. Some of them, such as Diversity (Section 4.2.1.2.2) or Algorithmic Difficulty measures different from the NSC (Section 4.2.1.2.3), have been suggested in Chapter 4. Additionally, the phylogenetic information referred to in Section 10.4 could also be used to design new Quality Measures taking into account not only the current performance of the techniques, but also their historical behavior. Regarding the Participation Functions, new functions could be proposed. One of the most interesting possibilities is a hybrid dynamic PF capable of considering several Quality Measures simultaneously. This hybrid approach could use different quality criteria in different phases of the search process, or combine these measures throughout the whole optimization, dynamically adjusting the weight of each measure to enforce different characteristics of the offspring populations in different phases of the search while always guaranteeing a certain level of the other characteristics. For example, diversity could be emphasized at the beginning of the process to cover a larger area of the solution space. However, the best solutions so far should also be preserved to some degree to guarantee that the search is guided toward promising areas of the solution space.

10.6

Other Ideas

• Analysis of the behavior of the MOS framework with extreme configurations (many techniques, small populations, etc.).


• Application to real-world problems in the field of genomics and proteomics, where the research group has previous experience, or in the field of neuroscience, where the research group is getting involved as part of the Cajal Blue Brain Project (http://cajalbbp.cesvima.upm.es/).

• Multiobjective implementation of the MOS framework (optimizing several Quality Measures simultaneously), as well as application of the algorithm to multiobjective problems.

• Application to neutral problems, in which the diversity of the population is a key aspect. In this sense, it would be interesting to study how multiple offspring mechanisms can ease the search in these fitness landscapes.

• Application to robust optimization, with noisy functions in which the objective is not to find the best isolated point maximizing the fitness function but rather a homogeneous area of good solutions. The MOS framework could be used to find the technique(s) best suited to this particular characteristic by using a quality function that favors those techniques creating the most homogeneous and fittest set of individuals.

• Application to dynamic problems, in which the input data or the conditions of the problem can change during the execution of the algorithm. In this kind of problem, an efficient and intelligent adjustment of the participation of the techniques could probably improve the performance of the algorithm.

• Consideration of other learning models for the MOSRL algorithm, such as rule induction algorithms, to extract a set of rules determining how the participation of the techniques should be adjusted and which characteristics should be considered for this adjustment. Also regarding the MOSRL algorithm, it would be interesting to study how the algorithm is able to learn the underlying structure of the problem when the input data is not exactly the same. Some examples of this type of problem are classic routing problems, such as the TSP or the Vehicle Routing Problem (VRP), to name a few.

• Analysis of the influence of the parallelization of the algorithm by means of an island model. Both homogeneous and heterogeneous scenarios (with different sets of techniques on different islands) should be studied and compared. Additionally, extra information could be exchanged among the islands, such as the participation or quality information needed by the Dynamic Participation Functions, to provide the algorithm with a second layer of global knowledge of its behavior.

• Public distribution of the source code, to allow other researchers to use and perhaps improve the MOS framework.

• Improvement of the performance of some of the techniques present in GAEDALib, such as the EDAs [PMMAM09].


Part V

APPENDICES

Appendix A

Experimental and Validation Procedures

In this appendix, the experimental and validation procedures followed in this work will be presented. It is important to note that these are general guidelines that have been adapted to the particularities of each problem. In some cases, the results of the validation phase have to be interpreted with care, as the number of available instances was too limited for the statistical tests to report conclusive results. In other cases, specific validation procedures have been followed in order to check whether a proposed hypothesis holds, as in the case of the Traveling Salesman Problem, for which a study of the benefits of combining more or fewer algorithms was carried out. Section A.1 describes the general experimental procedure, whereas Sections A.2 and A.3 review the nWins and Holm procedures used for the statistical validation, respectively.

A.1

General Procedure

In this section, the general experimental procedure used in this work will be introduced. In general, for each experiment described in Chapters 5 and 6, several instances or problems have been considered. Each of these problems is solved by several algorithms or configurations of the algorithms. In some cases, the hybrid algorithms built with MOS are compared against the individual algorithms they are made up of. In other cases, MOS is compared against other types of algorithms (other heuristics or exact methods, for example). In all of these cases, the same general procedure applies:

• All the proposed algorithms or configurations are considered.

• For each problem and configuration/algorithm, several independent runs are executed. The exact number of executions depends on the problem, as in some cases this number is fixed by the experimental procedure designed for the original session the problems were proposed for. However, a minimum of twenty executions is always guaranteed.

• The average fitness for each algorithm/configuration is reported.

• The nWins procedure (Section A.2) is executed. This procedure carries out an analysis of the behavior of the algorithms based on the Wilcoxon rank-sum test. It can be executed on a per-problem or global basis, depending on the available number of instances/problems.

• The Holm procedure (Section A.3) is executed if the available number of instances is large enough for this test to report significant results.

As said before, this general experimental procedure should be adapted to the particular characteristics of the considered problem.

A.2

nWins Procedure

The nWins procedure [MPR+07] is a statistical validation procedure intended to compare multiple algorithms. It can be used for comparisons over both single and multiple problems. This procedure carries out pairwise comparisons by means of the Wilcoxon rank-sum test among all the algorithms/configurations. One algorithm is said to be the winning algorithm in such a comparison if the Wilcoxon test reports a p-value lower than a pre-established threshold (normally 0.05), whereas the other algorithm is called the losing algorithm. Through all these comparisons, the winning algorithm is granted “+1 wins”, whereas the losing one obtains “−1 wins”. At the end of the nWins procedure the algorithms are ranked according to their number of wins. More precisely, the nWins procedure works as follows:

1. Every pair of algorithms is compared by means of a Wilcoxon rank-sum test.

2. If the p-value of this test is lower than the considered threshold, the winning algorithm is granted “+1 wins”, whereas the losing algorithm obtains “−1 wins”. If the result is not statistically significant, neither of the algorithms gets its number of wins modified.

3. When all the comparisons have been carried out, the total number of wins is computed for each algorithm.

4. Finally, the algorithms are ranked according to their total number of wins, which offers a global view of the performance of the different algorithms.
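The procedure can be sketched as follows. The rank-sum test below uses the normal approximation with average ranks for ties, as a simplified stand-in for a full statistical package, and the input format (algorithm name mapped to one fitness value per run, higher being better) as well as deciding the winner of a significant pair by mean fitness are assumptions made for illustration:

```python
import itertools
import math

def ranksum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation
    (tied values receive their average rank)."""
    pooled = sorted((v, i) for i, v in enumerate(x + y))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of a tied group
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])  # rank sum of the first sample (x comes first)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def n_wins(results, alpha=0.05):
    """nWins ranking: each significant pairwise comparison grants the
    winner +1 wins and the loser -1 wins; algorithms are then ranked
    by their total number of wins."""
    mean = lambda v: sum(v) / len(v)
    wins = {a: 0 for a in results}
    for a, b in itertools.combinations(results, 2):
        if ranksum_pvalue(results[a], results[b]) < alpha:
            better, worse = (a, b) if mean(results[a]) > mean(results[b]) else (b, a)
            wins[better] += 1
            wins[worse] -= 1
    return sorted(wins.items(), key=lambda kv: -kv[1])
```

With clearly separated fitness distributions the better algorithm ends up with +1 wins per opponent, while comparisons that are not significant leave both counters untouched.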



A.3


Holm Procedure

The Holm procedure [Hol79] is a statistical validation method to compare multiple algorithms over multiple problems or instances. This procedure sequentially checks the hypotheses according to their significance (the p-values p1, p2, . . . , pk−1 are ordered in such a way that p1 ≤ p2 ≤ . . . ≤ pk−1). The procedure compares the i-th hypothesis with α/(k − i), beginning with the hypothesis with the most significant p-value. A hypothesis is rejected if its associated p-value is below α/(k − i), which allows the procedure to check the following hypothesis. If one hypothesis cannot be rejected, then the remaining hypotheses remain supported. This procedure takes into account the Family-Wise Error, i.e., “the probability of making one or more false discoveries among all the hypotheses when performing multiple pairwise tests” [GMLH08], by applying a correction to the raw p-values. More specifically, the Holm procedure works as follows:

1. All the algorithms are ranked on each problem according to their fitness value or error rate, depending on which measure is being used for the optimization.

2. The average ranking value is computed for each algorithm.

3. Algorithms are sorted according to their average ranking value. The algorithm with the best average ranking is considered the reference algorithm from now on.

4. For each algorithm different from the reference algorithm, the following statistic is computed:

zi = (Ri − Rref) / √(k(k + 1) / 6N),  ∀ i ≠ ref

where k is the number of algorithms being compared and N is the number of available instances or problems.

5. These zi values are used to obtain the associated p-value from the table of the normal distribution:

pi = 2(1 − Norm(zi))

6. The raw p-values are adjusted for the Holm procedure to take into account the Family-Wise Error.

7. Each corrected p-value is compared to α/(k − i), α being the threshold value, normally set to 0.05, starting with the most significant one.

8. A hypothesis can be rejected if its corrected p-value is below α/(k − i). In this case, the next hypothesis is checked. If the p-value is not below that value, this hypothesis and the following ones remain supported, and the procedure finishes.


This procedure is very sensitive to the number of instances available for the comparison. In [GMLH08], a general rule is given about the minimum number of samples of a distribution (fitness values or errors reported for an algorithm) needed to assess the efficiency of the statistical test:

N = a · k

where N is the number of functions (instances), k is the number of algorithms to be compared and a ≥ 2.
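The rule is simple enough to encode as a sanity check when planning a benchmark comparison. The helper below is a small illustration of ours, not part of [GMLH08]:

```python
def min_instances(k, a=2):
    """Minimum number of problems, N = a * k, with a >= 2 as recommended."""
    if a < 2:
        raise ValueError("the rule assumes a >= 2")
    return a * k

# Comparing 11 algorithms, as in the tables of Appendix B, calls for
# at least 22 benchmark problems.
print(min_instances(11))  # -> 22
```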


Appendix B

Complete Results

B.1 Traveling Salesman Problem

This section presents the complete information about the results obtained in the experiments carried out for the TSP problem. Table B.1 presents the results obtained on the three considered datasets. For each dataset, every combination of the techniques introduced in Section 5.3.2.2 is tested. The results shown in this table are the average of 20 independent executions of the algorithm.

Table B.1: Results of the exhaustive experiment

Techniques    Swiss42      Brazil58     GR120
t0            0.93 ± 0.01  0.88 ± 0.00  0.73 ± 0.00
t0t1          0.96 ± 0.51  0.94 ± 0.00  0.84 ± 0.03
t0t1t2        0.97 ± 0.01  0.95 ± 0.00  0.84 ± 0.01
t0t1t2t3      0.96 ± 0.03  0.96 ± 0.01  0.87 ± 0.01
t0t1t2t3t4    0.96 ± 0.01  0.94 ± 0.01  0.88 ± 0.00
t0t1t2t4      0.94 ± 0.01  0.94 ± 0.00  0.88 ± 0.02
t0t1t3        0.97 ± 0.01  0.95 ± 0.01  0.84 ± 0.01
t0t1t3t4      0.97 ± 0.01  0.94 ± 0.01  0.88 ± 0.01
t0t1t4        0.95 ± 0.30  0.93 ± 0.01  0.82 ± 0.00
t0t2          0.92 ± 0.00  0.87 ± 0.00  0.84 ± 0.01
t0t2t3        0.98 ± 0.01  0.95 ± 0.01  0.83 ± 0.01
t0t2t3t4      0.96 ± 0.02  0.94 ± 0.03  0.87 ± 0.00
t0t2t4        0.94 ± 0.00  0.88 ± 0.00  0.85 ± 0.00
t0t3          0.97 ± 0.00  0.94 ± 0.00  0.83 ± 0.00
t0t3t4        0.96 ± 0.01  0.93 ± 0.00  0.83 ± 0.00
t0t4          0.91 ± 0.01  0.86 ± 0.01  0.83 ± 0.00
t1            0.92 ± 0.00  0.95 ± 0.02  0.71 ± 0.04
t1t2          0.92 ± 0.03  0.93 ± 0.01  0.83 ± 0.00
t1t2t3        0.96 ± 0.01  0.95 ± 0.01  0.84 ± 0.00
t1t2t3t4      0.97 ± 0.01  0.95 ± 0.01  0.86 ± 0.01
t1t2t4        0.93 ± 0.00  0.94 ± 0.01  0.84 ± 0.01
t1t3          0.96 ± 0.00  0.95 ± 0.02  0.82 ± 0.00
t1t3t4        0.95 ± 0.02  0.94 ± 0.00  0.84 ± 0.02
t1t4          0.94 ± 0.01  0.93 ± 0.00  0.84 ± 0.00
t2            0.58 ± 0.02  0.43 ± 0.02  0.71 ± 0.00
t2t3          0.98 ± 0.01  0.95 ± 0.01  0.83 ± 0.00
t2t3t4        0.96 ± 0.02  0.95 ± 0.01  0.85 ± 0.00
t2t4          0.79 ± 0.04  0.74 ± 0.00  0.84 ± 0.00
t3            0.96 ± 0.01  0.95 ± 0.00  0.69 ± 0.00
t3t4          0.95 ± 0.01  0.94 ± 0.02  0.84 ± 0.01
t4            0.79 ± 0.03  0.76 ± 0.00  0.71 ± 0.02
Avg. fitness  0.93         0.91         0.82

B.2 CEC 2005 Benchmark

This section presents the full experimental results obtained in the 25 functions that make up the CEC 2005 Benchmark described in Section 6.2. Table B.2 contains the results obtained in the 10-dimensional functions, whereas Table B.3 presents the results obtained in the 30-dimensional functions. The legend to interpret both tables is the following:

• BCGM ≡ Genetic Algorithm configured with BLX-α Crossover and Gaussian Mutator.

• UCUM ≡ Genetic Algorithm configured with Uniform Crossover and Uniform Mutator.

• DE Exp ≡ Differential Evolution with Exponential Crossover.

• DE Bin ≡ Differential Evolution with Binomial Crossover.

• fAvg PF1 ≡ Hybrid algorithm made up of the four aforementioned techniques, making use of the Fitness Average Quality Measure and the Dynamic Participation Function with constant population size.

• fAvg PF2 ≡ Hybrid algorithm making use of the Fitness Average Quality Measure and the Dynamic Participation Function with variable population size.


• NSC PF1 ≡ Hybrid algorithm making use of the Negative Slope Coefficient Quality Measure and the Dynamic Participation Function with constant population size.

• NSC PF2 ≡ Hybrid algorithm making use of the Negative Slope Coefficient Quality Measure and the Dynamic Participation Function with variable population size.

• Self ≡ Hybrid algorithm with participation ratios encoded in the chromosome of the individual with arithmetic average recombination of the participation information.

• W. Self ≡ Hybrid algorithm with participation ratios encoded in the chromosome of the individual with weighted average recombination of the participation information.

• Constant ≡ Hybrid algorithm with constant participation ratios.

Table B.2: Full results in the CEC 2005 Benchmark in 10 dimensions

     BCGM      DE Bin    DE Exp    UCUM      fAvg PF1  fAvg PF2  NSC PF1   NSC PF2   Self      W. Self   Constant
F1   1.12E-01  0.00E+00  0.00E+00  1.72E-02  0.00E+00  5.94E-02  0.00E+00  0.00E+00  2.81E-03  4.08E-03  0.00E+00
F2   4.02E+00  3.75E+00  8.88E-02  2.82E+01  7.21E-08  1.75E+02  2.78E-06  0.00E+00  9.27E+00  1.69E+01  5.45E-06
F3   3.23E+05  2.84E+06  1.30E+06  8.10E+05  1.38E+05  1.61E+06  1.14E+05  3.25E+04  3.66E+05  5.97E+05  1.83E+05
F4   4.85E+00  1.09E+01  3.46E+00  9.65E+01  1.62E-05  2.16E+02  3.51E-07  0.00E+00  2.53E+01  3.34E+01  2.30E-10
F5   2.23E+02  0.00E+00  3.34E+01  8.88E+02  0.00E+00  3.57E+02  0.00E+00  0.00E+00  5.21E+02  4.94E+02  9.71E+01
F6   5.08E+01  6.11E+00  2.31E+00  8.78E+01  2.18E+00  7.23E+02  3.12E+00  1.03E+01  3.09E+01  3.94E+01  9.37E+00
F7   1.23E+00  5.55E-01  1.66E-01  1.86E+00  5.67E-02  7.34E-01  4.23E-02  6.87E-02  6.30E-01  8.13E-01  2.33E-01
F8   2.03E+01  2.04E+01  2.04E+01  2.03E+01  2.04E+01  2.04E+01  2.04E+01  2.04E+01  2.03E+01  2.03E+01  2.04E+01
F9   4.34E-01  2.01E-05  0.00E+00  7.65E-03  0.00E+00  4.90E-01  0.00E+00  0.00E+00  3.38E-03  3.61E-03  1.32E-05
F10  1.09E+01  3.57E+01  1.35E+01  1.26E+01  6.17E+00  1.46E+01  8.28E+00  8.12E+00  1.25E+01  1.26E+01  1.77E+01
F11  2.81E+00  8.40E+00  5.63E+00  5.69E+00  4.67E+00  5.46E+00  2.09E+00  3.28E+00  4.89E+00  5.52E+00  6.02E+00
F12  5.95E+02  8.48E+01  2.10E+02  8.29E+02  1.03E+02  1.45E+03  2.89E+02  9.32E+01  5.93E+02  4.71E+02  2.61E+02
F13  1.02E+00  1.79E+00  2.63E-01  4.04E-01  2.36E-01  6.28E-01  3.19E-01  2.52E-01  4.10E-01  4.35E-01  4.52E-01
F14  2.81E+00  3.86E+00  3.37E+00  3.42E+00  3.20E+00  3.45E+00  3.17E+00  2.93E+00  3.25E+00  3.26E+00  3.38E+00
F15  1.23E+02  2.64E+02  8.79E-01  1.07E+01  3.07E+01  1.03E+02  3.00E+01  8.47E+01  1.03E+02  4.10E+01  8.14E+01
F16  1.13E+02  1.74E+02  1.29E+02  1.23E+02  1.02E+02  1.34E+02  1.03E+02  1.07E+02  1.18E+02  1.15E+02  1.24E+02
F17  1.18E+02  1.94E+02  1.48E+02  1.29E+02  1.09E+02  1.31E+02  1.15E+02  1.08E+02  1.21E+02  1.22E+02  1.24E+02
F18  7.98E+02  3.00E+02  7.04E+02  7.65E+02  5.89E+02  6.73E+02  4.91E+02  5.19E+02  6.39E+02  7.51E+02  6.22E+02
F19  8.19E+02  3.10E+02  7.39E+02  7.90E+02  5.55E+02  6.53E+02  4.78E+02  4.26E+02  6.64E+02  6.46E+02  7.54E+02
F20  7.79E+02  3.35E+02  7.27E+02  7.91E+02  5.91E+02  6.78E+02  5.46E+02  4.70E+02  7.75E+02  5.65E+02  6.79E+02
F21  7.22E+02  5.00E+02  5.04E+02  7.98E+02  6.65E+02  6.17E+02  5.75E+02  5.15E+02  7.18E+02  7.25E+02  7.54E+02
F22  7.27E+02  7.84E+02  7.85E+02  7.80E+02  7.66E+02  7.84E+02  7.58E+02  7.61E+02  7.69E+02  7.52E+02  7.79E+02
F23  8.81E+02  5.59E+02  5.82E+02  8.55E+02  7.31E+02  6.35E+02  7.32E+02  6.83E+02  7.93E+02  9.60E+02  7.46E+02
F24  2.15E+02  2.88E+02  2.00E+02  2.15E+02  2.00E+02  2.00E+02  2.00E+02  2.00E+02  2.00E+02  2.00E+02  2.00E+02
F25  1.06E+03  4.03E+02  3.93E+02  7.47E+02  3.94E+02  6.02E+02  3.94E+02  3.89E+02  6.97E+02  6.96E+02  3.99E+02

Table B.3: Full results in the CEC 2005 Benchmark in 30 dimensions

     BCGM      DE Bin    DE Exp    UCUM      fAvg PF1  fAvg PF2  NSC PF1   NSC PF2   Self      W. Self   Constant
F1   1.83E+02  3.83E-01  0.00E+00  9.52E-01  0.00E+00  8.93E+02  0.00E+00  6.17E+00  1.60E-05  0.00E+00  0.00E+00
F2   3.01E+03  2.83E+04  2.74E+02  1.48E+03  9.50E+02  3.65E+04  1.08E+01  1.38E+02  4.35E+01  1.28E+02  6.55E-03
F3   1.93E+07  2.23E+11  2.20E+07  1.55E+07  7.41E+06  1.21E+11  3.04E+06  2.36E+06  2.42E+06  2.67E+06  1.72E+06
F4   4.69E+03  3.68E+04  8.04E+03  3.44E+03  1.29E+04  3.97E+04  1.56E+03  4.02E+03  1.13E+04  1.17E+04  6.78E+03
F5   4.27E+03  5.09E+03  5.42E+03  3.70E+03  3.40E+03  1.08E+04  1.87E+03  4.29E+03  4.60E+03  4.88E+03  3.69E+03
F6   3.78E+05  2.36E+04  2.35E+01  7.58E+02  2.27E+02  3.67E+11  4.73E+01  9.12E+02  9.37E+01  6.61E+02  8.77E+01
F7   2.92E+01  1.54E+02  9.26E-02  3.72E+00  4.21E-02  3.52E+02  1.61E-02  8.12E-01  3.42E-01  3.87E-01  1.81E-02
F8   2.10E+01  2.10E+01  2.09E+01  2.09E+01  2.09E+01  2.11E+01  2.09E+01  2.10E+01  2.08E+01  2.08E+01  2.09E+01
F9   5.71E+01  1.01E+02  0.00E+00  3.20E-01  2.52E-01  5.92E+01  0.00E+00  2.88E+00  1.99E-03  1.47E-02  1.66E-03
F10  1.88E+02  2.40E+02  1.38E+02  6.01E+01  6.96E+01  2.05E+02  6.38E+01  9.33E+01  6.39E+01  6.71E+01  7.93E+01
F11  2.50E+01  3.98E+01  2.73E+01  2.45E+01  2.50E+01  3.23E+01  1.71E+01  2.37E+01  2.76E+01  2.94E+01  2.82E+01
F12  4.95E+04  2.82E+05  1.73E+04  2.27E+04  1.68E+04  1.70E+05  4.95E+03  9.12E+03  7.31E+03  5.34E+03  3.01E+03
F13  1.81E+01  3.69E+01  1.21E+00  1.90E+00  2.51E+00  2.32E+03  1.20E+00  2.47E+00  1.37E+00  1.23E+00  1.40E+00
F14  1.30E+01  1.36E+01  1.28E+01  1.27E+01  1.27E+01  1.33E+01  1.26E+01  1.29E+01  1.29E+01  1.30E+01  1.31E+01
F15  4.42E+02  2.40E+02  6.27E+01  3.25E+02  3.10E+02  4.23E+02  3.35E+02  3.23E+02  3.25E+02  3.10E+02  3.35E+02
F16  2.37E+02  2.72E+02  2.27E+02  1.29E+02  2.03E+02  2.65E+02  1.62E+02  1.59E+02  1.57E+02  1.33E+02  1.11E+02
F17  2.76E+02  2.89E+02  2.78E+02  1.82E+02  1.42E+02  3.11E+02  9.79E+01  1.46E+02  1.32E+02  1.65E+02  1.71E+02
F18  9.19E+02  9.06E+02  9.12E+02  9.15E+02  9.07E+02  9.25E+02  9.06E+02  9.08E+02  9.20E+02  9.21E+02  9.09E+02
F19  9.19E+02  9.06E+02  9.12E+02  9.15E+02  9.08E+02  9.25E+02  9.06E+02  9.10E+02  9.14E+02  9.21E+02  9.09E+02
F20  9.24E+02  9.16E+02  9.22E+02  9.25E+02  9.18E+02  9.36E+02  9.16E+02  9.20E+02  9.25E+02  9.29E+02  9.19E+02
F21  5.32E+02  6.86E+02  5.00E+02  5.00E+02  5.15E+02  5.36E+02  5.00E+02  5.00E+02  5.65E+02  5.00E+02  5.15E+02
F22  9.38E+02  8.92E+02  9.66E+02  9.21E+02  9.15E+02  1.05E+03  8.66E+02  9.12E+02  9.76E+02  9.78E+02  9.33E+02
F23  5.52E+02  9.49E+02  5.34E+02  5.34E+02  5.54E+02  5.52E+02  5.34E+02  5.42E+02  5.34E+02  5.65E+02  5.34E+02
F24  2.56E+02  9.76E+02  2.00E+02  2.00E+02  2.00E+02  7.50E+02  2.00E+02  7.49E+02  2.00E+02  2.00E+02  2.00E+02
F25  1.93E+03  3.31E+02  2.06E+02  1.36E+03  2.26E+02  1.28E+03  2.13E+02  2.13E+02  4.83E+02  5.63E+02  2.18E+02

B.3 CEC 2008 Benchmark

This section presents the full experimental results obtained in the 6 functions that make up the CEC 2008 Benchmark described in Section 6.3. These results can be found in Table B.4. The legend to interpret this table is the following:

• BCGM ≡ Genetic Algorithm configured with BLX-α Crossover and Gaussian Mutator.

• BCUM ≡ Genetic Algorithm configured with BLX-α Crossover and Uniform Mutator.

• UCGM ≡ Genetic Algorithm configured with Uniform Crossover and Gaussian Mutator.

• UCUM ≡ Genetic Algorithm configured with Uniform Crossover and Uniform Mutator.

• DE Exp ≡ Differential Evolution with Exponential Crossover.

• DE Bin ≡ Differential Evolution with Binomial Crossover.

• ES Inter ≡ Evolution Strategy with Intermediate Crossover.

• ES Disc ≡ Evolution Strategy with Discrete Crossover.

• fAvg PF1 ≡ Hybrid algorithm made up of the four aforementioned techniques, making use of the Fitness Average Quality Measure and the Dynamic Participation Function with constant population size.

• fAvg PF2 ≡ Hybrid algorithm making use of the Fitness Average Quality Measure and the Dynamic Participation Function with variable population size.

• NSC PF1 ≡ Hybrid algorithm making use of the Negative Slope Coefficient Quality Measure and the Dynamic Participation Function with constant population size.

• NSC PF2 ≡ Hybrid algorithm making use of the Negative Slope Coefficient Quality Measure and the Dynamic Participation Function with variable population size.

• Self ≡ Hybrid algorithm with participation ratios encoded in the chromosome of the individual with arithmetic average recombination of the participation information.

• W. Self ≡ Hybrid algorithm with participation ratios encoded in the chromosome of the individual with weighted average recombination of the participation information.

• Constant ≡ Hybrid algorithm with constant participation ratios.

Table B.4: Full results in the CEC 2008 Benchmark in 1,000 dimensions

          Sphere    Schwefel  Rosenbrock  Rastrigin  Griewank  Ackley
BCGM      2.83E+05  9.29E+01  1.00E+10    8.02E+03   2.50E+03  1.46E+01
BCUM      1.97E+05  8.94E+01  1.00E+10    7.23E+03   1.74E+03  1.32E+01
DE Bin    5.43E+06  1.78E+02  1.00E+10    2.32E+04   4.86E+04  2.15E+01
DE Exp    1.73E+02  1.51E+02  3.60E+05    1.13E+03   2.54E+00  2.01E+01
ES Disc   1.52E-09  1.42E+02  1.13E+03    1.91E+03   1.37E-01  1.58E+01
ES Inter  0.00E+00  1.01E+02  1.46E+03    8.01E+03   1.91E-03  1.94E+01
UCGM      1.26E+05  7.13E+01  1.00E+10    1.71E+03   1.13E+03  1.16E+01
UCUM      9.18E+04  6.84E+01  1.00E+10    1.65E+03   8.10E+02  1.05E+01
fAvg_PF1  2.35E-06  6.63E+01  1.04E+05    2.60E+03   3.55E-03  1.10E+01
fAvg_PF2  0.00E+00  7.62E+01  1.13E+03    3.00E+03   3.83E-09  1.05E+01
NSC_PF1   1.17E-04  7.18E+01  7.84E+03    3.07E+03   4.32E-03  4.67E+00
NSC_PF2   0.00E+00  6.46E+01  1.11E+03    2.71E+03   2.33E-03  1.06E+01
Self      2.55E-10  8.49E+01  1.50E+03    2.10E+03   1.24E-03  1.36E+01
W. Self   0.00E+00  9.35E+01  1.54E+03    2.18E+03   1.91E-02  1.30E+01
Constant  0.00E+00  5.61E+01  1.46E+03    7.08E+02   1.46E-03  9.25E+00


Bibliography

[AATU03]

A. Acan, H. Altincay, Y. Tekol, and A. Unveren. A genetic algorithm with multiple crossover operators for optimal frequency assignment problem. In Proceedings of the 5th IEEE Congress on Evolutionary Computation, CEC 2003, volume 1, pages 256–263. IEEE Press, December 2003.

[ABCC98]

D. Applegate, R. Bixby, V. Chvátal, and W. Cook. On the solution of traveling salesman problems. In Proceedings of the International Congress of Mathematicians, Extra Volume ICM III, pages 645–656, Berlin, August 1998.

[ACE06]

A. Agarwal, S. Colak, and E. Eryarsoy. Improvement heuristic for the flow shop scheduling problem: An adaptive-learning approach. European Journal of Operational Research, 169(3):801–815, March 2006.

[AFEG99]

S. M. Alaoui, O. Frieder, and T. El-Ghazawi. A parallel genetic algorithm for task mapping on parallel machines. In Proceedings of the 11th IPPS/SPDP’99 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and the 10th Symposium on Parallel and Distributed Processing, volume 1586 of Lecture Notes in Computer Science, pages 201– 209. Springer-Verlag GmbH, 1999.

[AGD03]

A. Auyeung, I. Gondra, and H.K. Dai. Multi-heuristic list scheduling genetic algorithm for task scheduling. In Proceedings of the 2003 ACM symposium on Applied computing, SAC 2003, pages 721–724. ACM Press, 2003.

[AH05]

A. Auger and N. Hansen. A restart CMA evolution strategy with increasing population sizes. In Proceedings of the 7th IEEE Congress on Evolutionary Computation, CEC 2005, pages 1769– 1776. IEEE Press, 2005.

[AL08]

S. Abdallah and V. Lesser. A multiagent reinforcement learning algorithm with non-linear dynamics. Journal of Artificial Intelligence Research, 33:521–549, 2008.

[AMHV06]

A. Anagnostopoulos, L. Michel, P. Van Hentenryck, and Y. Vergados. A simulated annealing approach to the traveling tournament problem. Journal of Scheduling, 9(2):177–193, April 2006.


[AN04]


A.C. Andreas and C. Nearchou. The effect of various operators on the genetic search for large scheduling problems. International Journal of Production Economics, 88(2):191–203, March 2004.

[Ang95]

P.J. Angeline. Adaptive and self-adaptive evolutionary computations. In M. Palaniswami, Y. Attikiouzel, R. Marks, D. Fogel, and T. Fukuda, editors, Computational Intelligence: A Dynamic Systems Perspective, pages 152–163, Piscataway, NJ, 1995. IEEE Press.

[Bäc95]

T. Bäck. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford University Press, December 1995.

[Bal94]

S. Baluja. Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning. Technical report, Carnegie Mellon University, Pittsburgh, PA, USA, 1994.

[Ban90]

W. Banzhaf. The "molecular" traveling salesman. Biological Cybernetics, 64:7–14, 1990.

[Bar54]

N.A. Barricelli. Esempi numerici di processi di evoluzione. Methodos, pages 45–68, 1954.

[Bar57]

N.A. Barricelli. Symbiogenetic evolution processes realized by artificial methods. Methodos, pages 143–182, 1957.

[BB04]

S. Bertel and J.C. Billaut. A genetic algorithm for an industrial multiprocessor flow shop scheduling problem with recirculation. European Journal of Operational Research, 159(3):651– 662, December 2004.

[BD97]

S. Baluja and S. Davies. Combining multiple optimization runs with optimal dependency trees. Technical Report TR: CMU-CS-97-157, Justsystem Pittsburgh Research Center & Carnegie Mellon University, 1997.

[BDL+96]

H. Bersini, M. Dorigo, S. Langerman, G. Seront, and L. Gambardella. Results of the 1st international contest on evolutionary optimisation, 1st ICEO. In Proceedings of the 3rd IEEE International Conference on Evolutionary Computation, ICEC 1996, pages 611–615. IEEE Press, May 1996.

[BeS00]

H.J.C. Barbosa and A. Medeiros e Sá. On adaptive operator probabilities in real coded genetic algorithms. In Proceedings of the Workshop on Advances and Trends in Artificial Intelligence, Santiago, Chile, November 2000.

[BGD02]

L. Bianchi, L.M. Gambardella, and M. Dorigo. An ant colony optimization approach to the probabilistic traveling salesman problem. In Proceedings of the International Conference on Parallel Problem Solving from Nature, PPSN VII, Lecture Notes in Computer Science, pages 883–892. Springer-Verlag GmbH, September 2002.


[Bie95]


C. Bierwirth. A generalized permutation approach to job shop scheduling with genetic algorithms. OR Spectrum, 17(2-3):87–92, June 1995.

[BIV97]

J.S. De Bonet, C.L. Isbell, and P. Viola. Structure driven image database retrieval. In M.C. Mozer, M.I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing, volume 9, page 424. MIT Press, 1997.

[BM99]

C. Bierwirth and D.C. Mattfeld. Production scheduling and rescheduling with genetic algorithms. Evolutionary Computation, 7(1):1–17, 1999.

[Bow05]

M. Bowling. Convergence and no-regret in multiagent learning. In Advances in Neural Information Processing Systems, NIPS 2005, volume 17, pages 209–216. MIT Press, 2005.

[BP95]

G. Bilchev and I.C. Parmee. The ant colony metaphor for searching continuous design spaces. In Selected Papers from AISB Workshop on Evolutionary Computing, volume 993 of Lecture Notes in Computer Science, pages 25–39. Springer-Verlag GmbH, 1995.

[Bra85]

R.M. Brady. Optimization strategies gleaned from biological evolution. Nature, 317(6040):804– 806, October 1985.

[BS02]

H.G. Beyer and H.P. Schwefel. Evolution strategies - a comprehensive introduction. Natural Computing, 1(1):3–52, March 2002.

[BSCG05]

P.J. Ballester, J. Stephenson, J.N. Carter, and K. Gallagher. Real-parameter optimization performance study on the CEC-2005 benchmark with SPC-PNX. In Proceedings of the 7th IEEE Congress on Evolutionary Computation, CEC 2005, volume 1, pages 498–505. IEEE Press, September 2005.

[BT95]

T. Blickle and L. Thiele. A comparison of selection schemes used in genetic algorithms. Technical report, Swiss Federal Institute of Technology (ETH) Zurich, Computer Engineering and Communications Networks Lab (TIK), Gloriastrasse 35, CH-8092 Zurich, 1995.

[Bur69]

G.H. Burgin. On playing two-person zero-sum games against nonminimax players. IEEE Transactions on Systems Science and Cybernetics, 5(4):369–370, October 1969.

[BV98]

E. Balas and A. Vazacopoulos. Guided local search with shifting bottleneck for job shop scheduling. Management Science, 44(2):262–275, February 1998.

[BV01a]

M. Bowling and M. Veloso. Convergence of gradient dynamics with a variable learning rate. In Proceedings of the 18th International Conference on Machine Learning, ICML 2001, pages 27–34. Morgan Kaufmann, 2001.


[BV01b]


M. Bowling and M. Veloso. Rational and convergent learning in stochastic games. In Proceedings of the 17th International Joint Conference on Artificial Intelligence, IJCAI 2001, pages 1021–1026, August 2001.

[BV02]

M. Bowling and M. Veloso. Multiagent learning using a variable learning rate. Articial Intelligence, 136(2):215–250, 2002.

[CADV02]

S.H. Chiang, A. Arpaci-Dusseau, and M.K. Vernon. The impact of more accurate requested runtimes on production job scheduling performance. In Proceedings of the 8th International Workshop on Job Scheduling Strategies for Parallel Processing, JSSPP 2002, volume 2537 of Lecture Notes in Computer Science, pages 103–127. Springer-Verlag GmbH, 2002.

[Cer85]

V. Cerny. Thermodynamical approach to the travelling salesman problem: An efficient simulation algorithm. Journal of Optimization Theory and Applications, 45(1):41–51, January 1985.

[CLKT05]

C. Chan, S. Lee, C. Kao, and H. Tsai. Improving EAX with restricted 2-opt. In Proceedings of the 7th Genetic and Evolutionary Computation Conference, GECCO 2005, pages 1471–1476, New York, NY, USA, 2005. ACM Press.

[CM98]

N. Carrasquero and J.A. Moreno. A new genetic operator for the traveling salesman problem. In Proceedings of the 6th Ibero-American Conference on Artificial Intelligence, IBERAMIA 98, volume 1484 of Lecture Notes in Computer Science, pages 465–475, Lisbon, 1998. Springer-Verlag GmbH.

[Cor72]

F.N. Cornett. An application of evolutionary programming to pattern recognition. Master’s thesis, New Mexico State University, Las Cruces, NM, 1972.

[CP95]

M. Coli and P. Palazzari. Searching for the optimal coding in genetic algorithms. In Proceedings of the 2nd IEEE International Conference on Evolutionary Computation, ICEC 1995, volume 1, pages 92–96. IEEE Press, November-1 December 1995.

[CP98]

E. Cantú-Paz. A survey of parallel genetic algorithms. Calculateurs Paralleles, Réseaux et Systems Répartis, 10(2):141–171, 1998.

[Cra85]

N.L. Cramer. A representation for the adaptive generation of simple sequential programs. In J.J. Grefenstette, editor, Proceedings of the 1985 International Conference on Genetic Algorithms and the Applications, ICGA 1985, pages 183–187, Carnegie Mellon University, July 1985.

[Cro58]

G.A. Croes. A method for solving travelling salesman problems. Operations Research, 6(6):791–812, November-December 1958.

[Cro73]

J.L. Crosby. Computer Simulation in Genetics. John Wiley and Sons, Ltd., 1973.


[CS88]


R. Caruana and J.D. Schaffer. Representation and hidden bias: Gray vs. binary coding for genetic algorithms. In Proceedings of the 5th International Conference on Machine Learning, ICML 1988, pages 153–161, 1988.

[CS03]

V. Conitzer and T. Sandholm. BL-WoLF: A framework for loss-bounded learnability in zerosum games. In Proceedings of the 20th International Conference on Machine Learning, ICML 2003, pages 91–98, 2003.

[Dav85a]

L. Davis. Applying adaptive algorithms to epistatic domains. In Proceedings of the 9th International Joint Conference on Artificial Intelligence, IJCAI 1985, pages 162–164, 1985.

[Dav85b]

L. Davis. Job shop scheduling with genetic algorithms. In Proceedings of the 1st International Conference on Genetic Algorithms, ICGA 1985, pages 136–140. Lawrence Erlbaum Associates, 1985.

[Dav89]

L. Davis. Adapting operator probabilities in genetic algorithms. In Proceedings of the 3rd International Conference on Genetic Algorithms, ICGA 1989, pages 61–69, San Francisco, CA, USA, 1989. Morgan Kaufmann.

[DD98]

G. DiCaro and M. Dorigo. AntNet: Distributed stigmergetic control for communications networks. Journal of Artificial Intelligence Research, 9:317–365, 1998.

[DG97]

M. Dorigo and L.M. Gambardella. Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):53–66, April 1997.

[Día05]

P. Díaz. Diseño e implementación de una librería de algoritmos evolutivos paralelos. Master’s thesis, Facultad de Informática, Universidad Politécnica de Madrid, November 2005.

[DJW02]

S. Droste, T. Jansen, and I. Wegener. Optimization with randomized search heuristics - the (A)NFL theorem, realistic scenarios, and difficult functions. Theoretical Computer Science, 287(1):131–144, 2002.

[DKZ97]

U. Derigs, M. Kabath, and M. Zils. Adaptive genetic algorithms: A methodology for dynamic autoconfiguration of genetic search algorithms. In Proceedings of the Metaheuristics International Conference, MIC'97, Sophia Antipolis, France, July 1997.

[Dor92]

M. Dorigo. Optimization, Learning and Natural Algorithms. PhD thesis, Politecnico di Milano, 1992.

[EDM04]

E. Even-Dar and Y. Mansour. Learning rates for Q-Learning. Journal of Machine Learning Research, 5:1–25, 2004.


[EHKS07]


A.E. Eiben, M. Horvath, W. Kowalczyk, and M.C. Schut. Reinforcement learning for online control of evolutionary algorithms. In Proceedings of the 4th International Workshop on Engineering Self-Organising Systems, ESOA 2006, volume 4335 of Lecture Notes in Computer Science, pages 151–160. Springer-Verlag GmbH, 2007.

[EHM99]

A.E. Eiben, R. Hinterding, and Z. Michalewicz. Parameter control in evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 3(2):124–141, July 1999.

[Eib09]

A. E. Eiben. Principled approaches to tuning EA parameters. Tutorial at the 11th IEEE Congress on Evolutionary Computation, CEC 2009, 2009.

[ELZ+08]

M.T.M. Emmerich, R. Li, A. Zhang, I. Flesch, and P. Lucas. Mixed-integer Bayesian optimization utilizing a-priori knowledge on parameter dependences. In Proceedings of the 20th Belgian-Netherlands Conference on Artificial Intelligence, BNAIC 2008, Enschede, The Netherlands, 2008.

[ES93]

L.J. Eshelman and J.D. Schaffer. Real-coded genetic algorithms and interval-schemata. In L.D. Whitley, editor, Proceedings of the 2nd Workshop on Foundations of Genetic Algorithms, FOGA 1993. Morgan Kaufmann, July 1993.

[ES98]

A.E. Eiben and C.A. Schippers. On evolutionary exploration and exploitation. Fundamenta Informaticae, 35(1-4):35–50, August 1998. IOS Press.

[ESKT98]

A.E. Eiben, I.G. Sprinkhuizen-Kuyper, and B.A. Thijssen. Competing crossovers in an adaptive GA framework. In Proceedings of the 5th IEEE International Conference on Evolutionary Computation, ICEC 1998, pages 787–792, Anchorage, AK, USA, 1998. IEEE Press.

[FB70]

A.S. Fraser and D. Burnell. Computer Models in Genetics. McGraw Hill, 1970.

[FF89]

D.B. Fogel and L.J. Fogel. Evolutionary programming for voice feature analysis. In Proceedings of the 23rd Asilomar Conference on Signals, Systems and Computers, pages 381–383. IEEE Press, 1989.

[FFA91]

D.B. Fogel, L.J. Fogel, and W. Atmar. Meta-evolutionary programming. In 1991 Conference Record of the 25th Asilomar Conference on Signals, Systems and Computers, volume 1, pages 540–545. IEEE Press, November 1991.

[FFAF92]

D.B. Fogel, L.J. Fogel, W. Atmar, and G.B. Fogel. Hierarchic methods of evolutionary programming. In D.B. Fogel and W. Atmar, editors, Proceedings of the 1st Conference on Evolutionary Programming, pages 175–182, La Jolla, CA, 1992.

[Fle04]

K. Fleetwood. An introduction to differential evolution. Talk at the Multi-Agent Systems and Machine Learning Symposium at The University of Queensland, November 2004.


[FM90]


B.R. Fox and M.B. McMahon. Genetic operators for sequencing problems. In G.J.E. Rawlins, editor, Proceedings of the 1990 Foundations of Genetic Algorithms Conference, FOGA 1990, July 15-18, 1990. Morgan Kaufmann.

[Fog62]

L.J. Fogel. Autonomous automata. Industrial Research Magazine, 4(2):14–19, February 1962.

[Fog64]

L.J. Fogel. On the Organization of Intellect. PhD thesis, UCLA, 1964.

[Fog88]

D.B. Fogel. An evolutionary approach to the traveling salesman problem. Biological Cybernetics, 60(2):139–144, December 1988.

[Fog91]

D.B. Fogel. System Identification through Simulated Evolution: A Machine Learning Approach to Modeling. Ginn Press, 1991.

[For81]

R. Forsyth. Beagle: A Darwinian approach to pattern recognition. Kybernetes, 10:159–166, 1981.

[Fos01]

J.A. Foster. Computational genetics: Evolutionary computation. Nature Reviews Genetics, 2:428–436, June 2001.

[FR95]

T.A. Feo and M.G.C. Resende. Greedy randomized adaptive search procedures. Journal of Global Optimization, 6(2):109–133, March 1995.

[Fra57]

A.S. Fraser. Simulation of genetic systems by automatic digital computers. I. Introduction. Australian Journal of Biological Science, 10:484–491, 1957.

[FRS04]

D.G. Feitelson, L. Rudolph, and U. Schwiegelshohn. Parallel job scheduling – a status report. In Proceedings of the 10th International Workshop on Job Scheduling Strategies for Parallel Processing, JSSPP 2004, volume 3277 of Lecture Notes in Computer Science, pages 1–16. Springer-Verlag GmbH, 2004.

[GA07]

C. Grosan and A. Abraham. Hybrid evolutionary algorithms: Methodologies, architectures, and reviews. In C. Grosan, A. Abraham, and H. Ishibuchi, editors, Hybrid Evolutionary Algorithms, volume 75 of Studies in Computational Intelligence, pages 1–17. Springer-Verlag GmbH, 2007.

[Gai04]

Z.L. Gaing. A particle swarm optimization approach for optimum design of PID controller in AVR system. IEEE Transactions on Energy Conversion, 19(2):384–391, June 2004.

[GGM+04]

E.A. Grimaldi, F. Grimaccia, M. Mussetta, P. Pirinoli, and R.F. Zich. A new hybrid genetical swarm algorithm for electromagnetic optimization. In Proceedings of the 3rd International Conference on Computational Electromagnetics and Its Applications, ICCEA 2004, pages 157–160. IEEE Press, November 2004.


[GGRG85]


J.J. Grefenstette, R. Gopal, B.J. Rosmaita, and D.V. Gucht. Genetic algorithms for the traveling salesman problem. In Proceedings of the 1st International Conference on Genetic Algorithms, ICGA 1985, pages 160–168. Lawrence Erlbaum Associates, 1985.

[GJ85]

D.E. Goldberg and R. Lingle Jr. Alleles, loci and the TSP. In J.J. Grefenstette, editor, Proceedings of the 1st International Conference on Genetic Algorithms and their Applications, ICGA 1985, pages 154–159, Mahwah, New Jersey, July 1985. Lawrence Erlbaum Associates.

[GL97]

F. Glover and M. Laguna. Tabu Search. Kluwer Academic Publishers, 1997.

[Glo86]

F. Glover. Future paths for integer programming and links to artificial intelligence. Computers & Operations Research, 13(5):533–549, 1986.

[GMLH08]

S. García, D. Molina, M. Lozano, and F. Herrera. A study on the use of non-parametric tests for analyzing the evolutionary algorithms’ behaviour: A case study on the CEC’2005 special session on real parameter optimization (in press). Journal of Heuristics, 2008.

[GP05]

K. Ganesh and M. Punniyamoorthy. Optimization of continuous-time production planning using hybrid genetic algorithms-simulated annealing. International Journal of Advanced Manufacturing Technology, 26(1-2):148–154, July 2005.

[Gre86]

J.J. Grefenstette. Optimization of control parameters for genetic algorithms. IEEE Transactions on Systems, Man and Cybernetics, 16(1):122–128, January 1986.

[Gre87]

J.J. Grefenstette. Incorporating problem specific knowledge into genetic algorithms. In L. Davis, editor, Genetic Algorithms and Simulated Annealing, pages 42–60. Morgan Kaufmann, 1987.

[GS07]

L.D. Gaspero and A. Schaerf. A composite-neighborhood tabu search approach to the traveling tournament problem. Journal of Heuristics, 13(2):189–207, April 2007.

[Gut00]

W.J. Gutjahr. A graph-based ant system and its convergence. Future Generation Computer Systems, 16(8):873–888, 2000.

[GW04]

J. Grabowski and M. Wodecki. A very fast tabu search algorithm for the permutation flow shop problem with makespan criterion. Computers & Operations Research, 31(11):1891–1909, September 2004.

[Hal00]

S.N. Hall. Group theoretic tabu search approach to the traveling salesman problem. Master’s thesis, Air Force Institute of Technology, 2000.

[Hei08]

Ruprecht-Karls-Universität Heidelberg. TSPLIB: A library of sample instances for the TSP (and related problems), accessed Aug 2009 [online]. Available: http://elib.zib.de/pub/Packages/mp-testdata/tsp/tsplib/tsplib.html, August 2008.

[Hel00]

K. Helsgaun. An effective implementation of the Lin-Kernighan traveling salesman heuristic. European Journal of Operational Research, 126(1):106–130, October 2000.

[HH01]

W.W. Hsu and C.C. Hsu. The spontaneous evolution genetic algorithm for solving the traveling salesman problem. In L. Spector, E.D. Goodman, A. Wu, W.B. Langdon, H.M. Voigt, M. Gen, S. Sen, M. Dorigo, S. Pezeshk, M.H. Garzon, and E. Burke, editors, Proceedings of the 3rd Genetic and Evolutionary Computation Conference, GECCO 2001, pages 359–366. Morgan Kaufmann, 2001.

[HKM95]

I. Hong, A.B. Kahng, and B.R. Moon. Exploiting synergies of multiple crossovers: Initial studies. In Proceedings of the 1995 IEEE International Conference on Evolutionary Computation, ICEC 1995, volume 1, pages 245–250, Perth, WA, Australia, 1995. IEEE Press.

[HL96]

F. Herrera and M. Lozano. Adaptation of genetic algorithm parameters based on fuzzy logic controllers. In F. Herrera and J.L. Verdegay, editors, Genetic Algorithms and Soft Computing, pages 95–125. Physica-Verlag, 1996.

[HLG99]

G.R. Harik, F.G. Lobo, and D.E. Goldberg. The compact genetic algorithm. IEEE Transactions on Evolutionary Computation, 3(3):287–297, November 1999.

[Hol75]

J.H. Holland. Adaptation in natural and artificial systems. University of Michigan Press, 1975.

[Hol79]

S. Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6:65–70, 1979.

[HV06]

P.V. Hentenryck and Y. Vergados. Traveling tournament scheduling: A systematic evaluation of simulated annealing. In Proceedings of the 3rd International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, CPAIOR 2006, volume 3990 of Lecture Notes in Computer Science, pages 228–243. Springer-Verlag GmbH, May 31 - June 2 2006.

[HW98]

T.P. Hong and H.S. Wang. Automatically adjusting crossover ratios of multiple crossover operators. Journal of Information Science and Engineering, 14(2):369–390, June 1998.

[HWC00]

T.P. Hong, H.S. Wang, and W.C. Chen. Simultaneously applying multiple mutation operators in genetic algorithms. Journal of Heuristics, 6(4):439–455, September 2000. Kluwer Academic Publishers, Hingham, MA, USA.

[HWLL02]

T.P. Hong, H.S. Wang, W.Y. Lin, and W.Y. Lee. Evolution of appropriate crossover and mutation operators in a genetic process. Applied Intelligence, 16(1):7–17, January 2002. Springer Netherlands.

[JF95]

T. Jones and S. Forrest. Fitness distance correlation as a measure of problem difficulty for genetic algorithms. In Larry Eshelman, editor, Proceedings of the 6th International Conference on Genetic Algorithms, ICGA 1995, pages 184–192, San Francisco, CA, 1995. Morgan Kaufmann.

[JJS94]

T. Jaakkola, M.I. Jordan, and S.P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6):1185–1201, 1994.

[JM02]

S. Jung and B.R. Moon. Toward minimal restriction of genetic encoding and crossovers for the two-dimensional Euclidean TSP. IEEE Transactions on Evolutionary Computation, 6(6):557–565, December 2002.

[JRS07]

N. Jin and Y. Rahmat-Samii. Advances in particle swarm optimization for antenna designs: Real-number, binary, single-objective and multiobjective implementations. IEEE Transactions on Antennas and Propagation, 55(3):556–567, March 2007.

[Jul95]

B. Julstrom. What have you done for me lately? Adapting operator probabilities in a steady-state genetic algorithm. In L.J. Eshelman, editor, Proceedings of the 6th International Conference on Genetic Algorithms, ICGA 1995, pages 81–87, San Francisco, CA, USA, 1995. Morgan Kaufmann.

[Jul97]

B. Julstrom. Adaptive operator probabilities in a genetic algorithm that applies three operators. In Proceedings of the 1997 ACM Symposium on Applied Computing, SAC 1997, pages 233–238, San Jose, CA, USA, 1997. ACM Press.

[JZJ00]

L. Juan, C. Zixing, and L. Jianqin. Premature convergence in genetic algorithm: Analysis and prevention based on chaos operator. In Proceedings of the 3rd World Congress on Intelligent Control and Automation, volume 1, pages 495–499. IEEE Press, 2000.

[KA99]

Y.K. Kwok and I. Ahmad. Static scheduling algorithms for allocating directed graphs to multiprocessors. ACM Computing Surveys, 31(4):406–471, 1999.

[Kau67]

H. Kaufman. An experimental investigation of process identification by competitive evolution. IEEE Transactions on Systems Science and Cybernetics, 3(1):11–16, 1967.

[KDAB06]

E.E. Korkmaz, J. Du, R. Alhajj, and K. Barker. Combining advantages of new chromosome representation scheme and multi-objective genetic algorithms for better clustering. Intelligent Data Analysis, 10(2):163–182, March 2006.

[KES01]

J. Kennedy, R.C. Eberhart, and Y. Shi. Swarm Intelligence. Morgan Kaufmann, 2001.

[KGV83]

S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, May 1983.

[KKS+ 03]

J.R. Koza, M.A. Keane, M.J. Streeter, W. Mydlowec, J. Yu, and G. Lanza. Genetic Programming IV: Routine Human-Competitive Machine Intelligence. Kluwer Academic Publishers, 2003.

[KLM96]

L.P. Kaelbling, M.L. Littman, and A.W. Moore. Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 4:237–285, 1996.

[KN99]

K. Katayama and H. Narihisa. Iterated local search approach using genetic transformation to the traveling salesman problem. In W. Banzhaf et al., editor, Proceedings of the 1st Genetic and Evolutionary Computation Conference, GECCO 1999, pages 321–328. Morgan Kaufmann, 1999.

[Kno94]

J. Knox. Tabu search performance on the symmetric traveling salesman problem. Computers & Operations Research, 21(8):867–876, October 1994.

[Kok05]

Z. Kokosiński. Effects of versatile crossover and mutation operators on evolutionary search in partition and permutation problems. In Proceedings of the International Conference on Intelligent Information Processing and Web Mining, IIPWM 2005, volume 31 of Advances in Soft Computing, pages 299–308. Springer-Verlag GmbH, June 13–16 2005.

[Koz92]

J.R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, December 1992.

[KRM+ 01]

S. Kannan, M. Roberts, P. Mayers, D. Brelsford, and J.F. Skovira. Workload Management with LoadLeveler. IBM Red Books, 2001.

[KS96]

L. Kallel and M. Schoenauer. Fitness distance correlation for variable length representations. Technical Report 363, CMAP, École Polytechnique, 1996.

[KS00]

N. Krasnogor and J. Smith. A memetic algorithm with self-adaptive local search: TSP as a case study. In D. Whitley, D. Goldberg, E. Cantú-Paz, L. Spector, I. Parmee, and H.G. Beyer, editors, Proceedings of the 2nd Genetic and Evolutionary Computation Conference, GECCO 2000. Morgan Kaufmann, 2000.

[LC03]

Y.H. Lee and C. Chen. A modified genetic algorithm for task scheduling in multiprocessor systems. In Proceedings of the 9th Workshop on Compiler Techniques for High-performance Computing, 2003.

[LELP00a]

P. Larrañaga, R. Etxeberria, J.A. Lozano, and J.M. Peña. Combinatorial optimization by learning and simulation of bayesian networks. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, UAI 2000, pages 343–352. Morgan Kaufmann, 2000.

[LELP00b]

P. Larrañaga, R. Etxeberria, J.A. Lozano, and J.M. Peña. Optimization in continuous domains by learning and simulation of gaussian networks. In Proceedings of the 2nd Genetic and Evolutionary Computation Conference, GECCO 2000, pages 201–204. Morgan Kaufmann, 2000.

[LGX97]

Y. Leung, Y. Gao, and Z.B. Xu. Degree of population diversity - a perspective on premature convergence in genetic algorithms and its Markov chain analysis. IEEE Transactions on Neural Networks, 8(5):1165–1176, September 1997.

[Lid91]

M.L. Lidd. The travelling salesman problem domain application of a fundamentally new approach to utilizing genetic algorithms. Technical report, MITRE Corporation, 1991.

[Lif95]

D.A. Lifka. The ANL/IBM SP scheduling system. In Proceedings of the 1995 Workshop on Job Scheduling Strategies for Parallel Processing, IPPS 1995, volume 949 of Lecture Notes in Computer Science, pages 295–303. Springer-Verlag GmbH, 1995.

[Lin65]

S. Lin. Computer solutions on the travelling salesman problem. Bell System Technical Journal, 44:2245–2269, 1965.

[LK73]

S. Lin and B. Kernighan. An efficient heuristic procedure for the traveling salesman problem. Operations Research, 21:498–516, 1973.

[LKC+ 07]

G. Lin, L. Kang, Y. Chen, B. McKay, and R. Sarker. A self-adaptive mutations with multi-parent crossover evolutionary algorithm for solving function optimization problems. In L. Kang, Y. Liu, and S. Zeng, editors, Advances in Computation and Intelligence: Proceedings of the 2nd International Symposium, ISICA 2007, volume 4683 of Lecture Notes in Computer Science, pages 157–168, Wuhan, China, September 2007. Springer-Verlag GmbH.

[LKM+ 99]

P. Larrañaga, C.M.H. Kuijpers, R.H. Murga, I. Inza, and S. Dizdarevic. Genetic algorithms for the travelling salesman problem: A review of representations and operators. Artificial Intelligence Review, 13:129–170, 1999.

[LKPM97]

P. Larrañaga, C.M.H. Kuijpers, M. Poza, and R.H. Murga. Decomposing bayesian networks: Triangulation of the moral graph with genetic algorithms. Statistics and Computing, 7(1):19–34, March 1997.

[LL01]

P. Larrañaga and J.A. Lozano. Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation, volume 2 of Genetic Algorithms and Evolutionary Computation. Kluwer Academic Publishers, 2001.

[LLL06]

J.H. Lee, Y.H. Lee, and Y.H. Lee. Mathematical modeling and tabu search heuristic for the traveling tournament problem. In Proceedings of the 2006 International Conference on Computational Science and Its Applications, ICCSA 2006, volume 3982 of Lecture Notes in Computer Science, pages 875–884. Springer-Verlag GmbH, May 8-11 2006.

[LMB07]

S. Liu, M. Mernik, and B.R. Bryant. Entropy-driven parameter control for evolutionary algorithms. Informatica: An International Journal of Computing and Informatics, 31(1):41–50, 2007.

[LPDG06]

A. Luntala, W.L. Price, M. Diaby, and M. Gravel. Ensuring population diversity in genetic algorithms: A technical note with application to the cell formation problem. European Journal of Operational Research, 178(2):634–638, 2006.

[LPFM09]

A. LaTorre, J.M. Peña, J. Fernández, and S. Muelas. MOS como herramienta para la hibridación de algoritmos evolutivos. In Proceedings of the VI Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados, MAEB 2009, pages 457–464, 2009.

[LPMF10]

A. LaTorre, J.M. Peña, S. Muelas, and A.A. Freitas. Learning hybridization strategies in evolutionary algorithms. Intelligent Data Analysis, 14(3), 2010.

[LPMZ09]

A. LaTorre, J.M. Peña, S. Muelas, and M. Zaforas. Hybrid evolutionary algorithms for large scale continuous problems. In Proceedings of the 11th Genetic and Evolutionary Computation Conference, GECCO 2009, pages 1863–1865. ACM Press, 2009.

[LPQ97]

G. Laporte, J.Y. Potvin, and F. Quilleret. A tabu search heuristic using genetic diversification for the clustered traveling salesman problem. Journal of Heuristics, 2(3):187–200, December 1997.

[LPRdM08]

A. LaTorre, J.M. Peña, V. Robles, and P. de Miguel. Supercomputer scheduling with combined evolutionary techniques. In F. Xhafa and A. Abraham, editors, Meta-heuristics for Scheduling: Distributed Computing Environments, volume 146 of Studies in Computational Intelligence, pages 95–120. Springer-Verlag GmbH, Germany, 2008.

[LPRM08]

A. LaTorre, J.M. Peña, V. Robles, and S. Muelas. Using Multiple Offspring Sampling to guide genetic algorithms to solve permutation problems. In M. Keijzer, editor, Proceedings of the 10th Genetic and Evolutionary Computation Conference, GECCO 2008, pages 1119–1120, New York, NY, USA, July 2008. ACM Press.

[Mat96]

D.C. Mattfeld. Evolutionary Search and the Job Shop: Investigations on Genetic Algorithms for Production Scheduling (Production and Logistics). Physica-Verlag, 1996.

[MBP05]

P.P. Menon, D.G. Bates, and I. Postlethwaite. Hybrid evolutionary optimisation methods for the clearance of nonlinear flight control laws. In Proceedings of the 44th IEEE Conference on Decision and Control and the 2005 European Control Conference, CDC-ECC 2005, pages 4053–4058, December 2005.

[MCZ00]

M. Mernik, M. Crepinsek, and V. Zumer. A metaevolutionary approach in searching of the best combination of crossover operators for the TSP. In Proceedings of the IASTED ICNN, pages 32–36, Pittsburgh, Pennsylvania, 2000. IASTED/ACTA Press.

[Mer02]

P. Merz. A comparison of memetic recombination operators for the traveling salesman problem. In Proceedings of the 4th Genetic and Evolutionary Computation Conference, GECCO 2002, pages 472–479. Morgan Kaufmann, 2002.

[MF97]

P. Merz and B. Freisleben. Genetic local search for the TSP: New results. In Proceedings of the 4th IEEE International Conference on Evolutionary Computation, ICEC 1997, pages 159–164. IEEE Press, April 1997.

[MF01a]

P. Merz and B. Freisleben. Memetic algorithms for the traveling salesman problem. Complex Systems, 13(4):297–345, 2001.

[MF01b]

A.W. Mu’alem and D.G. Feitelson. Utilization, predictability, workloads, and user runtime estimates in scheduling the IBM SP2 with backfilling. IEEE Transactions on Parallel and Distributed Systems, 12(6):529–543, June 2001.

[MGSK88]

H. Mühlenbein, M. Gorges-Schleuter, and O. Krämer. Evolutionary algorithms in combinatorial optimization. Parallel Computing, 7:65–85, 1988.

[MH97]

N. Mladenovic and P. Hansen. Variable neighborhood search. Computers & Operations Research, 24(11):1097–1100, 1997.

[MHMG05]

A.C.M. Martínez-Estudillo, C. Hervás-Martínez, F.J. Martínez-Estudillo, and N. García-Pedrajas. Hybridization of evolutionary algorithms and local search by means of a clustering method. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 36(3):534–545, June 2005.

[Mic96]

Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag GmbH, 1996.

[MN92]

P. Moscato and M.G. Norman. A memetic approach for the traveling salesman problem implementation of a computational ecology for combinatorial optimization on message-passing systems. In Proceedings of the 1992 International Conference on Parallel Computing and Transputer Applications, PACTA 1992, pages 177–186. IOS Press, 1992.

[Mos89]

P. Moscato. On evolution, search, optimization, genetic algorithms and martial arts: Towards memetic algorithms. Technical Report 826, Caltech Concurrent Computation Program, CalTech, Pasadena CA, 1989.

[MPR+ 07]

S. Muelas, J.M. Peña, V. Robles, A. LaTorre, and P. de Miguel. Machine learning methods to analyze migration parameters in parallel genetic algorithms. In E. Corchado, J.M. Corchado, and A. Abraham, editors, Proceedings of the International Workshop on Hybrid Artificial Intelligence Systems 2007, volume 44 of Advances in Soft Computing, pages 199–206, Salamanca, Spain, November 2007. Springer Verlag.

[Müh89]

H. Mühlenbein. Parallel genetic algorithm, population dynamics and combinatorial optimization. In J.D. Schaffer, editor, Proceedings of the 3rd International Conference on Genetic Algorithms, ICGA 1989. Morgan Kaufmann, 1989.

[Müh97]

H. Mühlenbein. The equation for response to selection and its use for prediction. Evolutionary Computation, 5(3):303–346, 1997.

[Nar04]

A. Nareyek. Choosing search heuristics by non-stationary reinforcement learning. In Metaheuristics: Computer Decision-Making, pages 523–544. Kluwer Academic Publishers, 2004.

[NK97]

Y. Nagata and S. Kobayashi. Edge assembly crossover: A high-power genetic algorithm for the travelling salesman problem. In Thomas Bäck, editor, Proceedings of the 7th International Conference on Genetic Algorithms, ICGA 1997, pages 450–457. Morgan Kaufmann, 1997.

[NS96]

E. Nowicki and C. Smutnicki. A fast taboo search algorithm for the job shop problem. Management Science, 42(6):797–813, 1996.

[NT95]

K.S. Naphade and D. Tuzun. Initializing the Hopfield-Tank network for the TSP using a convex hull: A computational study. In Proceedings of the 1995 Artificial Neural Networks in Engineering Conference, ANNIE 1995, volume 5, pages 399–404, St. Louis, November 1995.

[Olt04]

M. Oltean. Searching for a practical evidence of the no free lunch theorems. In Proceedings of the 1st International Workshop on Biologically Inspired Approaches to Advanced Information Technology, BioADIT 2004, volume 3141 of Lecture Notes in Computer Science, pages 472–483. Springer-Verlag GmbH, 2004.

[Or76]

I. Or. Travelling Salesman-Type Combinatorial Problems and Their Relation to the Logistics of Regional Blood Banking. PhD thesis, Northwestern University, 1976.

[OS90]

F.A. Ogbu and D.K. Smith. The application of the simulated annealing algorithm to the solution of the n/m/Cmax flow shop problems. Computers & Operations Research, 17(3):243–253, 1990.

[OSH87]

I.M. Oliver, D.J. Smith, and J.R.C. Holland. A study of permutation crossover operators on the traveling salesman problem. In Proceedings of the 2nd International Conference on Genetic Algorithms and their Application, ICGA 1987, pages 224–230, Mahwah, NJ, USA, 1987. Lawrence Erlbaum Associates, Inc.

[Pea88]

J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA, USA, 1988.

[PGCP99]

M. Pelikan, D.E. Goldberg, and E. Cantú-Paz. BOA: The bayesian optimization algorithm. In W. Banzhaf, J. Daida, A.E. Eiben, M.H. Garzon, V. Honavar, M. Jakiela, and R.E. Smith, editors, Proceedings of the 1st Genetic and Evolutionary Computation Conference, GECCO 1999, volume 1, pages 525–532, Orlando, FL, 1999. Morgan Kaufmann.

[Pis95]

D. Pisinger. Algorithms for Knapsack Problems. PhD thesis, Department of Computer Science, University of Copenhagen (DIKU), Denmark, 1995.

[PLPO09]

L. Peña, A. LaTorre, J.M. Peña, and S. Ossowski. Tentative exploration on reinforcement learning algorithms for stochastic rewards. In E. Corchado, editor, Proceedings of the 4th International Conference on Hybrid Artificial Intelligent Systems, HAIS 2009, volume 5572 of Lecture Notes in Artificial Intelligence, pages 336–343, Berlin, June 2009. Springer-Verlag GmbH.

[PM99]

M. Pelikan and H. Mühlenbein. The bivariate marginal distribution algorithm. In R. Roy, T. Furuhashi, and P.K. Chawdhry, editors, Proceedings of Advances in Soft Computing - Engineering Design and Manufacturing, pages 521–535, London, 1999. Springer-Verlag GmbH.

[PMMAM09] C. Pérez-Miguel, J. Miguel-Alonso, and A. Mendiburu. Evaluating the cell broadband engine as a platform to run estimation of distribution algorithms. In Proceedings of the 11th Genetic and Evolutionary Computation Conference, GECCO 2009, pages 2491–2498. ACM Press, 2009.

[Pol08]

R. Poli. Analysis of the publications on the applications of particle swarm optimisation. Journal of Artificial Evolution and Applications, 8(3):10, 2008.

[PTL08]

Q.K. Pan, M.F. Tasgetiren, and Y.C. Liang. A discrete particle swarm optimization algorithm for the no-wait flowshop scheduling problem. Computers & Operations Research, 35(9):2807–2839, 2008. Part Special Issue: Bio-inspired Methods in Combinatorial Optimization.

[PV07]

R. Poli and L. Vanneschi. Fitness-proportional negative slope coefficient as a hardness measure for genetic algorithms. In Proceedings of the 9th Genetic and Evolutionary Computation Conference, GECCO 2007, pages 1335–1342, New York, NY, USA, 2007. ACM Press.

[RdML02]

V. Robles, P. de Miguel, and P. Larrañaga. Solving the traveling salesman problem with EDAs. In P. Larrañaga and J.A. Lozano, editors, Estimation of Distribution Algorithms. A New Tool for Evolutionary Computation, pages 211–230. Kluwer Academic Publishers, 2002.

[Rec71]

I. Rechenberg. Evolutionsstrategie - Optimierung Technischer Systeme nach Prinzipien der Biologischen Evolution. PhD thesis, Technischen Universität Berlin, 1971.

[RKP05]

J. Rönkkönen, S. Kukkonen, and K.V. Price. Real-parameter optimization with differential evolution. In Proceedings of the 7th IEEE Congress on Evolutionary Computation, CEC 2005, volume 1, pages 506–513. IEEE Press, September 2005.

[Rob49]

J.B. Robinson. On the hamiltonian game (a traveling-salesman problem). RAND Research Memorandum RM-303, 1949.

[Roo70]

R. Root. An investigation of evolutionary programming. Master’s thesis, New Mexico State University, Las Cruces, NM, 1970.

[Ros95]

J.P. Rosca. Entropy-driven adaptive representation. In Proceedings of the Workshop on Genetic Programming: From Theory to Real-World Applications, pages 23–32. Morgan Kaufmann, 1995.

[RPL+ 04]

V. Robles, J.M. Peña, P. Larrañaga, M.S. Pérez, and V. Herves. GA-EDA: A new hybrid cooperative search evolutionary algorithm. In J.A. Lozano, P. Larrañaga, I. Inza, and E. Bengoetxea, editors, Towards a New Evolutionary Computation: Advances in the Estimation of Distribution Algorithms, volume 192 of Studies in Fuzziness and Soft Computing, pages 187–219. Springer-Verlag GmbH, 2004.

[RS94]

N.J. Radcliffe and P.D. Surry. Formal memetic algorithms. In Proceedings of the Evolutionary Computing, AISB Workshop, volume 865 of Lecture Notes in Computer Science, pages 1–16. Springer-Verlag GmbH, April 11-13 1994.

[RY98]

C.R. Reeves and T. Yamada. Genetic algorithms, path relinking, and flow shop. Evolutionary Computation, 6(1):45–60, 1998.

[SA03]

S.M. Soak and B.H. Ahn. New subtour-based crossover operator for the TSP. In Proceedings of the 5th Genetic and Evolutionary Computation Conference, GECCO 2003, volume 2724 of Lecture Notes in Computer Science, page 214, Chicago, IL, USA, 12-16 July 2003. Springer-Verlag GmbH.

[Sal96]

R. Salomon. The influence of different coding schemes on the computational complexity of genetic algorithms in function optimization. In Proceedings of the 4th International Conference on Parallel Problem Solving from Nature, PPSN IV, volume 1141 of Lecture Notes in Computer Science, pages 227–235. Springer-Verlag GmbH, September 22-26 1996.

[SBB98]

L. Spector, H. Barnum, and J.H. Bernstein. Genetic programming for quantum computers. In J.R. Koza, W. Banzhaf, K. Chellapilla, K. Deb, M. Dorigo, D.B. Fogel, M.H. Garzon, D.E. Goldberg, H. Iba, and R. Riolo, editors, Proceedings of the 3rd Conference on Genetic Programming, pages 365–373, San Francisco, CA, 1998. Morgan Kaufmann.

[Sch74]

H.P. Schwefel. Numerische Optimierung von Computer-Modellen. PhD thesis, Technischen Universität Berlin, 1974.

[Sch87]

J. Schmidhuber. Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook. Master’s thesis, Institut für Informatik, Technische Universität München, 1987.

[SG03]

A. Sinha and D.E. Goldberg. A survey of hybrid genetic and evolutionary algorithms. Technical Report 2003004, Illinois Genetic Algorithms Laboratory (IlliGAL), January 2003.

[SHBR97]

R. Schoonderwoerd, O. Holland, J. Bruten, and L. Rothkrantz. Ant-based load balancing in telecommunication networks. Adaptive Behaviour, 5(2):169–207, 1997.

[SHL+ 05]

P.N. Suganthan, N. Hansen, J.J. Liang, K. Deb, Y.P. Chen, A. Auger, and S. Tiwari. Problem definitions and evaluation criteria for the CEC 2005 special session on real-parameter optimization. Technical Report 2005005, School of EEE, Nanyang Technological University and Kanpur Genetic Algorithms Laboratory (KanGAL), May 2005.

[SK89]

R.D. Shachter and C.R. Kenley. Gaussian influence diagrams. Management Science, 35(5):527–550, 1989.

[SKM00]

S.P. Singh, M.J. Kearns, and Y. Mansour. Nash convergence of gradient dynamics in general-sum games. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, UAI 2000, pages 541–548. Morgan Kaufmann, 2000.

[SLL+ 05]

X.H. Shi, Y.C. Liang, H.P. Lee, C. Lu, and L.M. Wang. An improved GA and a novel PSO-GA-based hybrid algorithm. Information Processing Letters, 93(5):255–261, March 2005.

[SM87]

J.D. Schaffer and A. Morishima. An adaptive crossover distribution mechanism for genetic algorithms. In J.J. Grefenstette, editor, Proceedings of the 2nd International Conference on Genetic Algorithms and their Applications, ICGA 1987, pages 36–40, Mahwah, NJ, USA, 1987. Lawrence Erlbaum Associates.

[SM02]

D. Seo and B.R. Moon. Voronoi quantized crossover for traveling salesman problem. In Proceedings of the 4th Genetic and Evolutionary Computation Conference, GECCO 2002, pages 544–552. Morgan Kaufmann, 2002.

[SM06]

M.H. Shenassaa and M. Mahmoodi. A novel intelligent method for task scheduling in multiprocessor systems using genetic algorithms. Journal of the Franklin Institute, 343(4-5):361–371, July-August 2006.

[Smi80]

S.F. Smith. A Learning System Based on Genetic Adaptive Algorithms. PhD thesis, University of Pittsburgh, Pittsburgh, PA, USA, 1980.

[SP91]

G. Syswerda and J. Palmucci. The application of genetic algorithms to resource scheduling. In Proceedings of the 4th International Conference on Genetic Algorithms, ICGA 1991, pages 502–508, 1991.

[SP95]

R. Storn and K. Price. Differential evolution - a simple and efficient adaptive scheme for global optimization over continuous spaces. Technical Report TR-95-012, International Computer Science Institute, 1995.

[SP05]

L. Shi and Y. Pan. An efficient search method for job shop scheduling problems. IEEE Transactions on Automation Science and Engineering, 2(1):73–77, January 2005.

[Spe95]

W.M. Spears. Adapting crossover in evolutionary algorithms. In J.R. McDonnell, R.G. Reynolds, and D.B. Fogel, editors, Proceedings of the 4th Conference on Evolutionary Programming, pages 367–384, Cambridge, MA, 1995. MIT Press.

[SSFS02]

H. Sanvicente-Sánchez and J. Frausto-Solís. MPSA: A methodology to parallelize simulated annealing and its application to the traveling salesman problem. In Proceedings of the 2nd Mexican International Conference on Artificial Intelligence, MICAI 2002, volume 2313 of Lecture Notes in Computer Science, pages 145–158. Springer-Verlag GmbH, 2002.

[SXK+ 06]

S.E. Selvan, C.C. Xavier, N. Karssemeijer, J. Sequeira, R.A. Cherian, and B.Y. Dhala. Parameter estimation in stochastic mammogram model by heuristic optimization techniques. IEEE Transactions on Information Technology in Biomedicine, 10(4):685–695, October 2006.

[SY98]

U. Schwiegelshohn and R. Yahyapour. Analysis of First-Come-First-Serve parallel job scheduling. In Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, SODA 1998, pages 629–638. Society for Industrial and Applied Mathematics, 1998.

[SY00]

T. Schnier and X. Yao. Using multiple representations in evolutionary algorithms. In Proceedings of the 2nd IEEE Congress on Evolutionary Computation, CEC 2000, volume 1, pages 479–486, La Jolla, CA, USA, July 2000. IEEE Press.

[Sys91]

G. Syswerda. Schedule optimization using genetic algorithms. In L. Davis, editor, Handbook of Genetic Algorithms, pages 332–349. Van Nostrand Reinhold, 1991.

[Sys93]

G. Syswerda. Simulated crossover in genetic algorithms. In L.D. Whitley, editor, Proceedings of the 2nd Workshop on Foundations of Genetic Algorithms, FOGA 1993. Morgan Kaufmann, July 1993.

[Tal02]

E.-G. Talbi. A taxonomy of hybrid metaheuristics. Journal of Heuristics, 8(5):541–564, September 2002.

[Tes03]

G. Tesauro. Extending Q-Learning to general adaptive multi-agent systems. In S. Thrun, L. K. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems, NIPS 2003, volume 16. MIT Press, December 8–13 2003.

[Thi05]

D. Thierens. An adaptive pursuit strategy for allocating operator probabilities. In Proceedings of the 7th Genetic and Evolutionary Computation Conference, GECCO 2005, pages 1539–1546, New York, NY, USA, 2005. ACM Press.

[TKS+ 94]

H. Tamaki, H. Kita, N. Shimizu, K. Maekawa, and Y. Nishikawa. A comparison study of genetic codings for the traveling salesman problem. In Proceedings of the 1st IEEE Conference on Evolutionary Computation, ICEC 1994 (IEEE World Congress on Computational Intelligence), volume 1, pages 1–6, June 1994.

[TL06]

L.Y. Tseng and S.C. Liang. A hybrid metaheuristic for the quadratic assignment problem. Computational Optimization and Applications, 34(1):85–113, May 2006.

[TMSI03]

E. Takashima, Y. Murata, N. Shibata, and M. Ito. Self adaptive island GA. In Proceedings of the 5th IEEE Congress on Evolutionary Computation, CEC 2003, volume 2, pages 1072–1079. IEEE Press, December 2003.

[TYS+ 07]

K. Tang, X. Yao, P.N. Suganthan, C. MacNish, Y.P. Chen, C.M. Chen, and Z. Yang. Benchmark functions for the CEC 2008 special session and competition on large scale global optimization. Technical report, Nature Inspired Computation and Applications Laboratory, USTC, China, 2007.

[Vob96]

S. Voß. Dynamic tabu search strategies for the traveling purchaser problem. Annals of Operations Research, 63(2):253–275, April 1996.

[VOV05]

K. Veeramachaneni, L.A. Osadciw, and P.K. Varshney. An adaptive multimodal biometric management algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 35(3):344–356, August 2005.

[VTCV06] L. Vanneschi, M. Tomassini, P. Collard, and S. Vérel. Negative slope coefficient: A measure to characterize genetic programming fitness landscapes. In P. Collet et al., editors, Proceedings of the 9th European Conference on Genetic Programming, EuroGP 2006, volume 3905 of Lecture Notes in Computer Science, pages 179–189. Springer-Verlag GmbH, April 2006.

[VW00] M. Vázquez and L.D. Whitley. A hybrid genetic algorithm for the quadratic assignment problem. In L.D. Whitley, D.E. Goldberg, E. Cantú-Paz, L. Spector, I.C. Parmee, and H.G. Beyer, editors, Proceedings of the 2nd Genetic and Evolutionary Computation Conference, GECCO 2000, pages 135–142. Morgan Kaufmann, 2000.

[Wal67] M.J. Walsh. Evolution of finite automata for prediction. Final Report RADC-TR-67-555, Rome Air Development Center, Griffiss AFB, NY, 1967.

[Wal96] M.B. Wall. A Genetic Algorithm for Resource-Constrained Scheduling. PhD thesis, Massachusetts Institute of Technology, June 1996.

[Wat89] C.J.C.H. Watkins. Learning from Delayed Rewards. PhD thesis, King's College, Cambridge, UK, 1989.

[WBHW03] J.P. Watson, J.C. Beck, A.E. Howe, and L.D. Whitley. Problem difficulty for tabu search in job shop scheduling. Artificial Intelligence, 143(2):189–217, 2003.

[Wik09] Wikipedia. Ant colony optimization. Accessed Aug 2009 [online]. Available: http://en.wikipedia.org/wiki/Ant_colony_optimization, 2009.

[WM97] D.H. Wolpert and W.G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, April 1997.

[WPS06] J.M. Whitacre, T.Q. Pham, and R.A. Sarker. Credit assignment in adaptive evolutionary algorithms. In Proceedings of the 8th Genetic and Evolutionary Computation Conference, GECCO 2006, pages 1353–1360, Seattle, Washington, USA, 2006. ACM Press.

[WSS91] D.L. Whitley, T. Starkweather, and D. Shaner. The traveling salesman and sequence scheduling: Quality solutions using genetic edge recombination. In L. Davis, editor, Handbook of Genetic Algorithms, pages 350–372. Van Nostrand Reinhold, New York, 1991.

[WYJ+04] A.S. Wu, H. Yu, S. Jin, K. Lin, and G. Schiavone. An incremental genetic algorithm approach to multiprocessor scheduling. IEEE Transactions on Parallel and Distributed Systems, 15(9):824–834, September 2004.

[WZZ06] L. Wang, L. Zhang, and D.Z. Zheng. An effective hybrid genetic algorithm for flow shop scheduling with limited buffers. Computers & Operations Research, 33(10):2960–2971, 2006. Part Special Issue: Constraint Programming.

[Yan04] W.X. Yang. An improved genetic algorithm adopting immigration operator. Intelligent Data Analysis, 8(4):385–401, September 2004.

[YJG03] A.B. Yoo, M.A. Jette, and M. Grondona. SLURM: Simple Linux Utility for Resource Management. In Proceedings of the 9th International Workshop on Job Scheduling Strategies for Parallel Processing, JSSPP 2003, volume 2682 of Lecture Notes in Computer Science, pages 44–60. Springer-Verlag GmbH, 2003.

[YL04] K.C. Ying and C.J. Liao. An ant colony system for permutation flow shop sequencing. Computers & Operations Research, 31(5):791–801, April 2004.

[YM02] H.S. Yoon and B.R. Moon. An empirical study on the synergy of multiple crossover operators. IEEE Transactions on Evolutionary Computation, 6(2):212–223, April 2002.

[YN95] T. Yamada and R. Nakano. A genetic algorithm with multi-step crossover for job-shop scheduling problems. In Proceedings of the 1st International Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications, pages 146–151. IEEE Press, September 1995.

[Zaf08] M. Zaforas. Implementación de un algoritmo evolutivo basado en MOS [Implementation of an evolutionary algorithm based on MOS]. Master's thesis, Universidad Politécnica de Madrid, July 2008.

[ZDLY08] F. Zhao, J. Dong, S. Li, and X. Yang. An improved genetic algorithm for the multiple traveling salesman problem. In Proceedings of the 2008 Chinese Control and Decision Conference, CCDC 2008, pages 1935–1939. IEEE Press, July 2008.

[Zin03] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning, ICML 2003, pages 928–936, 2003.

[ZJ06] C. Zhihua and Z. Jianchao. Multi-parent dynamic nonlinear crossover operator for TSP. International Journal of Computer Science and Network Security, 6(2):103–106, February 2006.
