Q-Learning Based Hyper-Heuristic For Scheduling System Self-Parameterization Diamantino Falcão GECAD Research Group-School of Engineering Polytechnic Institute of Porto Porto, Portugal
[email protected]
Ana Madureira GECAD Research Group-School of Engineering Polytechnic Institute of Porto Porto, Portugal
[email protected]
Ivo Pereira GECAD Research Group-School of Engineering Polytechnic Institute of Porto Porto, Portugal
[email protected]
Abstract — Optimization in current decision support systems has a highly interdisciplinary nature, related to the need to integrate different techniques and paradigms for solving real-world complex problems. Computing optimal solutions for many of these problems is intractable. Heuristic search methods are known to obtain good results in an acceptable time interval. However, their parameters need to be adjusted in order to achieve good results. In this sense, learning strategies can enhance the performance of a system, providing it with the ability to learn, for instance, the most suitable optimization technique for solving a particular class of problems, or the most suitable parameterization of a given algorithm in a given scenario. Hyper-heuristics arise in this context as efficient methodologies for selecting or generating (meta-)heuristics to solve NP-hard optimization problems. This paper presents the specification of a hyper-heuristic for selecting nature-inspired techniques for solving the scheduling problem in manufacturing systems, based on previous experience. The proposed hyper-heuristic module uses a reinforcement learning algorithm, which provides the system with the ability to autonomously select the meta-heuristic to use in the optimization process, as well as the respective parameters. A computational study was carried out to evaluate the influence of the hyper-heuristic on the performance of a scheduling system. The obtained results allow conclusions to be drawn about the effectiveness of the proposed approach.
Keywords — Hyper-heuristics, Machine Learning, Q-Learning, Optimization, Scheduling, Meta-heuristics, Multi-Agent Systems
I. INTRODUCTION
Every day, people and organizations are challenged with difficulties when solving problems such as finding the best route between different locations, finding the best possible schedule, or managing resources, amongst others. For many of these combinatorial optimization problems, the number of solutions grows exponentially with the size of the problem and finding the best solution(s) becomes difficult. In fact, combinatorial optimization problems arise from the need to select, from a discrete and finite set of data, the best subset satisfying certain criteria of an economic and operational nature [1]. Scheduling problems are subject to restrictions, have a dynamic and very complex resolution nature, and are classified as NP-hard; their basic elements are machines and tasks. Scheduling aims to assign tasks over time to particular machines, subject to certain restrictions, e.g. no machine can process more than one task simultaneously [2]. In this context, different
heuristics are presented as suitable tools to solve combinatorial optimization problems, which include the scheduling problem. In fact, there are many search methods based on heuristics. Some of the most used methods result from the adaptation of ideas from several areas, some inspired by nature, and are called meta-heuristics. The choice of a meta-heuristic for solving a given problem may itself be viewed as an optimization problem. Optimization in current applications assumes a highly interdisciplinary nature, related to the need to integrate different techniques and paradigms in solving complex real problems, and computing the optimal solutions for many of these problems is intractable. In this context, hyper-heuristics are methods for selecting or generating (meta-)heuristics to solve particularly difficult optimization problems. Machine Learning (ML) arises as a complement to hyper-heuristics, offering a variety of techniques to introduce intelligent behavior into the choice and selection of optimization techniques, with the aim of solving optimization problems. AutoDynAgents (Autonomic Agents with Self-Managing Capabilities for Dynamic Scheduling Support in a Cooperative Manufacturing System) [3, 4] is a Multi-Agent System for the autonomous, distributed and cooperative resolution of scheduling problems subject to disturbances. This system incorporates concepts of Autonomic Computing and uses meta-heuristics for finding near-optimal scheduling plans. Since the AutoDynAgents environment is complex, dynamic and unpredictable, learning issues have become indispensable. In this context, a hyper-heuristic module based on a reinforcement learning algorithm (Q-Learning) is proposed for selecting meta-heuristics to solve the scheduling problem.
The remaining sections are organized as follows. Section II describes the scheduling problem and presents some approaches for its resolution, including meta-heuristics. Section III presents a literature review of machine learning techniques. In Section IV the hyper-heuristic concept is presented along with some classification methodologies. Section V describes the AutoDynAgents system and the developed hyper-heuristic, as well as how the developed module is integrated into the AutoDynAgents system. The computational study is presented in Section VI and, finally, the paper presents some conclusions and puts forward some ideas for future work.
II. THE SCHEDULING PROBLEM
The scheduling problem basically consists in carrying out a set of tasks on a limited set of resources, determining their use in order to meet economic and operational objectives [2]. According to Brucker [5], a scheduling problem is composed of three elements: a set of machines, the specific characteristics of the jobs, and the optimization criteria. The machine set defines the type of production system that will run the scheduling plan. Task characteristics include the number of operations, the precedences between operations, and whether operations can be interrupted. Finally, the optimization criterion defines the goal of maximizing or minimizing an objective function, such as the minimization of the total completion time of the tasks (makespan), or the minimization of the sum of the weighted tardiness. These three elements determine the variety and complexity of each scheduling problem.
Methods for solving scheduling problems, whose resolution time increases with the size of the problems, can be divided into two categories [2]: exact and approximation methods. Exact methods perform an exhaustive search of the solution space, thus guaranteeing the optimal solution. However, for this reason they should only be considered for solving small problems. These methods require a long time to generate an optimal solution, and do not always produce a viable solution in time. On the other hand, approximation methods allow satisfactory solutions to be found in a satisfactory time period. Based on Artificial Intelligence, these methods are easy to implement and produce solutions in a reasonable time. However, they do not guarantee the optimal solution. Among approximation methods, it is possible to highlight the meta-heuristics, which will be described in greater detail in Section IV.
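To make the three elements above concrete, the following is a minimal sketch, not taken from the AutoDynAgents implementation, of how a job-shop operation and the makespan (Cmax) of a given schedule could be represented; the Operation class, the makespan function and the example data are illustrative assumptions only.

```python
# Illustrative sketch of a job-shop operation and of a makespan computation.
from dataclasses import dataclass

@dataclass
class Operation:
    job: int        # job (task) the operation belongs to
    machine: int    # machine required by the operation
    duration: int   # processing time

def makespan(ops_in_order):
    """Total completion time (Cmax) of a schedule.

    `ops_in_order` lists the operations in a precedence-feasible global order:
    each operation starts when both its machine and its job are free.
    """
    machine_free, job_free, cmax = {}, {}, 0
    for op in ops_in_order:
        start = max(machine_free.get(op.machine, 0), job_free.get(op.job, 0))
        end = start + op.duration
        machine_free[op.machine] = end
        job_free[op.job] = end
        cmax = max(cmax, end)
    return cmax

# Hypothetical example: two jobs, two machines
ops = [Operation(0, 0, 3), Operation(1, 1, 4), Operation(0, 1, 2), Operation(1, 0, 2)]
print(makespan(ops))  # prints 6 for this example ordering
```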
III. MACHINE LEARNING
Artificial Intelligence (AI) is an area that seeks to develop computational methods by simulating human intelligence. One of the basic requirements of any intelligent behavior is learning; as such, intelligence cannot exist without a learning process. In this context, Machine Learning is an area of AI research dedicated to the development of algorithms and techniques to endow computers with the ability to learn. Mitchell [6] framed ML around the question: "How can we build computer systems that automatically learn from experience, and what are the fundamental laws that govern the whole learning process?" Machine learning thus studies the fundamental laws that govern the learning process. The theories and algorithms developed in this area of knowledge are of great importance for the understanding of aspects of human learning, and understanding the process of human learning continues to be one of the goals of AI. Machine learning algorithms are organized according to a particular taxonomy, depending on the desired result. Generally, machine learning techniques are classified into three types [7]: supervised, unsupervised, and reinforcement learning.
Supervised Learning aims to define a function, from a set of training data, which can perform a prediction on new data. These training data consist of input-output pairs of objects. The function can generate a continuous value (regression) or can predict the class of the new object (classification).
The goal is to predict the function value for each input object after analyzing a set of training examples. However, this type of learning has some limitations. The first relates to the difficulty of data classification: when there is a large amount of input data, it is extremely costly, if not impossible, to label all the data. The second is the fact that not everything in the real world can be classified; there are uncertainties and ambiguities. These difficulties can limit learning systems in some scenarios. Thus, a learning system, after analyzing a number of examples, usually small, must provide a classifier that works well with any example.
Unsupervised Learning seeks to determine how the data is organized. According to Ghahramani [8], this type of algorithm is developed to extract structure from data. The objective is not to maximize a utility function, but simply to find similarities in the training data. In fact, Clustering is one of the techniques of unsupervised learning, where the goal is to group objects that have a high degree of similarity. Clustering techniques have proved to be particularly useful in finding meaningful distributions or classes of data. Data clustering consists in dividing a set of data into groups, placing objects with high similarity in the same group. As such, data analysis is not an automatic task, but an iterative process of knowledge discovery involving trial and error, where it is often necessary to modify the parameters and the preprocessing until the desired result is achieved.
Reinforcement Learning aims to analyze the decisions of a learner in a particular environment in order to maximize a notion of cumulative reward. Sutton and Barto [9] defined Reinforcement Learning as the learning process that allows mapping between situations and actions so as to maximize a numerical reward signal. In Reinforcement Learning the learner is not told which actions it should take; instead, it must discover, by experimentation, which actions yield the greatest reward. This type of learning differs from Supervised Learning in that correct input-output pairs are never presented, nor are sub-optimal actions explicitly corrected. Reinforcement Learning is particularly useful in areas where the reinforcement information (expressed as penalties or rewards) is provided after sequences of actions are carried out on the environment [10]. Thus, the learning process occurs through interaction with the environment. At each iteration the learner observes the current state 𝑠. In each state the learner can perform an action from the set of available actions. An action causes a transition from state 𝑠 to a state 𝑠′, based on a transition probability. A numeric reward 𝑟 is returned to the learner to report on the "quality" of its actions. There are two different types of Reinforcement Learning methods, namely policy iteration and value iteration. A learner looking directly for the best policy in a space of possible policies is implementing a policy iteration method. On the other hand, value iteration methods do not search directly for the optimal policy; they learn evaluation functions for the states or state-action pairs. The evaluation function can then be used to assess the quality of a state-action pair.
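The interaction loop just described (observe the state, choose an action, receive a reward, transition to the next state) can be sketched as follows; the environment interface used here (reset/step) and the callback names are illustrative assumptions, not part of the AutoDynAgents system.

```python
# Illustrative sketch of the generic reinforcement learning interaction loop.
# `env` is assumed to expose reset() -> state and step(action) -> (next_state, reward, done).
def run_episode(env, choose_action, learn):
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = choose_action(state)                 # learner picks an action in state s
        next_state, reward, done = env.step(action)   # environment returns s' and r
        learn(state, action, reward, next_state)      # learner updates its knowledge
        total_reward += reward
        state = next_state
    return total_reward
```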
IV. HYPER-HEURISTICS AND META-HEURISTICS
Despite the success of meta-heuristics in solving problems that require massive computational effort, which include the scheduling problem, their application to new instances presents some difficulties. These difficulties arise from the number of parameters and from how the algorithms are selected. Furthermore, the performance of different meta-heuristics can vary significantly depending on specific features of the problem in question. In this context, the aim is to develop algorithms that are more generally applicable. Thus, using hyper-heuristics, it is possible to find suitable meta-heuristics to solve a given problem, rather than trying to solve the problem directly. A hyper-heuristic can be seen as a (high level) methodology which, when applied, effectively seeks to solve a specific problem or class of problems.
A. Hyper-Heuristics
The term hyper-heuristic was introduced in 1997 to describe a protocol that combines several AI methods. The term was used independently in 2000 to describe "heuristics that choose heuristics" [11] in the context of combinatorial optimization. In this context, a hyper-heuristic is a high level approach which, given a particular instance of a problem and a number of low level heuristics, chooses and applies the most appropriate heuristic at each decision point. However, the idea of automating the design of heuristics is not new, since it dates back to the early 1960s [12, 13]. The latest research in this topic automates the generation of new heuristics better suited to a given problem or class of problems; this is typically accomplished by combining components of existing heuristics [14]. The literature shows a wide range of hyper-heuristic approaches that use high level methodologies, along with a set of low level heuristics, applied to different optimization problems. In this context, Chakhlevitch and Cowling [15] present three criteria for defining these approaches: a hyper-heuristic is (i) a high level heuristic that manages a set of low level heuristics, (ii) searches for a good method to solve a problem, rather than a good solution, and (iii) uses only limited information specific to the problem under analysis. The authors consider the latter criterion the most relevant. In Burke et al. [16], a hyper-heuristic is defined as "a search method or learning mechanism for selecting or generating heuristics to solve computational search problems", and a generic classification is proposed along two dimensions: (i) the nature of the hyper-heuristic search space, and (ii) the source of feedback during the learning process. This classification is shown in Figure 1. In this classification, the different heuristic search spaces can be combined with different sources of feedback and different machine learning techniques [14]. As such, in this work the heuristic selection methodology was adopted; a simple sketch of such a selection hyper-heuristic is given after Figure 1.
Figure 1: A classification of hyper-heuristic approaches [14]. The classification crosses the source of feedback (online learning, offline learning, no learning) with the nature of the heuristic search space: heuristic selection (methodologies to select construction or perturbation heuristics) and heuristic generation (methodologies to generate construction or perturbation heuristics).
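As a reading aid, the following is a minimal sketch of a selection hyper-heuristic of the kind adopted in this work: a high level loop that repeatedly picks one of the available low level (meta-)heuristics, applies it, and uses the observed improvement as online feedback. The score-based selection rule and the interfaces are illustrative assumptions, not the actual AutoDynAgents code.

```python
# Illustrative sketch of a heuristic-selection hyper-heuristic with online feedback.
# `low_level_heuristics` is a list of callables solution -> (new_solution, cost).
import random

def selection_hyper_heuristic(initial_solution, initial_cost, low_level_heuristics,
                              iterations=100, epsilon=0.1):
    scores = [0.0] * len(low_level_heuristics)      # online feedback per heuristic
    best_solution, best_cost = initial_solution, initial_cost
    for _ in range(iterations):
        if random.random() < epsilon:                # occasionally explore
            idx = random.randrange(len(low_level_heuristics))
        else:                                        # otherwise exploit the best-scoring heuristic
            idx = max(range(len(scores)), key=lambda i: scores[i])
        candidate, cost = low_level_heuristics[idx](best_solution)
        scores[idx] += 1.0 if cost < best_cost else -1.0
        if cost < best_cost:
            best_solution, best_cost = candidate, cost
    return best_solution, best_cost
```

The high level loop only observes costs returned by the low level heuristics, which reflects criterion (iii) above: it uses no problem-specific information beyond the quality of the solutions produced.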
B. Meta-heuristics
Meta-heuristics are algorithms that have proven to be very effective in solving various combinatorial optimization problems. However, their application is generally limited to certain problem areas and these methods are usually expensive to develop. Meta-heuristic approaches with a good performance on a given real world problem may not work, or may even produce bad solutions, on other problems or instances of the same class. Such limitations may become problematic in situations where the problem data and business requirements often change over time. Meta-heuristics are thus iterative methods able to obtain solutions as close as possible to the global optimum of a given problem. In this context, some meta-heuristics are described below, with emphasis on those used in this work.
Genetic Algorithms are the most representative group of methods applying Evolutionary Computation tools. They work with a population of individuals representing solutions to the problem, using recombination, mutation, selection and replacement [2].
Artificial Bee Colony (ABC) is an optimization algorithm based on the behavior of bee colonies, proposed in [17]. The algorithm is inspired by the real foraging behavior of bees.
Ant Colony Optimization (ACO) is based on the real behavior of ants, which allows them to find the shortest path between a food source and the respective colony [18, 19]. The ACO algorithm is based on parametric probabilistic models that use pheromones for defining a path.
Particle Swarm Optimization is a population-based method proposed and developed by James Kennedy and Russell Eberhart [20], which seeks to simulate a social system in a simplified manner. This method seeks to reproduce the behavior of flocks of birds, whose trajectories are locally random but globally determined [19].
Tabu Search, developed by Glover [21], is a local search based method that aims to escape local minima. It uses memory structures that record the visited solutions: if a candidate solution has been visited previously, it is marked as "tabu", so that the algorithm does not consider it repeatedly.
Simulated Annealing is an algorithm originating in the 1980s, proposed by Kirkpatrick et al. [22] and Cerny [23]. The algorithm has a statistical basis and relies on occasionally accepting moves to worse quality solutions from the current solution in order to escape local minima.
V. AUTODYNAGENTS SYSTEM AND THE HYPER-HEURISTIC MODULE
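As a brief illustration of the idea of accepting worse solutions, the following is a minimal sketch, in standard textbook form and not taken from the paper, of the Metropolis-style acceptance criterion used by Simulated Annealing.

```python
# Illustrative sketch of the Simulated Annealing acceptance criterion (textbook form).
import math
import random

def accept(current_cost, candidate_cost, temperature):
    """Always accept improvements; accept worse solutions with probability
    exp(-delta / temperature), which shrinks as the temperature is lowered."""
    delta = candidate_cost - current_cost
    if delta <= 0:
        return True
    return random.random() < math.exp(-delta / temperature)
```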
The AutoDynAgents system allows the dynamic resolution of scheduling problems with autonomous capabilities, in which a number of agents model a real manufacturing system subject to disturbances. The system is able to find (near-)optimal solutions through the use of meta-heuristics, dealing with dynamism (arrival of new tasks, canceled jobs, changes in task attributes, etc.), changing/adapting the parameters of the algorithms according to the current situation, switching between meta-heuristics, and performing coordination between the agents through cooperation and negotiation [24].
AutoDynAgents aims to provide the system with the ability to self-parameterize the meta-heuristics according to the problem to be solved. The system should be able to choose the meta-heuristic to be used and to set its parameters according to the current problem. In addition, it should be possible to switch from one method to another, depending on the problem and on experience. This optimization is performed through learning and experience. Reinforcement learning methods can be used at different levels to solve combinatorial optimization problems: they can be applied directly to the problem, as part of a meta-heuristic, or as part of a hyper-heuristic. Thus, in this work a hyper-heuristic incorporating a reinforcement learning method, the Q-Learning algorithm, was developed and integrated into the AutoDynAgents system. The hyper-heuristic is able to set the parameters of the meta-heuristics as different problems arise.
A. Q-Learning Algorithm
The Q-Learning algorithm aims to learn the value of each state-action pair, 𝑄(𝑠, 𝑎), which represents the expected reward for the pair formed by state 𝑠 and action 𝑎. Thus, for the system, the state-action pairs of greatest value define the optimal policy that the learner intends to learn. In this context, Table I describes the functional structure of the algorithm.
TABLE I. FUNCTIONAL STRUCTURE OF THE Q-LEARNING ALGORITHM
Algorithm 1: Q-Learning
1. Initialize the values 𝑄(𝑠, 𝑎) randomly or to zero
2. Select the initial state s0 randomly
3. Use an 𝜀-greedy policy to select the action 𝑎 suitable for the state s0
4. Perform the selected action 𝑎, receive the reward 𝑟 and observe the next state s1
5. Update the value of 𝑄(s0, 𝑎) as follows: 𝑄(s0, 𝑎) ← 𝑄(s0, 𝑎) + 𝛼 [𝑟 + 𝛾 max_b 𝑄(s1, 𝑏) − 𝑄(s0, 𝑎)]
6. Update the state, s0 ← s1
7. Repeat from step 3 until s0 is a terminal state
8. Repeat steps 2 through 7 a number of times
Each iteration represents a learning cycle (steps 2 to 7). Parameter 𝛼 represents the learning rate, while parameter 𝛾 weights the influence of future rewards.
The matrix that relates state-action pairs and stores the result of the 𝑄-function is initialized with the same value, 0 (zero), for all state-action pairs. Step 3 represents the trade-off between the exploration and exploitation phases.
B. State Evaluation Criteria
The decisions taken by a learner are based on the current state of the system. The state of the system is the foundation that allows a learner to select the most appropriate action from a set of actions. Several options are available to define the system state, including measures such as the number of tasks (jobs) in the buffer, the total completion time of the tasks, or the average journey time. In this work, minimizing the total completion time of the tasks was the criterion used to assess a particular state. In order to evaluate the performance of the algorithm applied to a scheduling problem, it is only used when there are at least two states available for evaluation, where each state reflects the result of applying a meta-heuristic to the scheduling problem. Thus, if there is only one state, the algorithm does not update the value 𝑄(𝑠, 𝑎), since the decision on the next action is performed without considering the 𝑄-Learning algorithm.
C. Exploration and Exploitation
Exploration and exploitation are key concepts when applying a reinforcement learning algorithm such as 𝑄-Learning. Exploration means that the learner tries actions that were not taken before, in order to obtain a greater reward, whereas exploitation means that the learner prefers actions previously taken and rewarded. Accordingly, exploitation can guarantee a good reward; however, in the long term, exploration can provide more opportunities to maximize the total reward. A common approach to deal with this trade-off is the 𝜀-greedy method. At each iteration, the method selects a random action with a fixed probability 0 ≤ 𝜀 ≤ 1, instead of greedily selecting one of the actions learned by the 𝑄-function. This process is shown in equation (1):

𝜏(𝑠) = { random action from 𝜆(𝑠), if 𝜅 < 𝜀 ; argmax_{𝑎∈𝜆(𝑠)} 𝑄(𝑠, 𝑎), otherwise }    (1)
where 0 ≤ 𝜅 ≤ 1 is a random number drawn for each new iteration.
D. Reward Function
The reward defines the goal of the learner and determines the value of the immediate action based on the system state. Since the learner seeks to maximize the overall reward, the reward function is mainly used to guide the learner towards its goal. In this sense, the goal of the system is the minimization of the total completion time of the tasks (𝐶𝑚𝑎𝑥). Thus, when a new scheduling plan is generated, the obtained 𝐶𝑚𝑎𝑥 is compared with the current 𝐶𝑚𝑎𝑥. If the new 𝐶𝑚𝑎𝑥 is greater than the 𝐶𝑚𝑎𝑥 of the current state, the learner receives a reward of -1; otherwise, the learner receives a reward of 1. In other words, when an action results in a 𝐶𝑚𝑎𝑥 greater than that of the previous action, it is penalized in the state-action table.
In this sense, the reward process can be described as follows: a reward of 1 if the scheduling plan improves, that is, 𝐶𝑚𝑎𝑥 decreases; a reward of -1 if the scheduling plan gets worse, that is, 𝐶𝑚𝑎𝑥 increases. This type of reward can stimulate the learning process.
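To make the module's behavior concrete, the following is a minimal sketch, under stated assumptions and not the actual AutoDynAgents implementation, of tabular Q-Learning combining the update rule of Algorithm 1, the 𝜀-greedy policy of equation (1) and the ±1 reward described above. The action set (one action per candidate meta-heuristic configuration), the run_metaheuristic function and the use of the observed Cmax as the state are illustrative assumptions.

```python
# Illustrative sketch of the Q-Learning based meta-heuristic selection of Section V.
# `actions` is a list of candidate meta-heuristic configurations and
# `run_metaheuristic(action)` returns the Cmax of the resulting scheduling plan.
import random
from collections import defaultdict

def q_learning_selection(actions, run_metaheuristic, episodes=30,
                         alpha=0.5, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                              # Q(s, a), initialized to zero
    state = run_metaheuristic(random.choice(actions))   # initial state: first observed Cmax
    for _ in range(episodes):
        # epsilon-greedy action selection (equation (1))
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        new_cmax = run_metaheuristic(action)            # apply the selected meta-heuristic
        reward = 1 if new_cmax < state else -1          # +1 if Cmax improves, -1 otherwise
        next_state = new_cmax
        # Q-Learning update (step 5 of Algorithm 1)
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
    return Q
```

In this sketch the learning rate alpha and the discount gamma play the roles of the parameters 𝛼 and 𝛾 discussed after Table I, and the number of episodes corresponds to the stopping criterion studied in Section VI.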
VI. COMPUTATIONAL STUDY
In this section, the computational study carried out to validate the performance of the hyper-heuristic is described. The computational results were obtained using academic Job-Shop instances from Lawrence [25], Adams et al. [26], Applegate and Cook [27] and Storer et al. [28]. Initially, we present the results obtained with the hyper-heuristic, followed by the results that serve as the basis for comparison [29]. All results shown correspond to the minimization of the total completion time (𝐶𝑚𝑎𝑥).
A. Results Obtained with the Hyper-heuristic
The results presented refer to those obtained by the AutoDynAgents system with the hyper-heuristic. For the instances under consideration, the stopping criterion of the 𝑄-Learning algorithm varied between 10, 20 and 30 runs. For each run the data were collected and the mean of the obtained values computed. In addition, the values were normalized by calculating the quotient between the optimum value and the average value of 𝐶𝑚𝑎𝑥 (equation (2)), so that it is possible to estimate the difference between the obtained value and the value of the optimal solution referenced in the literature:

q = 1 − Opt𝐶𝑚𝑎𝑥 / 𝐶𝑚𝑎𝑥    (2)

Therefore, to analyze the results, the Student's t-test [30] was used. The samples were normalized so that the approaches could be compared directly in a comprehensive manner. This normalization was performed by calculating the quotient of the average values (Figure 2).
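As an illustration of how the normalization of equation (2) and the paired t-test could be computed, a brief sketch follows; the Cmax values used are hypothetical and not taken from the paper, and scipy's paired-samples test is used as one possible implementation.

```python
# Illustrative computation of the normalized quotient of equation (2) and a
# paired-samples t-test; the numbers below are hypothetical examples only.
from scipy import stats

def quotient(opt_cmax, mean_cmax):
    """q = 1 - OptCmax / mean Cmax; 0 means the optimum was reached on average."""
    return 1.0 - opt_cmax / mean_cmax

# Hypothetical normalized results of two stopping criteria over the same instances
q_10_runs = [0.05, 0.08, 0.03, 0.07, 0.06]
q_30_runs = [0.02, 0.04, 0.01, 0.03, 0.02]

t_statistic, p_value = stats.ttest_rel(q_10_runs, q_30_runs)  # paired t-test
print(t_statistic, p_value)
```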
Figure 2: Quotient comparison of the mean values of system execution with the Hyper-heuristic.

Analyzing the boxplot in Figure 2, it is possible to conclude about the advantage of the 30 runs stopping criterion, which indicates that the results are better when 30 runs are executed.

TABLE II. RESULTS OF THE PAIRED-SAMPLES STUDENT'S T-TEST
                  Mean difference   Std. deviation        t    DoF   p-value
10 vs. 20 runs           -0.00062          0.01891   -0.181     29     0.858
10 vs. 30 runs            0.07068          0.03963    9.768     29     0.000
20 vs. 30 runs            0.07131          0.04091    9.547     29     0.000
Analyzing the statistical significance of these results (Table II), and observing the values 𝑡(29) = −0.181; 𝑝 > 0.05, it appears that there were no statistically significant differences between the results obtained with 20 runs and with 10 runs. On the other hand, comparing the groups of 30 and 10 runs, and observing the values 𝑡(29) = 9.768; 𝑝 < 0.05, we can say with a confidence level of 95% that there are statistically significant differences between the two results, in favor of 30 runs. Finally, comparing the results between 30 and 20 runs, and observing the values 𝑡(29) = 9.547; 𝑝 < 0.05, one can say with a confidence level of 95% that there are statistically significant differences between the two results, in favor of 30 runs. It is possible to conclude that with a stopping criterion of 30 runs it was possible to achieve better overall results; as such, these results will be used to validate the performance of the hyper-heuristic against the results obtained without the learning mechanism.
B. Results Obtained Without Learning Mechanism
These results were obtained before the implementation of the hyper-heuristic, and serve as a comparison for the results obtained by the hyper-heuristic incorporated in the AutoDynAgents system. The results were obtained after five runs for each instance under analysis.
Figure 3: Quotient comparison of the mean values of system execution, between previous results and the results obtained by the Hyper-heuristic.
Thus, comparing the obtained results by analyzing the boxplot presented in Figure 3, it is possible to see that the hyper-heuristic (𝑄-Learning) with the 30 runs stopping criterion had the best performance in minimizing 𝐶𝑚𝑎𝑥. The Student's t-test was used to analyze the statistical significance of the results; observing the values 𝑡(29) = −9.152; 𝑝 < 0.05 in Table III, it can be stated, with a confidence level of 95%, that there are statistically significant differences between the previous results and the results obtained by the hyper-heuristic (30 runs), allowing us to conclude on the advantage of the hyper-heuristic.
TABLE III. RESULTS OF THE PAIRED-SAMPLES STUDENT'S T-TEST
                                        Mean difference   Std. deviation        t    DoF   p-value
Hyper-heuristic vs. without learning           -0.11529          0.06900   -9.152     29     0.000
The hyper-heuristic presented advantages over the previous results, obtaining the best average results, leading to the conclusion that there is a statistically significant advantage of the hyper-heuristic based approach on the performance of the AutoDynAgents system.
VII. CONCLUSIONS AND FUTURE WORK
This paper addressed the application of a learning module to the resolution of scheduling problems. Considering the importance of this issue in the process of using meta-heuristics, the proposed approach is based on Q-Learning, with the objective of automating the selection of those techniques. The computational study aimed to evaluate the performance obtained with the incorporation of Q-Learning in a Multi-Agent Scheduling System. The results of the proposed approach were compared with the results obtained previously, without the incorporation of learning mechanisms. All results were validated by statistical significance analysis. A statistically significant advantage in the use of the Q-Learning module was verified. From the results, it is possible to conclude about the existence of statistical evidence in favor of the inclusion of learning in the meta-heuristic selection process. For future work we expect a more extensive study of the proposed module and the implementation of other learning techniques, to compare with this proposal.
ACKNOWLEDGMENT
This work is supported by FEDER Funds through the "Programa Operacional Factores de Competitividade - COMPETE" program and by National Funds through FCT "Fundação para a Ciência e a Tecnologia" under the project PEst-OE/EEI/UI0760/2014.
REFERENCES
[1] I. Pereira, Sistema Inteligente para Escalonamento Assistido por Aprendizagem. 2014, UTAD, Vila Real.
[2] A.M. Madureira, Aplicação de Meta-Heurísticas ao Problema de Escalonamento em Ambiente Dinâmico de Produção Discreta. 2003, Universidade do Minho, Braga, Portugal.
[3] A. Madureira and I. Pereira, Self-Optimization for Dynamic Scheduling in Manufacturing Systems. 2010, pp. 421–426.
[4] A. Madureira, et al., Scheduling a Cutting and Treatment Stainless Steel Sheet Line with Self-Management Capabilities. 2011, vol. 46, pp. 34–47.
[5] P. Brucker, Scheduling Algorithms. 2001: Springer, 371.
[6] T.M. Mitchell, The Discipline of Machine Learning. Technical report CMU-ML-06-108, Carnegie Mellon University, 2006, pp. 1–7.
[7] E. Alonso, et al., Learning in Multi-Agent Systems. The Knowledge Engineering Review, Cambridge University Press, 2001, pp. 277–284.
[8] Z. Ghahramani, Unsupervised Learning. IOS Press, 2008, vol. 176, pp. 1–8.
[9] R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction. 1998, MIT Press, pp. 322.
[10] L. Panait and S. Luke, Cooperative Multi-Agent Learning: The State of the Art. Autonomous Agents and Multi-Agent Systems, 2005, vol. 11, pp. 387–434.
[11] P. Cowling, G. Kendall, and E. Soubeiga, A Hyperheuristic Approach to Scheduling a Sales Summit. In Selected Papers of the Third International Conference on the Practice and Theory of Automated Timetabling (PATAT 2000), 2000, vol. 2079, pp. 176–190.
[12] H. Fisher and G.L. Thompson, Probabilistic learning combinations of local job-shop scheduling rules. In: Factory Scheduling Conference, Carnegie Institute of Technology, 1961.
[13] H. Fisher and G.L. Thompson, Probabilistic learning combinations of local job-shop scheduling rules. In: Industrial Scheduling. Prentice Hall, Englewood Cliffs, 1963, pp. 225–251.
[14] E.K. Burke, et al., Hyper-heuristics: A Survey of the State of the Art. Journal of the Operational Research Society, 2013, pp. 1695–1724.
[15] K. Chakhlevitch and P. Cowling, Hyperheuristics: Recent developments. In: C. Cotta, M. Sevaux, K. Sörensen (eds), Adaptive and Multilevel Metaheuristics, 2008, vol. 136, pp. 3–29.
[16] E.K. Burke, et al., Handbook of Metaheuristics, International Series in Operations Research & Management Science. 2009, vol. 146, pp. 449–468.
[17] D. Karaboga, An Idea Based On Honey Bee Swarm for Numerical Optimization. Technical Report TR06, 2005.
[18] M. Dorigo and T. Stützle, Ant Colony Optimization. 2004.
[19] A. Madureira, Técnicas Emergentes de Optimização no Suporte à Tomada de Decisão. Documento apresentado no concurso de provas públicas para Professor Coordenador, ISEP/IPP, 2009.
[20] J. Kennedy and R. Eberhart, Particle Swarm Optimization. Proceedings of the IEEE International Conference on Neural Networks, 1995.
[21] F. Glover, Future Paths for Integer Programming and Links to Artificial Intelligence. Computers & Operations Research, 1986, pp. 533–549.
[22] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, Optimization by Simulated Annealing. Science, 1983, vol. 220, pp. 671–680.
[23] V. Cerny, Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. Journal of Optimization Theory and Applications, 1985, vol. 45, pp. 41–51.
[24] A. Madureira, et al., Negotiation Mechanism for Self-Organized Scheduling System with Collective Intelligence. Neurocomputing, Elsevier, 2014, vol. 132, pp. 97–110.
[25] S. Lawrence, Resource constrained project scheduling: an experimental investigation of heuristic scheduling techniques. 1984.
[26] J. Adams, E. Balas, and D. Zawack, The shifting bottleneck procedure for job shop scheduling. Management Science, 1988, pp. 391–401.
[27] D. Applegate and W. Cook, A computational study of the job-shop scheduling problem. 1991, pp. 149–156.
[28] R.H. Storer, S.D. Wu, and R. Vaccari, New search spaces for sequencing problems with application to job shop scheduling. Management Science, 1992, vol. 38, pp. 1495–1509.
[29] D. Falcão, Hiper-heurísticas com Aprendizagem. 2014, Instituto Superior de Engenharia do Porto, Portugal.
[30] J.F. Box, Guinness, Gosset, Fisher, and Small Samples. Statistical Science, 1987, vol. 2, pp. 45–52.