Carnegie Mellon University
The Robotics Institute

Technical Report

Using Case-Based Reasoning as a Reinforcement Learning framework for Optimization with Changing Criteria

Dajun Zeng

Katia Sycara

CMU-RI-TR-95-13

The Robotics Institute Carnegie Mellon University Pittsburgh, Pennsylvania 15213 March, 1995


©1995 Carnegie Mellon University


This research was partially supported by the Defense Advanced Research Projects Agency under contract #F30602-91-C-0016.

"ra^RroimcirsfÄfEM^T "A Approved for public release; Distribution Unlimited

Contents

1 Introduction
2 Repair-based Optimization and Reinforcement Learning
  2.1 Job-Shop Scheduling
3 Overview of CABINS
4 Experimental Evaluation of Capturing Changing Preferences
  4.1 Experimental Design
5 Conclusions

List of Figures

1 A Repair-based Problem Solving Session

List of Tables

1 Notations for Different Objectives
2 Experimental Results: Quality Improvement when Preferences Change
3 Experimental Results for Problems with 5 Resources and 20 Jobs

Abstract

Practical optimization problems such as job-shop scheduling often involve optimization criteria that change over time. Repair-based frameworks have been identified as flexible computational paradigms for difficult combinatorial optimization problems. Since the control problem of repair-based optimization is severe, Reinforcement Learning (RL) techniques can be potentially helpful. However, some of the fundamental assumptions made by traditional RL algorithms are not valid for repair-based optimization. Case-Based Reasoning (CBR) compensates for some of the limitations of traditional RL approaches. In this paper, we present a Case-Based Reasoning RL approach, implemented in the CABINS system, for repair-based optimization. We chose job-shop scheduling as the testbed for our approach. Our experimental results show that CABINS is able to effectively solve problems with changing optimization criteria which are not known to the system and exist only implicitly, in an extensional manner, in the case base.


1 Introduction

Consider an AI program (an agent) that must learn to solve real-world problems, assuming that no complete domain knowledge is available. For each problem it is trying to solve, it needs to collect information about the world (either from its sensors or from interaction with its user) and must choose an action to take. After executing the chosen action, the agent receives a signal (a reinforcement signal) from the world that indicates how well the agent is performing. The agent evaluates this reinforcement signal and decides either to go through another loop of sense-select-evaluate, or to terminate the problem-solving process. This learning scenario is quite different from standard concept learning, in which a teacher presents the learner with a set of input/output pairs. In the reinforcement learning (RL) scenario, the learner is not told which action to take, but instead must discover which action yields the highest reward by trying different actions. Typically, actions may affect not only the immediate reward, but also the next situation, and through that all subsequent rewards [13].

In this paper, we present a learning agent that solves one of the "hardest" [3] combinatorial optimization problems, namely job-shop scheduling. Our approach, implemented in the CABINS system, is shown experimentally to be able to learn scheduling problem-solving knowledge even when the scheduling criteria change over time. This capability is very important for the following reasons. First, traditional search methods used in combinatorial optimization, both Operations Research-based and AI-based, need an explicit representation of the optimization objectives, which must be defined in advance of problem solving [11]. In many practical problems, such as scheduling and design, optimization criteria often involve context- and user-dependent tradeoffs that are impossible to represent as an explicit and static optimization function. A second and equally important consideration is that the problem-solving environment and the optimization criteria may change over time. Therefore, approaches that capture optimization criteria statically or require expensive knowledge-base updating are extremely limiting. On the other hand, approaches that utilize machine learning techniques to adapt their behavior to changing objective criteria and problem-solving context are much more promising. Recently, repair-based optimization has been identified as a very flexible framework for solving optimization problems [6]. Reinforcement learning (RL) is particularly relevant and potentially useful within a repair-based framework.
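As a reference point, the generic sense-select-evaluate loop described at the beginning of this section can be sketched as follows. This is only an illustrative Python sketch; the names (`World`, `Agent`, `select_action`, and so on) are ours and are not part of the CABINS system.

```python
# Illustrative sketch of the sense-select-evaluate reinforcement learning loop.
# All names here are hypothetical placeholders, not part of CABINS.

def run_episode(agent, world, max_steps=100):
    """One episode: sense the world, select an action, evaluate the reinforcement signal."""
    state = world.sense()
    for _ in range(max_steps):
        action = agent.select_action(state)            # choose an action to take
        next_state, reward = world.execute(action)     # reinforcement signal from the world
        agent.evaluate(state, action, reward, next_state)
        if agent.should_terminate(reward, next_state): # decide whether to stop
            break
        state = next_state                             # another sense-select-evaluate loop
    return state
```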

However, some basic assumptions that typical reinforcement learning methods [14, 13] make about the problem domain are violated when solving complex optimization tasks. (1) The reinforcement signal is typically assumed to be a scalar, which does not hold for real-world optimization tasks, where evaluation criteria are situation-dependent and changing. (2) RL methods assume that there is an explicit criterion to tell the problem solver when the goal has been reached. However, for optimization tasks, except for toy problems, it is not possible to verify the optimality of a given solution short of using exhaustive search, which is computationally prohibitive.

To address these fundamental issues, instead of using classic reinforcement learning techniques, such as Q-learning [19] or connectionist-based approaches [5], we apply Case-Based Reasoning (CBR) [4] as the primary tool to (1) represent the state space implicitly and approximately in a case base, (2) generate expected rewards associated with sample points in the state space based on previous problem-solving experiences and knowledge about optimization criteria (in some sense, an approximation of the Q function used in Q-learning is estimated through CBR), (3) choose the appropriate action at each decision-making point to maximize the expected reward, and (4) utilize failure information as a helpful index for exploring temporal credit assignment information.

Our experimental results show that CBR can be effectively incorporated within an RL context. Due to the approximate nature of CBR, when CBR-based selection and evaluation are applied in decision making, we lose many of the nice properties that a Temporal Difference-based approach [14] can provide, such as asymptotic convergence. We believe, however, that our CBR-based approach has good potential for (1) handling much bigger search spaces, since it does not require an explicit representation of the problem space, and (2) attacking task domains with complicated and dynamically changing decision-making criteria and constraints.

The work reported here extends previous work on the CABINS system [17, 15, 20, 16, 9]. It tests the hypothesis that our CBR-based incremental repair methodology shows good potential within a reinforcement learning context for solving problems with optimization criteria that change over time. Our investigation was conducted in the domain of job-shop schedule optimization, and the experimental results, shown in Section 4, confirm this hypothesis.
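The case-based estimate of the expected reward, and its use for action selection, can be sketched roughly as follows. This is a hedged illustration only: the data layout, the inverse-distance similarity measure, and the k-nearest-case averaging are our own simplifying assumptions and do not reproduce the actual CABINS implementation.

```python
# Rough sketch of CBR-based action selection: the expected reward of a repair
# action is estimated from the most similar past cases, playing a role analogous
# to Q(s, a) in Q-learning. All names and details here are illustrative.

from dataclasses import dataclass

@dataclass
class Case:
    features: list[float]  # description of the repair context (the "state")
    action: str            # transform operator that was applied
    outcome: float         # observed reward, e.g. improvement in schedule quality

def similarity(a: list[float], b: list[float]) -> float:
    # Simple inverse-distance similarity; a real system would use a
    # domain-specific similarity metric over repair-context features.
    return 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5)

def expected_reward(case_base: list[Case], state: list[float], action: str, k: int = 5) -> float:
    """Approximate Q(state, action) from the k most similar cases that used this action."""
    matches = sorted(
        (c for c in case_base if c.action == action),
        key=lambda c: similarity(state, c.features),
        reverse=True,
    )[:k]
    if not matches:
        return 0.0
    weights = [similarity(state, c.features) for c in matches]
    return sum(w * c.outcome for w, c in zip(weights, matches)) / sum(weights)

def select_repair_action(case_base: list[Case], state: list[float], actions: list[str]) -> str:
    """Pick the transform operator with the highest case-based expected reward."""
    return max(actions, key=lambda a: expected_reward(case_base, state, a))
```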

2 Repair-based Optimization and Reinforcement Learning

A general optimization task can be described as follows:

\[
\max\ f(x_1, x_2, \ldots, x_n)
\]

subject to

\[
c_j(x_1, x_2, \ldots, x_n) \ge 0, \qquad j = 1, 2, \ldots, m,
\]

where $f(\cdot)$ is the objective function, $x_i$, $i = 1, 2, \ldots, n$, are the decision variables, and the $c_j(\cdot)$ are constraints over the decision variables.
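As an illustrative instantiation (our notation, not drawn from the report), a job-shop objective such as weighted tardiness fits this form by taking

\[
f(x_1, \ldots, x_n) \;=\; -\sum_{j=1}^{J} w_j \,\max\bigl(0,\; C_j(x_1, \ldots, x_n) - d_j\bigr),
\]

where the decision variables $x_i$ are operation start times, $C_j$ is the resulting completion time of job $j$, $d_j$ its due date, and $w_j$ its tardiness weight; the constraints $c_j(\cdot) \ge 0$ then encode precedence and resource-capacity restrictions.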

Two categories of problem-solving strategies are commonly used to compute the optimal solution $(x_1^*, x_2^*, \ldots, x_n^*)$ which maximizes $f(\cdot)$. One of them is the constructive approach, which tries to find the optimal solution from scratch. At each problem-solving step, only partial solutions are generated and/or assembled. Problem solving stops once a complete solution is attained, which is presumably optimal or satisficing. The other approach, called repair-based or revision-based, does not solve the optimization problem directly from scratch, but instead first finds an easy-to-compute, complete, and most likely suboptimal solution that is then incrementally repaired to meet the optimization objectives. The advantages of the repair-based approach for optimization problems for which there is no known efficient constructive algorithm have recently been recognized by both the Operations Research and AI communities [6, 10].

Within a repair-based optimization framework, the search space consists of all possible solutions. The components of a repair-based approach are (1) transform operators used to generate a new complete solution from an old one, and (2) control knowledge for choosing the right transform operator so that a sequence of state transitions will lead to a global optimum. Typically, a given transform operator focuses on one particular aspect of the problem and tries to improve it; transform operators are therefore inherently local in nature. Figure 1 shows a typical problem-solving session from the repair-based perspective.

Different search paradigms have been proposed to explore the search space efficiently, such as hill-climbing and variations of hill-climbing, including Simulated Annealing and Tabu search, aimed at avoiding local minima. Hill-climbing-like searches might be very useful in situations where (1) no other domain knowledge can be exploited except for the knowledge of objective criteria and transform operators, or (2) we want

[Figure 1: A repair-based problem-solving session. Transform operators (TrOpt1, TrOpt2, ...) are applied in sequence, each mapping one complete solution to a new complete solution.]
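A minimal sketch of the repair-based control loop discussed above, using plain hill-climbing as the control strategy, is given below. The helper names (`initial_solution`, `transform_operators`, `objective`) are assumptions for illustration; CABINS itself replaces the greedy operator choice with case-based selection.

```python
# Minimal sketch of repair-based optimization with a hill-climbing control loop.
# `initial_solution`, `transform_operators`, and `objective` are assumed inputs;
# this is an illustration, not the CABINS control strategy.

def repair_based_optimize(initial_solution, transform_operators, objective, max_iters=1000):
    """Start from a complete (likely suboptimal) solution and repair it incrementally."""
    current = initial_solution
    current_value = objective(current)
    for _ in range(max_iters):
        # Each transform operator maps the current complete solution to a new one.
        candidates = [op(current) for op in transform_operators]
        best = max(candidates, key=objective)
        best_value = objective(best)
        if best_value <= current_value:
            break                      # local optimum: plain hill-climbing stops here
        current, current_value = best, best_value
    return current
```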
