
Bounded Explanation and Inductive Refinement for Acquiring Control Knowledge

Daniel Borrajo* and Manuela Veloso†
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213-3891
{borrajo,[email protected]}

Abstract

One approach to learning control knowledge from a problem solving trace consists of generating explanations for the local decisions made during the search process. These explanations become control rules that are used in future situations to prune the search space. Strong deductive approaches invest a substantial explanation effort to produce correct control rules from a single problem solving trace. Alternatively, inductive approaches acquire incrementally correct learned knowledge by observing a large set of problem solving examples. In this paper, we advocate a learning method where deductive and inductive strategies are combined to learn control knowledge efficiently. The approach consists of bounding the explanation to a predetermined set of problem solving features. Since there is no proof that this set is sufficient to capture the correct and complete explanation for the decisions, the acquired control rules are refined if and when applied incorrectly to new examples. The method is especially significant in that it applies directly to nonlinear problem solving, as developed lately in the prodigy architecture, where we learn control rules for individual decisions


corresponding to new learning opportunities offered by the nonlinear problem solver beyond the linear one. We show results of our on-going implementation in the latest version of prodigy's nonlinear problem solver, i.e., in prodigy4.0.

* On leave from the Universidad Politecnica de Madrid (Spain). This work was sponsored by a grant of the Ministerio de Educacion y Ciencia and the project CO4491 from the Comunidad de Madrid.

† This research was sponsored by the Avionics Laboratory, Wright Research and Development Center, Aeronautical Systems Division (AFSC), U. S. Air Force, Wright-Patterson AFB, OH 45433-6543, under Contract F33615-90-C-1465, Arpa Order No. 7597. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.

1 Introduction

One of the main problems when learning control and operative knowledge for problem solvers has been to select a set of features that captures sufficient information for rejecting, selecting, or preferring a certain decision in the search tree, so that any problem of the domain can be solved efficiently. Two approaches have been adopted: learn an explanation and prove that it is correct, or forgo the proof of correctness and refine the explanation through experience on further problems. The first approach, as in [DeJong and Mooney, 1986, Mitchell et al., 1986, Minton, 1988], usually involves a substantial effort to prove the correctness of the learned knowledge. In addition, it requires a complete domain theory to obtain the explanations,¹ although there has been some work on learning with incomplete or intractable theories, such as [Tadepalli, 1989]. Moreover, this approach does not work when dealing with a nonlinear planner [Wang et al., 1993]. The alternative approach, induction, usually requires a large set of examples and a long time to learn a correct description of the right control knowledge. Furthermore, the method strongly depends on the particular examples seen. This paper presents a method that combines a deductive and an inductive approach. First, it learns a control description analytically, by loosely following the subgoaling information, although no proof of its correctness is given; then it refines this description according to experience.

¹ In addition to the set of operators and inference rules that describe the problem solving primitive actions, a complete domain theory includes a set of domain axioms that enables the proof of the universal truth of the learned knowledge in the domain.

We have called this method bounded learning, because we use only a fixed set of features to interpret the problem solving experience, so as not to spend too much effort proving correctness, thereby reducing the need for a knowledge-intensive generation of correct explanations. On the other hand, since the learned rules are approximately correct, there should be no need for a large number of examples to refine them, as purely inductive methods usually require. The paper shows the results of the method's implementation in prodigy4.0, which we call hamlet, standing for Heuristics Acquisition Method by Learning from Explanation Trees.

We can draw an analogy between our proposed method for generating bounded explanations and the general problem of diagnosis. When diagnosing any kind of system (human or machine), the diagnoser first follows a fixed set of tests in order to identify or explain the cause of the failure. This fixed set of tests is usually shared by most diagnosers in the same field, and was usually obtained by experimenting with many possible tests and arriving at a set of tests (over some values of features) that are likely to lead to the solution. There is no guarantee of reaching a complete explanation from this set of tests, and these may be followed by additional tests depending on the individual case. This is also the solution we propose here: generate an explanation from a certain fixed set of features that have obtained good results in the majority of the problems the method has learned from.

The paper is divided into the following sections. Section 2 introduces the bounded-deductive learning of control knowledge. Section 3 presents the inductive refinement strategy. Section 4 reports on empirical results. Finally, Section 5 concludes the paper.

2 Learning Bounded Explanations

The goal of our investigation is to learn control knowledge effectively from problem solving experience. This work is developed within the nonlinear problem solver [Veloso, 1989, Carbonell et al., 1992] of the prodigy architecture [Carbonell et al., 1990].

2.1 The Substrate Problem Solver

The nonlinear problem solver in prodigy follows a means-ends analysis backward chaining search procedure, reasoning about multiple goals and multiple alternative operators relevant to the goals. Figure 1 sketches the problem solver's algorithm. The inputs to the procedure are the set of operators specifying the task knowledge and a problem specified as an initial state and a goal statement.

1. Check if the goal statement is true in the current state, or if there is a reason to suspend the current search path.
   (a) If yes, then either return the final plan or backtrack.
2. Compute the set of pending goals G, and the set of possible applicable operators A.
3. Choose a goal G from G or select an operator A from A that is directly applicable.
4. If G has been chosen, then
   - expand goal G, i.e., get the set O of relevant instantiated operators for the goal G,
   - choose an operator O from O,
   - go to step 1.
5. If an operator A has been selected as directly applicable, then
   - apply A,
   - go to step 1.

Figure 1: A skeleton of prodigy's nonlinear problem solving algorithm.

The planning reasoning cycle involves several decision points, namely: the goal to select from the set of pending goals and subgoals; the operator to choose to achieve a particular goal; the bindings to choose in order to instantiate the chosen operator; and whether to apply an operator whose preconditions are satisfied or to continue subgoaling on a still unachieved goal. Dynamic goal selection from the set of pending goals enables the planner to interleave plans, exploiting common subgoals and addressing issues of resource contention. Search control knowledge may be applied at all the above decision points: which relevant operator to apply (if there are several), which goal or subgoal to address next, whether to reduce a new subgoal or to apply a previously selected operator whose preconditions are satisfied, or what objects in the state to use as bindings for the typed variables in the operators. Decisions at all these choice points are made based on user-given or learned control knowledge to guide the casual commitment search. Control knowledge can select, reject, prefer, or decide on the choice of alternatives [Veloso, 1989]. This knowledge guides the search process and helps to reduce the exponential explosion in the size of the search space. Previous work in the linear planner of prodigy uses explanation-based learning techniques [Minton, 1988] to extract from a problem solving trace the explanation chain responsible for a success or failure, and to compile search control rules therefrom. Similar efforts within the linear planner of prodigy learned control rules by partially evaluating the domain theory [Etzioni, 1990, Perez and Etzioni, 1992].
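To make the decision cycle concrete, the following is a minimal Python sketch of the skeleton in Figure 1, under simplifying assumptions of ours: states and goals are sets of ground literals, operators are fully instantiated (so the bindings decision is folded away), and exhaustive depth-limited backtracking stands in for control-guided choice. All names are illustrative, not PRODIGY's actual interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset
    adds: frozenset
    deletes: frozenset

def applicable(op, state):
    return op.preconditions <= state

def apply_op(op, state):
    return (state - op.deletes) | op.adds

def solve(state, top_goals, operators, depth=12, agenda=frozenset(), plan=()):
    """Depth-limited means-ends search over the decision points of
    Figure 1: which pending goal to work on, which relevant operator
    to use, and whether to apply it or subgoal on its preconditions."""
    if top_goals <= state:                     # step 1: goal statement true
        return list(plan)
    if depth == 0:                             # step 1: suspend this path
        return None                            # (i.e., backtrack)
    pending = (top_goals | agenda) - state     # step 2: pending goals
    for op in operators:                       # steps 3-5, by backtracking
        if not (op.adds & pending):
            continue                           # operator not relevant
        if applicable(op, state):              # step 5: apply the operator
            result = solve(apply_op(op, state), top_goals, operators,
                           depth - 1, agenda, plan + (op.name,))
        else:                                  # step 4: subgoal on its preconditions
            new_agenda = agenda | op.preconditions
            if new_agenda == agenda:
                continue                       # nothing new to subgoal on
            result = solve(state, top_goals, operators,
                           depth - 1, new_agenda, plan)
        if result is not None:
            return result
    return None
```

In this toy version the four decision points collapse into the order of iteration and the apply-versus-subgoal branch; in prodigy each of them is a separate, control-rule-guided choice.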

In the nonlinear planner, [Veloso, 1992] develops a case-based learning method that consists of storing individual solved problems to guide the planner when solving similar new problems. The guiding similar plans provide global control knowledge, in the sense that they consist of a chain of decisions. This control guidance contrasts with local control rules that apply independently to individual decision steps. It is beyond the scope of this paper to discuss the distinctions between global and local control knowledge, which we believe have complementary benefits for the problem solver. The paper presents instead our work in learning local control rules for the nonlinear problem solver of prodigy. We have identified [Wang et al., 1993] several challenges in extending directly the previous explanation-based algorithms developed for the linear planner to the nonlinear one. Instead, the work we report in this paper applies directly to nonlinear problem solving, which trivially encompasses linear problem solving. In our nonlinear problem solving framework, hamlet learns control rules for individual decisions, compiling the conditions under which the rules are to be transferred to local decision steps in other problems. Alternative learning approaches in nonlinear planning include learning complete generalized plans, as in [Kambhampati and Kedar, 1991].

2.2 Labeling the Search Tree and Credit Assignment

When solving a problem, the problem solver generates a search tree. The domain theory implicitly defines a subgoaling structure that relates goals with the operators that can achieve those goals. In a linear planner, the search tree reproduces exactly this structure, since interleaving of goals and subgoals at different points of the search is not allowed. However, in nonlinear problem solving, there is a variety of different interleaved ways to traverse the subgoaling structure, which are captured in the search tree. The input to hamlet is the search tree. The output is the set of control rules that will prune the search space. hamlet's learning method can be divided into three phases: the labeling of the search tree, the assignment of credit to the tree decisions, and the generation of the control rules. hamlet traverses the search tree top-down to label first the leaf nodes. It assigns three kinds of labels to the leaf nodes: success, if it was a solution path; failure, if it was a failed path; and unknown, if the planner did not expand the node. After labeling the leaf nodes, it backs up the values to the root of the search tree. Figure 2 summarizes this labeling strategy. The credit assignment is done at the same time as the labeling, and it consists of identifying

the decisions for which learning will occur.

procedure LABEL (node, eagerp)
  for all successors of node do
    LABEL (successor, eagerp)
  case of
    null(successors):
      case of
        solution-path: label node as success.
        failed-path: label node as failure.
        untried: label node as unknown.
    there is at least one unknown successor:
      if eagerp AND there are success children
        then if optimal-learning-p
               then store the "best" successor
               else LEARN the "best" successor
             label node as success
        else label node as unknown.
    there are only success and failure children:
      if optimal-learning-p
        then store the "best" successor
        else LEARN the "best" successor
      label node as success.
    there are only failure children:
      label node as failure.
    there are only success children:
      label node as success.

Figure 2: A skeleton of the labeling and credit assignment algorithm.

The parameter eagerp controls the situations from which control rules are generated. If eagerp is true, hamlet will learn a select rule whenever a node has a success child. If eagerp is false, hamlet follows a conservative learning mode: a rule is then learned only if all of the node's children are labeled success or failure and there is at least one child labeled failure. These two modes correspond to different levels of learning eagerness. The parameter optimal-learning-p restricts learning to the best solution found, where we can incorporate different quality criteria. If optimal-learning-p is true, hamlet delays learning until it has traversed the complete tree and found the best solution. In that case, after labeling, it descends through the best solution path, learning from every decision according to the selected level of eagerness. This algorithm builds upon several previous works on learning and problem solving, including [Mitchell et al., 1983, Langley, 1983]. We extend these pioneering methods in several dimensions. In particular, our techniques apply to a nonlinear problem solver, which has a large search space where many factors influence the individual decisions. Furthermore, hamlet follows a labeling procedure where alternative failure decisions are considered positive learning instances of different target concepts, instead of negative examples of the decision. Negative examples of decisions will be provided by eventual misuse of the acquired control rules, as we discuss below.
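The following is a fairly direct Python transcription of Figure 2, assuming a tree node with a children list and solution_path/failed flags on its leaves; the hooks learn, store, and the quality criterion best are left abstract, and the attribute names are our assumption.

```python
SUCCESS, FAILURE, UNKNOWN = "success", "failure", "unknown"

def label(node, eagerp, optimal_learning_p, learn, store, best):
    """Post-order labeling and credit assignment, as in Figure 2."""
    for child in node.children:
        label(child, eagerp, optimal_learning_p, learn, store, best)
    if not node.children:                       # leaf node
        if node.solution_path:
            node.label = SUCCESS
        elif node.failed:
            node.label = FAILURE
        else:
            node.label = UNKNOWN                # planner never expanded it
        return
    labels = [c.label for c in node.children]
    successes = [c for c in node.children if c.label == SUCCESS]
    def credit():                               # learn now, or store the decision
        (store if optimal_learning_p else learn)(best(successes))
    if UNKNOWN in labels:
        if eagerp and successes:                # eager mode learns despite unknowns
            credit()
            node.label = SUCCESS
        else:
            node.label = UNKNOWN
    elif successes and FAILURE in labels:       # mixed success/failure children:
        credit()                                # the conservative learning case
        node.label = SUCCESS
    elif successes:                             # only success children
        node.label = SUCCESS
    else:                                       # only failure children
        node.label = FAILURE
```

When optimal_learning_p holds, the stored decisions are only turned into rules after the whole tree is labeled and the best solution path is known, mirroring the delayed learning described above.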

2.3 Generation of Control Rules

hamlet proceeds to generate each control rule by acquiring its corresponding pre- and postconditions. The preconditions of the control rule need to establish the relevant conditions under which the decision was made, and also to define the situations under which the rule can be reapplied. The set of features that we consider in our bounded explanation technique, as well as the credit assignment procedure, have evolved from extensive previous work of the first author [Borrajo et al., 1992b]. Although there is no guarantee that this set of features is sufficient, there have been a number of iterations in the design of the set, generating our confidence in it. Furthermore, the empirical experiments confirm that the set is appropriate and that the inductive refinement method increases its application efficiency. hamlet learns four kinds of control rules: select operators, select goals, select bindings, and decide subgoal. hamlet generates a disjunction of conjunctive rules for each target concept. The target concepts are each of the possible decisions to be made, attached to some of the preconditions for making them. For instance, a target concept might be select operator unstack to achieve the goal clear. The number of target concepts is (3 + O) Σ_{i=1}^{O} p_{O_i}, where O is the total number of operator schemas in the domain, and p_{O_i} is the number of postconditions of operator O_i. (In the blocksworld with four operators, this number becomes 126; see the check below.) Each kind of control rule has a set of fixed features for describing its preconditions, plus a set of common features shared by all kinds of control rules. Examples of common features, which become meta-predicates of the control language, are:

- True-in-state: tests whether an assertion is true in the state.
- Prior-goal: tests whether a goal is the first goal of its conceptual path.

Similarly, examples of the other features are:

- Current-goal: tests whether a goal is the one that the planner is trying to achieve.
- Candidate-applicable-op: tests whether an operator is applicable in the current state.

The postconditions of the control rules are the decisions to be made, such as (select operator unstack) or (select goal (on <x> <y>)). (Variables are represented in angle brackets.)
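As a sanity check of the target-concept count, a small computation follows. The per-operator postcondition counts are our assumption for the standard four-operator blocksworld encoding (pick-up and put-down with four add/delete effects each, stack and unstack with five each).

```python
# Hypothetical postcondition (add/delete effect) counts for the usual
# four-operator blocksworld; these specific numbers are our assumption.
postconditions = {"pick-up": 4, "put-down": 4, "stack": 5, "unstack": 5}

O = len(postconditions)                  # number of operator schemas
n_target_concepts = (3 + O) * sum(postconditions.values())
print(n_target_concepts)                 # -> (3 + 4) * 18 = 126, as quoted above
```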


3 Applying Induction over the Learned Rules

The rules generated by the bounded explanation method are over-specific, in that they transfer poorly to other problems. Therefore, we follow the deductive method with a refinement algorithm that inductively modifies the rules. This hill-climbing procedure, on which we are still working, converges to increasingly more adequate sets of rules. We have devised ways of inducing over the following aspects of the learned knowledge:

- State: Most of the rules are over-specific because they keep many extra facts from the state.
- Subgoaling structure: Relaxing the subgoaling links, for example as captured by the prior-goal meta-predicate, since the same goal can be generated as a subgoal of many different goals.
- Interacting goals: Identifying the correct subset of the set of pending goals that affect a particular decision, extending the learning scope also to quality decisions.
- Type hierarchy: The generalization level to which the variables in the control rules belong, considering the ontological type hierarchy that is available in the nonlinear version of prodigy.
- Operator types: Further learning from an operator hierarchy to enlarge the scope of the generalization procedure.

hamlet currently considers the following inductive operators:

1. Parametrization: Substitutes identical instances by identical variables, considering the classes to which the instances belong. A constraint is imposed that all variables have different values at run time.
2. Preserve main preconditions: hamlet is able to remove "unimportant" preconditions that are found not to affect the validity of the control rule. It keeps the main preconditions, i.e., the preconditions that have variables directly related to the learned decision.
3. Delete rules that subsume others: A rule subsumes another rule if there is a substitution that makes its preconditions a superset of the other's.
4. Intersection of preconditions: A new rule is created by intersecting the state meta-predicates, i.e., true-in-state, of equivalent rules for the same target concept. This induction operator therefore merges the disjunction of two or more conjunctive rules into one conjunctive rule when appropriate (a minimal sketch of this operator is given at the end of this section).
5. Refinement of subgoaling dependencies: If two rules share preconditions but have two different prior goals for the same current goal, they are merged into a meta-predicate called any-prior-goal, which checks whether any of them is a prior goal of the current goal.
6. Refinement of the set of interacting goals: The set of pending goals is relaxed to the subset of interacting goals. This may cause problems when the purpose of the control rule is precisely to select a goal among others that are not independent from it.
7. Find common superclass: When two rules can be unified by two variables that belong to classes that are subclasses of a common class (other than the root), this operator generalizes the variables to the common superclass. We previously implemented a variation of this technique applied to the parametrization procedure of a single rule [Borrajo et al., 1992a].

This set of inductive operators may produce overgeneral rules in special situations, as we have been experiencing in our more sophisticated recent tests. These situations are beneficial for our inductive learning style, as they provide negative examples of the application of the learned rules. We are looking into applying and extending existing methods for relational induction, such as [Quinlan, 1990]. The hill-climbing performance of our global learning algorithm will approach the ultimately correct control knowledge by converging gradually from both over-specific and over-general rule sets. Our learning algorithm reasons about and converges from points in the generalization space, as it is prohibitively costly to maintain both the specific and general sets as in the version-space method.
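As an illustration of inductive operator 4, here is a minimal sketch assuming a rule's preconditions are represented as a frozenset of meta-predicate literals; the representation and helper names are ours, not hamlet's.

```python
def state_part(rule):
    """The true-in-state literals of a rule's preconditions."""
    return {lit for lit in rule if lit[0] == "true-in-state"}

def intersect_preconditions(rule_a, rule_b):
    """Merge two conjunctive rules for the same target concept into one
    whose state conditions are the intersection of both; the non-state
    conditions must coincide for the merge to be meaningful."""
    if (rule_a - state_part(rule_a)) != (rule_b - state_part(rule_b)):
        return None                     # non-state conditions differ: keep both rules
    common_state = state_part(rule_a) & state_part(rule_b)
    return (rule_a - state_part(rule_a)) | common_state

# Two hypothetical rules for the same target concept, differing only in
# their state conditions:
r1 = frozenset({("true-in-state", ("clear", "<x>")),
                ("true-in-state", ("on-table", "<x>")),
                ("current-goal", ("holding", "<x>"))})
r2 = frozenset({("true-in-state", ("clear", "<x>")),
                ("true-in-state", ("arm-empty",)),
                ("current-goal", ("holding", "<x>"))})

merged = intersect_preconditions(r1, r2)
# merged keeps only the shared (true-in-state (clear <x>)) condition,
# plus the common current-goal condition: one conjunctive rule replaces two.
```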

3.1 Example of a Learning Episode

We show a simple illustrative example of the learning method applied to a variation of the well-known Sussman anomaly [Sussman, 1975], as described in Figure 3. We introduce an extra block, D, into the problem to illustrate the effect of the inductive techniques in pruning the set of features taken from the state. Besides other rules related to operator and binding choices, hamlet learns the correct goal interleaving that enables the nonlinear planner to generate the optimal solution for this problem. Figure 4 shows the rule that achieves this behavior, as it was learned by the bounded explanation phase after the appropriate parametrization. After having been exposed to other variations of

Initial state: ((on-table A) (on-table B) (on C A) (on-table D)
                (clear B) (clear C) (clear D) (arm-empty))

Goal: (and (on A B) (on B C))

Figure 3: The initial state and the goal statement for a variation of the problem known as the Sussman anomaly.

(control-rule decide-sub-goal-marar-34629
  (if (and (applicable-ops (pick-up ))
           (current-operator put-down)
           (current-goal (arm-empty))
           (prior-goal (on ))
           (true-in-state (arm-empty))
           (true-in-state (clear ))
           (true-in-state (clear ))
           (true-in-state (clear ))
           (true-in-state (clear ))
           (true-in-state (on-table ))
           (true-in-state (on-table ))
           (true-in-state (on-table ))
           (true-in-state (on-table ))
           (other-goals ((on )))
           (type-of-object object)
           (type-of-object object)
           (type-of-object object)
           (type-of-object object)))
  (then subgoal (on )))

Figure 4: Control rule learned for goal interleaving in a variation of the Sussman anomaly with an extra irrelevant block (D).

the same problem, involving both other relevant top-level goals and states, hamlet's inductive procedure reaches the rule shown in Figure 5. This rule transfers successfully to any variation of the Sussman anomaly, as well as to other related problems. The rule allows the nonlinear problem solver to use the correct goal interleaving leading to the optimal solution.
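To make the rules' state conditions concrete, the toy check below evaluates the true-in-state meta-predicate against the initial state of Figure 3; the tuple encoding and the variable names are illustrative assumptions of ours, and the other meta-predicates are elided.

```python
# The initial state of Figure 3, encoded as a set of ground literal tuples.
state = {("on-table", "A"), ("on-table", "B"), ("on", "C", "A"),
         ("on-table", "D"), ("clear", "B"), ("clear", "C"),
         ("clear", "D"), ("arm-empty",)}

def true_in_state(literal, state, bindings):
    """Instantiate <var> tokens from bindings, then test set membership."""
    ground = tuple(bindings.get(tok, tok) for tok in literal)
    return ground in state

# With <object1> bound to C, a condition (true-in-state (clear <object1>))
# holds in this state; with <object2> bound to A it fails, since C is on A.
assert true_in_state(("clear", "<object1>"), state, {"<object1>": "C"})
assert not true_in_state(("clear", "<object2>"), state, {"<object2>": "A"})
```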

4 Empirical Results

We have been performing extensive empirical experiments in the blocksworld and in the logistics transportation domain [Veloso, 1992]. We report here the results from initial experiments in the blocksworld that we have fully analyzed and interpreted. The results illustrate our main claims about the effectiveness of the combined bounded explanation and inductive methods in speedup learning. First, we used a set of 50 randomly generated problems from which hamlet learned the control rules.

(control-rule decide-sub-goal-marar-34629
  (if (and (applicable-ops (pick-up ))
           (current-goal (arm-empty))
           (prior-goal (on ))
           (true-in-state (clear ))
           (true-in-state (clear ))
           (true-in-state (on-table ))
           (true-in-state (on-table ))
           (other-goals ((on )))
           (type-of-object object)
           (type-of-object object)
           (type-of-object object)))
  (then subgoal (on )))

Figure 5: The refined rule of Figure 4 after the inductive steps.

It learned 90 control rules. After applying the inductive operators, this set was reduced to 31 rules. These rules were applied to a second set of 50 randomly generated problems. Figure 6 shows the cumulative number of nodes searched over this test set without rules and with the induced rules. As can be seen, the learned rules largely outperform the situation without rules.

[Plot: cumulative number of nodes (y-axis, 0 to 7000) versus problems (x-axis, 0 to 45).]

Figure 6: Comparison between the number of nodes searched with prodigy's base-level search algorithm and with the bounded-explanation method without induction.

Figure 7 shows equivalent results for the cumulative running time in the two situations. Since the matcher for the control rules does not use any optimized retrieval algorithm, the time spent matching the rules exhibits the usual utility problem. The results shown in Figure 7 are especially relevant, as the use of the induced set of rules outperformed the base-level problem solver even with this rudimentary matcher.

[Plot: cumulative time in seconds (y-axis, 0 to 160) versus problems (x-axis, 0 to 45).]

Figure 7: Comparison between the time spent by prodigy's search algorithm without control rules, with the learned control rules, and with the induced rules.

5 Conclusions

The approach we have presented addresses a problem shared by many speedup learning methods: efficiently acquiring control knowledge to improve the problem solver's performance. We proposed a solution that combines an inexpensive deductive explanation method with an inductive technique. In this case, the tradeoff between the accuracy of the control strategy and the resources used to obtain it is resolved by bounding the learning step with a fixed set of tests that have been manually adapted from the experience of the authors. Since there is no proof that the approach generates a correct control strategy, the method refines the result of the learning step, in order to get closer to the right strategy. Therefore, we combine analytical learning (using the search tree) and inductive learning (refining and generalizing the control strategies).

Acknowledgements

We greatly appreciate the help of Juan Pedro Caraca-Valente during earlier stages of this research, discussions with Jaime Carbonell, and the comments of the whole prodigy group.

References

[Borrajo et al., 1992a] Daniel Borrajo, Juan P. Caraca-Valente, and Jose Luis Morant. Learning heuristics in planning. In Sixth International Conference on Systems Research, Informatics and Cybernetics, Baden-Baden, Germany, 1992.

[Borrajo et al., 1992b] Daniel Borrajo, Juan P. Caraca-Valente, and Juan Pazos. A knowledge compilation model for learning heuristics. In Proceedings of the Workshop on Knowledge Compilation of the 9th International Conference on Machine Learning, Scotland, 1992.

[Carbonell et al., 1990] Jaime G. Carbonell, Craig A. Knoblock, and Steven Minton. Prodigy: An integrated architecture for planning and learning. In K. VanLehn, editor, Architectures for Intelligence. Erlbaum, Hillsdale, NJ, 1990. Also Technical Report CMU-CS-89-189.

[Carbonell et al., 1992] Jaime G. Carbonell and the PRODIGY Research Group. PRODIGY4.0: The manual and tutorial. Technical Report CMU-CS-92-150, School of Computer Science, Carnegie Mellon University, June 1992.

[DeJong and Mooney, 1986] Gerald F. DeJong and Raymond Mooney. Explanation-based learning: An alternative view. Machine Learning, 1(2):145-176, 1986.

[Etzioni, 1990] Oren Etzioni. A Structural Theory of Explanation-Based Learning. PhD thesis, School of Computer Science, Carnegie Mellon University, 1990. Available as Technical Report CMU-CS-90-185.

[Kambhampati and Kedar, 1991] Subbarao Kambhampati and Smadar Kedar. Explanation based generalization of partially ordered plans. In Proceedings of AAAI-91, pages 679-685, 1991.

[Langley, 1983] Pat Langley. Learning effective search heuristics. In Proceedings of IJCAI-83, pages 419-421, 1983.

[Minton, 1988] Steven Minton. Learning Effective Search Control Knowledge: An Explanation-Based Approach. PhD thesis, Computer Science Department, Carnegie Mellon University, 1988. Available as Technical Report CMU-CS-88-133.

[Mitchell et al., 1983] Tom M. Mitchell, Paul E. Utgoff, and R. B. Banerji. Learning by experimentation: Acquiring and refining problem-solving heuristics. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning, An Artificial Intelligence Approach, pages 163-190. Tioga Press, Palo Alto, CA, 1983.

[Mitchell et al., 1986] Tom M. Mitchell, Richard M. Keller, and Smadar T. Kedar-Cabelli. Explanation-based generalization: A unifying view. Machine Learning, 1:47-80, 1986.

[Perez and Etzioni, 1992] M. Alicia Perez and Oren Etzioni. DYNAMIC: A new role for training problems in EBL. In D. Sleeman and P. Edwards, editors, Proceedings of the Ninth International Conference on Machine Learning. Morgan Kaufmann, San Mateo, CA, 1992.

[Quinlan, 1990] J. Ross Quinlan. Learning logical definitions from relations. Machine Learning, 5:239-266, 1990.

[Sussman, 1975] Gerald J. Sussman. A Computer Model of Skill Acquisition. American Elsevier, New York, 1975. Also available as Technical Report AI-TR-297, Artificial Intelligence Laboratory, MIT, 1975.

[Tadepalli, 1989] Prasad Tadepalli. Lazy explanation-based learning: A solution to the intractable theory problem. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, pages 694-700, San Mateo, CA, 1989. Morgan Kaufmann.

[Veloso, 1989] Manuela M. Veloso. Nonlinear problem solving using intelligent casual-commitment. Technical Report CMU-CS-89-210, School of Computer Science, Carnegie Mellon University, 1989.

[Veloso, 1992] Manuela M. Veloso. Learning by Analogical Reasoning in General Problem Solving. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, August 1992. Available as Technical Report CMU-CS-92-174.

[Wang et al., 1993] Xuemei Wang, Manuela Veloso, Alicia Perez, Rujith deSilva, Daniel Borrajo, and Jim Blythe. Explanation based learning for nonlinear problem solving in PRODIGY. Technical report, School of Computer Science, Carnegie Mellon University, 1993. Forthcoming.
