Genetically Evolved Macro-Actions in AI Planning Problems

Muhammad Abdul Hakim Newton, John Levine, Maria Fox
Department of Computer and Information Science, University of Strathclyde
Livingstone Tower, 26 Richmond Street, Glasgow G1 1XH, United Kingdom
e-mail: {firstname.lastname}@cis.strath.ac.uk

Abstract

Despite recent progress in planning, many complex domains, and even simple domains with large problems, remain hard and challenging for current planners. A macro-action, defined as a group of actions applied at one time, can make jumps that reach a goal at a lesser depth in the search tree, so that problems not solvable within a given time limit might become solvable. FF-style planners like Macro-FF and MARVIN have shown some improvement with macro-actions, but both need some knowledge about the domains or the search algorithms, as Macro-FF uses static facts and MARVIN uses plateau-escaping sequences to generate macro-actions. There is no known method capable of learning good macros without any significant structural knowledge about the domains or the planning algorithms. Genetic algorithms are automatic learning methods that require just a method to seed the initial population, definitions of the genetic operators on the populations, and a method to evaluate individuals across the populations, but no structural knowledge about the problem domains or the search algorithms. Genetic algorithms have shown promising results in learning control knowledge for a domain and some success in generating plans, but they have not yet been tried for evolving macro-actions. This paper presents initial results of applying a genetic algorithm to learn macro-actions in planning problems.

1 Introduction

This section discusses planning and its progress, macro-actions in planning, and our contribution in this paper.

1.1 Planning and its Progress

Planning, an important and fundamental research problem in Artificial Intelligence (AI), can be described informally as finding a plan (i.e. a sequence of actions) that takes a given world from a specified initial state to a desired goal state. One is often interested in an optimal plan, optimizing a given objective function. Usually the cost factors to minimize are time, battery power, fuel, etc., and the achievements to maximize include data gathered, work done, or anything else of interest to the user. A wide variety of problems call for planning research, including control programs for rovers in planetary exploration, artificial ants, robot navigation systems, block-stacking systems, disaster management, space exploration, and military planning. AI planning has recently made much progress, as is evident from the results of the international planning competitions held in 1998, 2000, 2002 and 2004 [13, 3, 11, 9]. FF (Fast Forward) [10] and LPG (Local search for Planning Graphs) [7] are two state-of-the-art planners that performed best in several categories of the 2002 and 2004 competitions respectively. In successive competitions, planners performed much better than earlier ones in solving harder problems in more complex and realistic domains. However, finding any plan, let alone an optimal one, in many complex domains with large numbers of actions and/or goals at great depths remains very hard and challenging for current planners. Even in simple domains, they encounter difficulties once problems grow beyond a certain size. Therefore, it is very important to achieve speed-ups that make larger and more difficult problems, currently unsolvable within a given time limit, solvable.

1.2 Macro-Actions in Planning

A macro-action (or macro for short) is simply a group of actions applied together like a single action. A macro can represent a frequently used group of actions or a high-level task comprising low-level details. A real-world example might be making a cup of tea, which always involves turning the kettle on, heating water and pouring it into a tea-cup, mixing in milk, sugar and tea, and lastly stirring with a spoon. In this paper we consider a macro as a super action comprising a sequence of actions applied at one time. A macro has its preconditions and effects like normal actions, and is added to a planning problem as an additional action. It therefore does not affect the solvability of the problem. Macros make jumps that can reach a goal at less depth in the search tree, so problems intractable within a practical time limit might become tractable. Macros can also help increase the visibility of the state space to the planner, especially when the goodness of the immediate next states cannot be measured appropriately. Again, macros might help represent high-level actions for high-level planners. However, macros incur extra overhead for the planners through an increased branching factor when added to the domain as additional actions. Also, they can result in inferior, longer plans, since repeated use of a macro may re-execute actions that only needed to be executed once. Therefore, macros need to be carefully assessed to check whether the gains outweigh the costs, and only the statistically promising macros should be added to a problem domain.

The important question at this point is whether macros can help state-of-the-art planners like FF or LPG to solve problems faster and/or to solve more hard problems. MARVIN [6] shows that caching the plateau-escaping action sequences and using them later, when the same type of plateaux are encountered, helps FF achieve some improvement in several domains. MARVIN also lifts macros from plans of reduced problems found by eliminating symmetries. Macro-FF [4] shows that operators in a reformulated domain, obtained after component-level abstraction using static facts, achieve significant improvement when used as macros with FF. In addition, Macro-FF also lifts macros comprising only two actions from plans and later evaluates them for selection. However, no such study with LPG has so far been reported. It would therefore be interesting to see the effects of macros on LPG (and also other planners), since its search algorithm is different. In [8], macros are lifted from action sequences required to achieve heuristically identified subgoals, and about 44% improvement is achieved in the Rubik's Cube domain. In [2], another automatic method is presented that discovers abstraction hierarchies and exploits domain invariants to learn macros, but it has a slightly negative impact on performance. Since relatively little work on macros in planning has been reported, it is too early to make any final comment; but it is clear that macros can help planners like FF in many cases. Therefore, macros in planning need further investigation and deserve a lot more attention.

1.3 Our Contribution: Genetically Evolved Macro-Actions

Macro learning systems found in the literature in some way exploit inherent properties of the domains and/or hidden weaknesses of the planners to generate macros. Some of them later evaluate the macros for further selection. However, one should not expect many inherent properties to be common across domains, or the weaknesses of different planners to be similar. Therefore, it is very important to find a generalized macro learning method that works for any domain and any planner without using any specific structural knowledge about them. Genetic algorithms, being general purpose search methods, have had significant success in different areas. They are automatic evolutionary learning methods that require almost no specific structural knowledge about the problem domains or the search algorithms, but just a method of seeding the population, definitions of the genetic operators on the populations, and a method to evaluate individuals across the populations. Genetic algorithms have produced promising results in learning control knowledge for domains and some success in generating and optimizing plans. In [15], Spector managed to achieve a maximally fit plan for the Sussman Anomaly, and for a range of initial and goal states, but the problems were very small. SINERGY, presented in [14], could only solve problems with specific initial and goal states. Later, GenPlan [17] showed that genetic algorithms can generate plans, but it remains somewhat inferior to the state-of-the-art planners. Genetic algorithms have also been used to optimize plans in [18]. EvoCK [1] uses heuristics generated by HAMLET [5] to seed the initial population of control rules and then genetically evolves better ones for PRODIGY4.0 [16]; overall, EvoCK outperformed both PRODIGY4.0 and HAMLET. Later, L2Plan [12] used a genetic approach to evolve control knowledge, or policies, and showed promising results by outperforming hand-coded policies. However, genetic algorithms have not yet been tried for evolving macros in planning domains.

Our main contribution in this paper is an investigation into applying a genetic algorithm to learn macros in planning domains. We represent a macro by two components: a composition, which is a sequence of action headers, and an action that is built up by appropriate regression of the variable-substituted actions in the composition. For a given planner and a given domain, we lift random macros of random length from plans of smaller problems to form the initial population. For evolution, we apply the genetic operators, crossover and mutation, to the composition components of randomly selected macros to obtain resultant compositions, from which the complete child macros are built. To compare individual macros across the populations, we solve a different set of problems with the original domain and then with the macro-augmented domains, and finally use a utility function to rank them. To show the performance of the finally selected individual macros, we solve another set of problems over a range of difficulty levels. We tested our approach on about 10 STRIPS domains with the FF planner and achieved very promising results in several of them.

The rest of the paper is organized as follows: Section 2 describes a genetic algorithm and the issues involved in adapting it to macros in planning. Section 3 presents our experimental results. Section 4 then discusses future work and, lastly, Section 5 presents the conclusion.

2 Genetic Algorithms and Planning Macros

This section presents one genetic algorithm and its adaptation to learning macros in planning. It also discusses possible pruning strategies that might help reduce the effort wasted on exploring potentially bad macros.

2.1 Genetic Algorithms

Genetic algorithms might vary slightly in their implementations. Figures 1 and 2 present the algorithm we have used in this work.

Genetic Algorithm
1. Initialize the population and then evaluate each individual to assign a numerical rating.
2. Repeat the following steps for a given number of generations, but exit if in the last generation the necessary new individuals were not found or the replacement of inferior parents by superior children was not satisfactory:
   (a) Repeat the following steps for a maximum number of times, but exit if the new generation is fully populated:
       i. Randomly select a genetic operator and one or two parent individual(s).
       ii. Applying the selected genetic operator to the selected parents, generate one or two child individual(s) and add them to the new generation if they have not occurred before.
   (b) Evaluate each individual in the new population to assign a numerical rating.
   (c) Replace inferior parents by superior children.
3. Present the best individual(s) in the population as the output of the algorithm.

Figure 1: A genetic algorithm

Figure 2: Illustration of different genetic operators: crossover swaps the tails of two parent sequences (e.g. A B C D E F and G H I J K produce the offspring A B J K and G H I C D E F), while the addition, deletion and alteration mutations insert, remove or replace a single element of a sequence (e.g. A B C D becomes A B C K D, A B D, or A Q C D)
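To make the control flow concrete, the following Python sketch is one possible reading of the loop in Figure 1; the default parameter values mirror the setup in Section 3.1, but evolve, evaluate and the operator functions are illustrative names introduced here, not the authors' code.

```python
import random

def evolve(seed_population, evaluate, crossover, mutations,
           max_generations=15, max_attempts=2500, replacement_rate=0.2):
    """A sketch of the generational loop of Figure 1. Individuals must be
    hashable (e.g. tuples of action headers); 'evaluate' returns a numerical
    rating, 'crossover' and each mutation return a list of children."""
    population = {ind: evaluate(ind) for ind in seed_population}
    size = len(population)

    for _ in range(max_generations):
        children, attempts = {}, 0
        # Step 2(a): fill the new generation, giving up after max_attempts tries.
        while len(children) < size and attempts < max_attempts:
            attempts += 1
            operator = random.choice([crossover] + mutations)
            arity = 2 if operator is crossover else 1
            parents = random.sample(list(population), arity)
            for child in operator(*parents):
                if child not in population and child not in children:
                    children[child] = None
        if not children:
            break  # no new individuals could be generated (exit condition of step 2)

        # Step 2(b): evaluate every new individual.
        for child in children:
            children[child] = evaluate(child)

        # Step 2(c): replace inferior parents by superior children.
        merged = sorted({**population, **children}.items(),
                        key=lambda kv: kv[1], reverse=True)
        survivors = dict(merged[:size])
        replaced = len(set(survivors) - set(population))
        population = survivors
        if replaced < replacement_rate * size:
            break  # replacement of parents by children no longer satisfactory

    # Step 3: output the best individual(s), best first.
    return sorted(population, key=population.get, reverse=True)
```

The macro-specific seeding, evaluation and operator functions that would be plugged into such a loop are the subject of the rest of this section.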

2.2 Genetic Algorithms on Macros

The genetic approach in general has three requirements: a method to seed the initial population, an appropriate evaluation method to rank individuals, and definitions of the genetic operators. While adapting the algorithm in Figure 1, we found that the evaluation method is the most time-consuming part.

2.2.1 Definition of Genetic Operators

This paper assumes that a macro is a sequence of actions, or at least a serial form of a parallel group of actions, or a linearization of a partial-order sub-plan. Once we have a linear order of actions in a macro, we build the resultant action by composing the constituent actions. Therefore, genetic operators are applied to the action sequences comprising the macros (see Figure 2), and from the resultant sequences we build the complete macros using composition (see Figure 3). We use only the crossover and the add/delete type mutations, but not the reproduction or the alter type mutation. Only the child individual with the shorter sequence of actions is accepted as the result of a crossover, and additions and deletions are done at either end rather than at a random position, because we do not want to disrupt a good sequence if we can avoid it.
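As an illustration of these operators (a sketch with hypothetical helper names; the authors' representation may differ), macro compositions can be treated as tuples of action headers:

```python
import random

def crossover(parent_a, parent_b):
    """Swap tails at random cut points; only the shorter child is kept.
    Both parents are assumed to contain at least two action headers."""
    cut_a = random.randint(1, len(parent_a) - 1)
    cut_b = random.randint(1, len(parent_b) - 1)
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return [min(child_1, child_2, key=len)]

def add_mutation(action_headers):
    """Build a mutation operator that appends a random action header at either end."""
    def mutate(parent):
        header = (random.choice(action_headers),)
        return [header + parent if random.random() < 0.5 else parent + header]
    return mutate

def delete_mutation(parent):
    """Drop the action header at either end of the sequence."""
    if len(parent) <= 2:
        return []  # refuse to shrink below a two-action macro
    return [parent[1:] if random.random() < 0.5 else parent[:-1]]
```

The resulting sequences are then turned into complete macros by the composition described next.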

Composition of Actions: The composition of actions uses the regression method. Composition by regression on actions is a binary, non-commutative and associative operation, where the precondition and the effect of the later action are subject to the effect of the former action (see Figure 3) and the parameters are unioned together. This paper considers composition of actions only in the STRIPS and FLUENTS subsets of PDDL, as it is practically not feasible for other, more complex language constructs. However, it should be clear at this stage that not every sequence of actions, if composed together, will produce a sensible macro. For example, the precondition of the resultant action might be unsatisfiable, the effect might be inconsistent because of conflicting modifications, or the parameter unification might face name or type conflicts.

Precondition under Composition: A literal appearing in the precondition of the later action might be satisfied or contradicted, and a function value might be changed, by the effect of the former action. Therefore the resultant precondition will be a conjunction of the precondition of the former action and the modified precondition of the latter. It might contain subsumptions of conditions either way, and it might also be unsatisfiable. We can detect both of these up to a certain level, but not completely. For unsatisfiability, the more critical one, we can additionally try the time-consuming approach of solving problems (probably in the evaluation phase) with the macro-augmented domain to see whether the planner can use the macro in plans or not.

Effect under Composition: The resultant effect will be the union of the modified effect of the latter action and the part of the effect of the former action not further modified by the effect of the latter. A sub-effect after modification might be null when it reverses some other sub-effect, or is satisfied by the resultant precondition, and might be inconsistent if it duplicates some other sub-effect. We can detect both of these up to a certain level. But we cannot detect an inconsistency that occurs inside the planner while instantiating the actions. For example, a positive and a negative literal involving the same predicate, but with different parameters, may result in an inconsistency if, after grounding, they become contradictory. We can handle this problem by using not-equalities appropriately in the precondition wherever two objects, constants or variables, have compatible types and we do not want them to be the same. But this solution again poses a problem for untyped domains, where all objects are of the same type: we would need to add a long list of not-equalities to handle that. Therefore this paper does not consider untyped domains, although this difficulty can be abated to some extent using types described by static unary facts.

Parameters under Composition: The parameters of the resultant action will be the union of the parameters of both actions after all the common parameters are unified. Parameter unification poses the difficulty of how to unify: by type or by name. The first option requires us to discover the cardinality of relationships and/or interactions in the domain; the second one, which we adopted, is difficult for uninstantiated actions. So we lift initial macros from plans of smaller problems and then replace every problem object by a corresponding variable with an identical name, leaving any domain constant as it is. Later we unify wherever we find common variable names.

Figure 3: A composition of actions by regression, with regression examples for preconditions and effects
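As a rough illustration of composition by regression (a sketch under simplifying assumptions, not the authors' implementation), the following Python fragment composes two ground STRIPS actions; the Action type and the literal representation are hypothetical names introduced here, FLUENTS are ignored, and only easily detectable unsatisfiable preconditions and inconsistent or null effects are caught.

```python
from collections import namedtuple

# A ground STRIPS action: a name plus sets of positive literals
# such as ("at", "pkg1", "depot0").
Action = namedtuple("Action", ["name", "pre", "add", "delete"])

def compose(first, second):
    """Compose 'first' followed by 'second' into one macro-action by regression.
    Returns None when an easily detectable problem makes the macro useless."""
    if first is None or second is None:
        return None
    # A precondition of the second action is unsatisfiable if the first action
    # deletes it without re-adding it (STRIPS delete-then-add semantics).
    if (second.pre & first.delete) - first.add:
        return None
    # Preconditions of the second action already achieved by the first are dropped.
    pre = first.pre | (second.pre - first.add)
    # Later effects override earlier ones on the same literal.
    add = second.add | (first.add - second.delete)
    delete = second.delete | (first.delete - second.add)
    if add & delete:            # inconsistent effect (conflicting modifications)
        return None
    if not add and not delete:  # null overall effect: not a useful macro
        return None
    return Action(first.name + "+" + second.name, pre, add, delete)
```

A macro for a longer sequence could then be built by folding left to right, e.g. functools.reduce(compose, actions); handling lifted (variable) parameters and numeric fluents would require the additional machinery described above.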

2.2.2 Evaluating Individual Macros

$$U = \frac{\sum_{p \in S} (T_{Op} - T_{Ap})\, T_{Op}}{\sum_{p \in S} T_{Op}}$$

$$Utility = \begin{cases}
U\, P_s P_u P_g & \text{if } U > 0 \text{ and } P_s P_u P_g > 0 \\
U / (P_s P_u P_g) & \text{if } U < 0 \text{ and } P_s P_u P_g > 0 \\
-e^{\alpha} & \text{if } P_s = 0 \\
U\, P_s\, e^{-3\alpha/4} & \text{if } U > 0 \text{ and } P_u = 0 \\
(U / P_s)\, e^{3\alpha/4} & \text{if } U < 0 \text{ and } P_u = 0 \\
U\, P_s P_u\, e^{-\alpha/2} & \text{if } U > 0 \text{ and } P_g = 0 \\
(U / (P_s P_u))\, e^{\alpha/2} & \text{if } U < 0 \text{ and } P_g = 0
\end{cases}$$

where S is the set of problems solved using the macro, $T_{Op}$ is the time required to solve problem p using no macro, $T_{Ap}$ is the time required to solve problem p using the macro, $P_s$ is the probability that a problem is solvable using the macro, $P_u$ is the probability that the macro is used in a plan, $P_g$ is the probability that a problem is solved in less time with the macro, and $\alpha$ and $e$ are any positive numbers such that $e^{\alpha}$ tends to infinity.

Figure 4: A utility function for macro evaluation

For assessment of the macros, we generate a set of problems, called assessment problems, which are more complex than the macro-lifting problems but should not take too much time to solve (for every macro, all these problems need to be solved). Then, with the given planner, we solve all assessment problems using the original domain and each macro-augmented domain. Finally, we use the utility function shown in Figure 4 to rank each macro. The basic form of the utility function is U, the weighted mean time gain of the augmented domain with respect to the original domain, taking the times required with the original domain as weights. The multiplicative factors counterbalance misleadingly high values of U: a macro is not good if it makes most problems unsolvable within the time limit, is not used in most plans, or does not help solve most problems faster. For negative values of U, the multiplicative inverses of the probabilities are used for obvious reasons, and zero probabilities are replaced by very small values.
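For illustration, the utility of Figure 4 could be computed from assessment-problem timings roughly as follows; the data structures are illustrative, assessment problems are assumed solvable with the original domain (as in our experimental setup), and alpha is any sufficiently large positive constant.

```python
import math

def utility(time_original, time_macro, macro_used, alpha=50.0):
    """time_original[p], time_macro[p]: solution times in seconds (None means not
    solved within the limit); macro_used[p]: whether the macro appears in p's plan."""
    problems = list(time_original)
    solved = [p for p in problems if time_macro[p] is not None]
    if not solved:                                        # Ps = 0: the macro is useless
        return -math.exp(alpha)

    ps = len(solved) / len(problems)                                  # solvable with macro
    pu = sum(macro_used[p] for p in solved) / len(solved)             # macro used in a plan
    pg = sum(time_macro[p] < time_original[p] for p in solved) / len(solved)  # solved faster

    # U: mean time gain weighted by the no-macro solution times.
    weight = sum(time_original[p] for p in solved)
    u = sum((time_original[p] - time_macro[p]) * time_original[p] for p in solved) / weight

    if pu == 0:
        return u * ps * math.exp(-0.75 * alpha) if u > 0 else (u / ps) * math.exp(0.75 * alpha)
    if pg == 0:
        return u * ps * pu * math.exp(-0.5 * alpha) if u > 0 else (u / (ps * pu)) * math.exp(0.5 * alpha)
    return u * ps * pu * pg if u > 0 else u / (ps * pu * pg)
```

In this reading, a macro that slows most problems down, is rarely used, or causes timeouts is pushed towards a strongly negative score, exactly the behaviour the multiplicative factors are meant to enforce.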

2.2.3 Seeding the Initial Population

A seeding strategy has already been discussed above, where a solution to the parameter unification problem was sought: initial macros are lifted from plans of smaller problems.

2.3 Pruning the Macro Space

We adopt some pruning techniques, both in the generation and in the evaluation phase, that reduce the cost of exploring potentially inferior macros.

2.3.1 Pruning in Generation Phase

The following are the pruning methods we use in the macro generation phase (a sketch of how they might be applied is given at the end of this section).
Common parameter pruning: Actions having no parameter in common (in our case, by name) are not logically connected and would generally make no sensible macro.
Maximum parameters pruning: Macros having large numbers of parameters are not promising, since they cause huge numbers of instantiated operators.
Parameter name pruning: Macros that are equivalent in action sequences but have different parameter names should be considered copies of one macro.
Long sequence pruning: Macros comprising longer sequences of actions are less likely to be reused.
Unsatisfiable precondition pruning: Macros having easily detectable unsatisfiability in their preconditions should be discarded.
Inconsistent effect pruning: Macros having inconsistencies in their effects should be discarded.
Null effect pruning: Macros having subsequences of actions with null effects are not minimal and should be discarded.

2.3.2 Pruning in Evaluation Phase

For better utilization of the learning time during problem solving in the augmented domains, we adopt some further pruning strategies. These strategies also help us detect unsatisfiable preconditions, since bad macros cost much time.
Timeout-limit pruning: Bad macros waste time in solving most problems.
Exclusion-limit pruning: Bad macros are not included in most plans.
Resource-limit pruning: Bad macros create resource scarcity, making problems unsolvable.
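As promised above, here is a minimal sketch of how the generation-phase rules of Section 2.3.1 might be applied before a candidate is ever evaluated; the thresholds and the (name, parameter-names) representation of lifted action headers are illustrative, not the authors' implementation, and the unsatisfiable-precondition, inconsistent-effect and null-effect checks are assumed to be handled during composition itself, as in the compose sketch above.

```python
def prune_candidate(sequence, seen, max_params=6, max_length=4):
    """Generation-phase pruning over a lifted action sequence, where each element
    is a (name, parameter-names) pair such as ("unstack", ("?x", "?y")).
    Returns True if the candidate should be discarded before evaluation."""
    if not 2 <= len(sequence) <= max_length:              # long sequence pruning
        return True
    params = [set(p) for _, p in sequence]
    if len(set().union(*params)) > max_params:            # maximum parameters pruning
        return True
    # Common parameter pruning: every action must share a parameter with another one.
    for i, own in enumerate(params):
        if not any(own & other for j, other in enumerate(params) if j != i):
            return True
    # Parameter name pruning: canonical renaming makes renamed copies collide.
    renaming = {}
    signature = tuple((name, tuple(renaming.setdefault(p, len(renaming)) for p in ps))
                      for name, ps in sequence)
    if signature in seen:
        return True
    seen.add(signature)
    return False
```

The evaluation-phase rules, by contrast, are simple runtime monitors: counting timeouts, plans that exclude the macro, and resource-limit failures, and discarding the macro once any count exceeds its threshold.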

3 Experimental Results

This section summarizes our experiment and the analysis of our results. For a given domain, we set up the learning process, assigning intuitively chosen values to the parameters of our genetic algorithm. We then run the learning process, which gives us a ranked list of macros. In order to investigate the individual performance of the best macros, we generate a third set of problems of increasing difficulty, which we call performance problems. The difficulty level of a performance problem is measured by the time required to solve it with the original domain. For every selected macro, the performance problems are solved using the corresponding augmented domain. Finally, we plot plan times in graph charts; analyse the search states explored, the percentage increase or decrease in plan lengths, and the percentage of levels by which the goal is lifted in the search tree; and test three hypotheses to reach a comprehensive outcome about the whole work.

3.1 Experimental Setup

The typical setup of the learning process is as follows:
- Planner used: FF
- Import problems: 2-5 objects per type
- Assessment problems: solvable in 1-10 sec
- Number of assessment problems: 6-20
- Max parameters: max arity of the actions + 2
- Lifted macros: 2-4 actions
- Time out: 60 secs, for max 33% of problems
- Macro exclusions: max 33% of plans
- Resource-limit killing: max 33% of problems
- Crossover: 34%; Mutation: add 33%, delete 33%
- Current population: 2 × number of actions
- Children to generate: 2 × number of actions
- Maximum number of generations to run: 15
- Replacement: 20% in any 2 consecutive generations
- Generation attempts: 2500 for one new macro

3.2 Results

We have run our learning algorithm on about 10 STRIPS domains and we present our results for each domain separately. For convenience, we shall hereafter use No-Macro to mean the original domain, Macro-1 to refer to the best macro and/or the corresponding augmented domain, Macro-2 to refer to the second best macro and/or its augmented domain, and so on.

3.2.1 Gripper Domain

Outstanding performance is obtained in the Gripper domain (see Figure 5). Problems are solved faster, more hard problems are solved, learnt macros cannot be lifted from any plans, plans are around 33% inferior, and goals are achieved at about 33% less depth.

Figure 5: Plan time: Gripper

3.2.2 Tyre-world Domain

Outstanding performance is also achieved in the Tyre-world domain (see Figure 6). Problems are solved ten times faster, more hard problems are solved, learnt macros cannot be lifted from any plans, plans are around 17% inferior, and goals are achieved at about 8% less depth.

Figure 6: Plan time: Tyre-world

3.2.3 Depot Domain

Performance is not impressive in the Depot domain (see Figure 7). Problems are solved faster, more hard problems are solved, learnt macros cannot be lifted from any plans, plans are around 15% superior, and goals are reached at about 30% less depth. As the improvement margin is somewhat narrow, it would be useful to show its statistical significance. However, this would be difficult since, using the original domain, harder problems seem not to be solvable within a reasonable time limit.

Figure 7: Plan time: Depot

3.2.4 Ferry Domain

Outstanding performance is obtained in the Ferry domain (see Figure 8). Problems are solved faster, plans are around 29% inferior, and goals are at about 67% less depth. Ferry has only two problem parameters, and the ups and downs in the graph are caused by varying one problem parameter while keeping the other fixed. This graph is presented in a different way to show the impact of macros in domains having multidimensional problem parameters. The improvement gets higher as one parameter increases while the other is fixed. Below every trough, macros are costlier, which means the gain does not surpass the overhead. The trough rises as the fixed parameter assumes a higher value, and we need bigger values of the variable parameter to see any significant improvement.

Figure 8: Plan time: Ferry

3.2.5 Driver-Log Domain

Figure 9 shows the performance of the best macro learned in the Driver-Log domain. We see that the best macro has somewhat mixed performance on smaller problems but makes a clear difference when the problems get much harder. The hardest 6 problems could not be solved without the macro, even when given ten times the time taken to solve them with the best macro. But due to incomplete information about the hardest problems, we cannot clearly compare the quality of the plans or the depth of the goal states.

Figure 9: Plan time: Driver-Log

3.2.6 Zeno-Travel Domain

Problems are solved significantly faster in the Zeno-Travel domain (see Figure 10). Plans are both superior and inferior in length, and the goals are achieved at about 15% less depth.

Figure 10: Plan time: Zeno-Travel

3.2.7 Satellite Domain

In the Satellite domain, some improvement is obtained (see Figure 11). Problems become practically unsolvable as they get harder, and mixed performances are achieved for both the plan quality and the goal state depth.

Figure 11: Plan time: Satellite

3.2.8 Logistics and Settlers Domains

We can learn apparently promising macros for both the Logistics and the Settlers domains, but their performance against the performance problems is yet to be studied. In these two relatively complex domains, we face some problems with getting representative problem sets of appropriate and successively increasing difficulty.

3.2.9 Briefcase and Miconic Domains

We could not learn any good macros for either the Briefcase or the Miconic domains. We suspect that, due to the inherent infinite parallelism in the domain structure, macros result in excessively long plans that discourage the planner from using them.

3.3 Analyses

For comprehensive analyses of the macros that we have learnt with our genetic evolutionary algorithm, we set up three hypotheses which are to be tested statistically on the basis of our experimental results. In most of the cases the outcomes are obvious; for the rest we have not done any statistical tests but have made cautious observations of the apparent trends for the time being. Figure 12 presents the outcomes.

Hypothesis 1 (H1): Good macros can be found by evaluation of the candidate macros using our utility function.

Hypothesis 2 (H2): Good macros can be found by lifting randomly from plans.

Hypothesis 3 (H3): Evolution can learn better macros that cannot be lifted from plans.

Domain        H1    H2    H3
Tyre-world    yes   yes   yes+
Gripper       yes   yes   yes
Depot         yes   yes-  yes
Ferry         yes   yes   no
Driver-Log    yes   yes   no
Zeno-Travel   yes   yes   no
Satellite     yes   yes   no
(yes = accepted, no = rejected, + = strongly, - = with caution)

Figure 12: Hypotheses on the domains

Hypothesis 2 generalizes the work of Macro-FF and MARVIN, which are both limited in extent. In complex domains having large numbers of actions, random lifting needs some guidance, as any systematic exhaustive search would try huge numbers of possible action combinations and require enormous time. Genetic algorithms provide this guidance and use some sort of hill climbing to get optimal macros in fewer attempts.

Besides the hypotheses about genetic evolution, we can further analyse and summarize our results about macros as follows:
- Macros help solve harder problems within the given time (see the results of all 7 domains).
- Macros help solve harder problems which cannot be solved with the original domain under the given resource limit (see the results of Tyre-world and Gripper). The opposite might also happen, as seen in the Satellite domain.
- Macros help solve problems while exploring many fewer states, as seen in all seven domains.
- Macros are beneficial when problems are sufficiently large and complex; the results of the Ferry domain illustrate this.
- Macros reach a goal at less depth, as found in most of the domains.
- Macros lead to inferior plans in terms of plan length. But once a valid plan is found quickly, some plan optimization technique can be used to get a better one.

Finally, one might be interested to know how much time it takes to generate a full population of macros, given the amount of pruning in the macro generation phase. In the Tyre-world domain, which has 13 actions, it takes around 3 seconds at the sixth generation. In successive generations the required time increases, as we generate only new child macros that have never occurred before. Overall, these times are negligible compared to the times required to solve larger problems.

4 Future Work

Macros are expected to be more useful for complex domains, especially where the number of actions is large. So far we have successfully run the whole process on relatively small and moderately sized STRIPS domains. Therefore, trying our learning method on more complex domains like Settlers (24 actions) and UMTranslog (38 actions) is a major direction of future work. Problems of appropriate difficulty levels are important for our work and are also difficult to generate in complex domains, so writing appropriate problem generators for those domains will be another important piece of future work. Our method should work with other planners like LPG, since it does not use any specific structural feature of a planner; we shall also try that shortly. In this work, we find a single macro that performs well when added to the domain. But finding a set of macros that achieves a search speed-up when added altogether to the domain is also a very important task. One way might be to use incremental learning, and another to use a genetic approach on a population of macros.

5 Conclusion

This paper presents an automatic macro learning method, based on a genetic approach, that requires no structural knowledge about the domains or the search algorithms. Despite recent significant progress in planning, many complex domains, and even simple domains with larger problems, remain challenging. Macros are thought to have a significant influence on search speed and are thus expected to extend the limit beyond which problems are treated as practically unsolvable. Macro-FF achieved significant improvement using macros of length 2, learnt by component-level abstraction of the domain and by random lifting from plans. MARVIN also achieved some improvement in several domains with its online macros, learnt by partial-order lifting from plans of reduced problems found by elimination of symmetries, and also by memoising plateau-escaping sequences. However, both of them need some structural knowledge about the domains or the search algorithms. In this paper we showed that genetic algorithms can evolve good macros for FF that achieve significant improvement when added to the domains as additional operators. We found very good speed-ups in 4 domains: Tyre-world, Gripper, Ferry and Driver-Log, and significant or seemingly significant improvement in 3 domains: Satellite, Zeno-Travel and Depot. We found apparently promising macros in Logistics and Settlers, which are yet to be tested against appropriate problems. But we could not find any good macros in the Briefcase and the Miconic domains. In several domains we also found that evolution can generate good macros that cannot be learnt by lifting randomly from plans.

Acknowledgment This research is supported by the Commonwealth Scholarship Commission (CSC) in the United Kingdom. We also thank Andrew Coles and Amanda Smith for sharing their results and thoughts about macros in planning.

References

[1] Aler R., Borrajo D., Isasi P. (2001), Learning to Solve Problems Efficiently by Means of Genetic Programming, Evolutionary Computation, 9(4), 387-420.

[2] Armano G., Cherchi G., Vargiu E., A System for Generating Macro-Operators from Static Domain Analysis, Proceedings (453) of Artificial Intelligence and Applications.

[3] Bacchus F. (2001), AIPS'00 Planning Competition, AI Magazine, 22(3), 47-56.

[4] Botea A., Enzenberger M., Müller M., Schaeffer J. (2005), Macro-FF: Improving AI Planning with Automatically Learned Macro-Operators, to appear in Journal of Artificial Intelligence Research.

[5] Borrajo D., Veloso M. (1997), Lazy incremental learning of control knowledge for efficiently obtaining quality plans, AI Review, 11(1-5), 371-405.

[6] Coles A.I., Smith A.J. (2004), MARVIN: Macro-Actions from Reduced Versions of the Instance, IPC4 Booklet, ICAPS 2004.

[7] Gerevini A., Serina I. (2002), LPG: a Planner based on Local Search for Planning Graphs, Proceedings of the Sixth International Conference on Artificial Intelligence Planning and Scheduling (AIPS'02).

[8] Hernádvölgyi I.T. (2001), Searching for Macro Operators with Automatically Generated Heuristics, Proceedings of the 14th Biennial Conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence, 194-203.

[9] Hoffmann J., Edelkamp S., Englert R., Liporace F., Thiébaux S., Trüg S. (2004), Towards Realistic Benchmarks for Planning: the Domains Used in the Classical Part of IPC-04, 4th International Planning Competition, 7-14.

[10] Hoffmann J., Nebel B. (2001), The FF Planning System: Fast Plan Generation Through Heuristic Search, Journal of Artificial Intelligence Research, 14, 253-302.

[11] Long D., Fox M. (2003), The 3rd International Planning Competition: Results and Analysis, JAIR, 20, Special Issue on the 3rd International Planning Competition, 1-59.

[12] Levine J., Humphreys D. (2003), Learning Action Strategies for Planning Domains using Genetic Programming, Applications of Evolutionary Computing, EvoWorkshops 2003, LNCS 2611, 684-695.

[13] McDermott D., The 1998 AI Planning Systems Competition, AI Magazine, 21(2), 35-55.

[14] Muslea I. (1998), A General Purpose AI Planning System Based on the Genetic Programming Paradigm, Proceedings of the World Automation Congress.

[15] Spector L. (1994), Genetic Programming and AI Planning Systems, AAAI.

[16] Veloso M., Carbonell J., Perez M., Borrajo D., Fink E., Blythe J. (1995), Integrating planning and learning: The PRODIGY architecture, Journal of Experimental and Theoretical Artificial Intelligence, 7, 81-120.

[17] Westerberg C.H., Levine J. (2000), GenPlan: Combining Genetic Programming and Planning, 19th Workshop of the UK Planning and Scheduling Special Interest Group (PLANSIG).

[18] Westerberg C.H., Levine J. (2001), Optimizing Plans using Genetic Programming, 6th European Conference on Planning.
