Skill Acquisition Can be Regarded as Program Synthesis

Ute Schmid and Fritz Wysotzki
Institute of Applied Computer Science, Berlin University of Technology
Franklinstr. 28-29, D-10587 Berlin
email: [email protected], [email protected]

Abstract
We propose to employ the method of inductive program synthesis for modelling skill acquisition, thereby offering an elaborate formal background for learning by doing. Introducing the notion of program synthesis in this domain furthermore enables us to deal with problem solving skills and motor or process control behavior in a uniform way.
1 Introduction

In Artificial Intelligence as well as in cognitive psychology, the acquisition of problem solving skills and the acquisition of motor or process control behavior are usually explored within different approaches. Problem solving skills ("cognitive skills") are usually described by production rules. Accordingly, skill acquisition is modelled mainly by chunking of rules (see [LNR87, And82]). Control behavior, on the other hand, is usually described by neural or cybernetic circuits ("control programs"). Acquisition of control skills is often modelled by reinforcement learning ([DBS93]), sometimes by backpropagation ([NW89]).

Characteristic of "classical" problem solving is that each problem can be described by a set of discrete problem states ([Nil80]). Problem states are only affected by the application of problem solving operators. That is, the problem solving process is completely controlled by the problem solving system. We will call such problem spaces "discrete and static". In contrast, control behavior usually involves dealing with continuous parameters in an environment (e.g. speed, position). One has to deal with the fact that not only the control action itself produces state changes, but that there may be additional factors which are not controllable by the operator (e.g. wind). We will call such problem spaces "continuous and dynamic".

We propose to look at both learning situations as program synthesis problems, thereby gaining a unifying view. Program synthesis is an approach to automatic program construction, where a generalized (recursive) program is inferred from examples of the desired input/output behavior ([BGK84]). In problem solving, we can regard a given problem state as input and the operator sequence which transforms this state into the goal state as output.
From experimentation with a problem of given complexity (for example clearing the bottom block of a three block tower), a generalized program for dealing with arbitrary problems of this type (for example clearing the bottom block of an n block tower) should be inferred. In motor or process control, the situation is analogous: the parameters of a given system state constitute the input, and the control operation(s) leading to or maintaining the desired system state constitute the output. From experimentation with different situations of the system, a generalized control program for dealing with arbitrary parameter values (states) should be inferred.

The approach of inductive program synthesis gives a general framework for describing the inference of generalized behavioral programs from experimentation with example situations; that is, program synthesis provides a formal approach to learning by doing. In the following, we outline our general approach in more detail and then present the resulting specific approaches to the acquisition of problem solving skills and of adaptive control behavior.
2 A Framework for Skill Acquisition

We incorporated our approach to skill acquisition in the architecture of the system IPAL, which is implemented in LISP as a first prototype¹ (see figure 1).

[Figure 1 diagram; box and arrow labels: Problem Specification (states + operators); Problem Solving / Planning; Initial Program (problem specific solution); Mapping; otherwise; x: s(x) < S; Inductive Program Synthesis; Generalization; Hierarchical Memory; Analogical Problem Solving; RPS]
Figure 1: System Architecture of IPAL

We first describe how IPAL works for discrete and static problems. Input to the system is a set of problem states together with a set of production rules for state transformation and a goal state. In a first step, an (optimal) operator sequence which transforms a state into the goal is calculated for each input state. Here we use well known algorithms for heuristic search (such as best first, see [Nil80, BSW96]) or an approach to generalized planning ([Wys87, SW96]). The exploration of all input states leads to a first kind of generalization: the states and the constructed operator sequences can be combined into an "initial program". This initial program is a conditional expression: it tests which state is currently given and outputs the corresponding operator sequence. We call this process generalization over problem states.

The next step is to generalize over the problem complexity. That is, we infer a generalization over recursively enumerable problem spaces. If, for example, an initial program for the tower of hanoi problem with three discs is given, a recursive program which can handle the n disc problem should be synthesized. Here we use a technique for inductive program synthesis elaborated by Wysotzki ([Wys83, SW96]). In this approach, a recursive program scheme is inferred. That means that the resulting term is independent of the syntax of a concrete programming language. The concept of a program scheme corresponds to the frame concept in knowledge representation.

The recursive program scheme is defined by its structure. The operation symbols of the scheme can be interpreted by different semantic operations. This characteristic can be exploited for a third kind of generalization: generalization over problem classes. If, for example, the general solution for the tower of hanoi problem has been learned, isomorphic problems ([SH76]) can be solved by analogical transfer ([Gen83]).
We also explore analogical problem solving for non-isomorphic problems, using a distance metric based on structural similarity ([SWss]). For a given initial program, IPAL checks whether there is a sufficiently similar problem in memory. If this is the case, the initial program is generalized by analogical transfer; otherwise the recursive program scheme is induced with our synthesis algorithm.

In the case of continuous and dynamic problems, IPAL works similarly: input is a set of randomly selected but representative parameter vectors, a set of control operations, a system goal (area) and additionally an evaluation function. The optimal control action for each input vector is calculated by means of heuristic search. Combining the state-action pairs into a conditional program is more involved than for discrete and static problems: the continuous space of parameters has to be divided into regions where the same control action is to be applied. For continuous dynamic problems we have up to now only worked on this first step of generalization.

¹ Implementation of the system is done by Mark Müller, incorporating work which was done in two student projects at the Berlin University of Technology.
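The first generalization step for discrete and static problems (generalization over problem states) can be illustrated with a minimal Python sketch. It is not the IPAL implementation, which is written in LISP: breadth-first search stands in for the heuristic search mentioned above, and the state encoding, the operator name `move` and the helper names `successors`, `plan` and `initial_program` are assumptions made for this illustration. For the tower of hanoi problem with three discs, every state is mapped to a shortest operator sequence, and the resulting table plays the role of the conditional initial program.

```python
from collections import deque
from itertools import product

# Assumed encoding: state[d] is the peg (0, 1 or 2) of disc d; disc 0 is the smallest.
GOAL = (2, 2, 2)

def successors(state):
    """Yield (operator, next_state) pairs for all legal moves."""
    for src in range(3):
        on_src = [d for d, peg in enumerate(state) if peg == src]
        if not on_src:
            continue
        disc = min(on_src)  # only the smallest disc on a peg can be moved
        for dst in range(3):
            on_dst = [d for d, peg in enumerate(state) if peg == dst]
            if dst == src or (on_dst and min(on_dst) < disc):
                continue  # never place a disc on a smaller one
            yield ("move", disc, src, dst), state[:disc] + (dst,) + state[disc + 1:]

def plan(start):
    """Breadth-first search for a shortest operator sequence from start to GOAL."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, ops = frontier.popleft()
        if state == GOAL:
            return ops
        for op, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, ops + [op]))
    return None

def initial_program(states):
    """Combine states and operator sequences into a conditional 'initial program'."""
    table = {s: plan(s) for s in states}

    def program(state):
        # Test which state is given; states outside the table are undefined (None).
        return table.get(state)

    return program

program = initial_program(product(range(3), repeat=3))
```

Calling `program((0, 0, 0))` returns an operator sequence for the full three disc tower, while a four disc state such as `(0, 0, 0, 0)` is undefined. Closing exactly this gap is the task of the second generalization step.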
3 Acquisition of Problem Solving Skills

In the case of problem solving, we start with an ordered set of all states of a problem or with a representative selection of states². In the simple case of clearing the bottom block in a three block problem space there are three states (see figure 2).

[Figure 2 diagram; three configurations of the blocks A, B and C with their state descriptions:
  on(A, B), on(B, C), cleartop(A)
  on(B, C), cleartop(A), cleartop(B)
  cleartop(A), cleartop(B), cleartop(C)]
Figure 2: Example problem space for clearing a block

We have one production rule

    on(x, y), cleartop(x) → puttable(x)   ADD cleartop(y)   DEL on(x, y)

and

    topof(y) = x ⇔ on(x, y).

Problem solving by heuristic search produces the operator sequences given in table 1.
Table 1: Problem states and corresponding operator sequences

  No.  Problem state                      Operator sequence
  1    cleartop(A)                        -
  2    on(B, C), cleartop(B)              puttable(B)
                                          = puttable(topof(C))
  3    on(A, B), on(B, C), cleartop(A)    puttable(A), puttable(B)
                                          = puttable(topof(topof(C))), puttable(topof(C))

These expressions can be reformulated in situation calculus ([Gre69]), using s as the situation variable. Variabilization of the constants and selection of minimal conjunctions of predicates for discriminating problem states lead to the following conditional expression:

    IF cleartop(x) THEN s
    ELSE IF cleartop(topof(x)) THEN puttable(topof(x), s)
    ELSE IF cleartop(topof(topof(x))) THEN puttable(topof(x), puttable(topof(topof(x)), s))
    [ ELSE undefined ].

² For dealing with random subsets of a problem space in maze problems or complex problems such as Rubik's cube see [BSW96].
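The conditional expression for the three block problem can be written down directly as a small Python sketch. The representation is an assumption made for illustration: a state is a dictionary `on` mapping each block to the block directly beneath it, situation terms are nested tuples, and the names `make_predicates` and `initial_cleartop_program` are hypothetical.

```python
def make_predicates(on):
    """Build topof and cleartop for a state given as {block: block beneath it}."""
    def topof(y):
        # topof(y) = x  <=>  on(x, y)
        for x, below in on.items():
            if below == y:
                return x
        return None

    def cleartop(x):
        return x is not None and topof(x) is None

    return topof, cleartop

def initial_cleartop_program(x, on, s="s"):
    """Initial program: nested conditional over the three known problem states."""
    topof, cleartop = make_predicates(on)
    if cleartop(x):
        return s
    elif cleartop(topof(x)):
        return ("puttable", topof(x), s)
    elif cleartop(topof(topof(x))):
        return ("puttable", topof(x), ("puttable", topof(topof(x)), s))
    else:
        raise ValueError("undefined")

# The three states of the example problem space, bottom block C.
state_all_clear = {}
state_b_on_c = {"B": "C"}
state_tower = {"A": "B", "B": "C"}
```

For the full tower the result is the term `("puttable", "B", ("puttable", "A", "s"))`, read from the inside out: A is put down first, then B. A tower of four blocks reaches the undefined branch, since the expression covers only the states it was built from.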
This conditional expression can now be used as the initial program for our program synthesis algorithm. We demonstrate generalization to a recursive program scheme with a simple programming problem³. The theoretical background for the synthesis of recursive program schemes (RPS) is given in [Wys83].

We consider the problem of checking whether an element x is a member of a list l, for lists of zero to two elements. It may be unusual to regard this programming problem in the context of problem solving. But we can present the problem solver with a selection of states, for example ([],x), ([x],x), ([y],x), ([x,y],x), ([y,x],x), ([x,y],z), described with the predicates empty(l) and equal(x, y). Production rules here do not generate operator applications but the output true (if x is contained in l) or false (otherwise). The initial program for the member problem is:

    IF empty(l) THEN false
    ELSE IF equal(x, head(l)) THEN true
    ELSE IF empty(tail(l)) THEN false
    ELSE IF equal(x, head(tail(l))) THEN true
    [ ELSE undefined ].
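As a minimal sketch, the initial program for the member problem can be transcribed into Python one branch at a time. The list representation and the sentinel `UNDEFINED` are assumptions for illustration; the four tests follow the conditional expression above.

```python
UNDEFINED = object()  # stands for the "undefined" branch of the initial program

def empty(l):
    return len(l) == 0

def head(l):
    return l[0]

def tail(l):
    return l[1:]

def equal(a, b):
    return a == b

def member_initial(x, l):
    """Initial (non-recursive) program: one branch per discriminating test."""
    if empty(l):
        return False
    elif equal(x, head(l)):
        return True
    elif empty(tail(l)):
        return False
    elif equal(x, head(tail(l))):
        return True
    else:
        return UNDEFINED
```

With the expression transcribed exactly as printed, the state ([x,y],z) also reaches the undefined branch, because no test on tail(tail(l)) is part of it. Any list in which x does not occur among the first two elements behaves the same way.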
The idea of inductive program synthesis is to identify a subterm tr in the initial program for which

    $G^{(n)} = tr\left(G^{(n-1)}_{[v/t]}\right)$

holds. That means the initial program has to be described by a sequence of subterms $G^{(i)}$, where a part of $G^{(i)}$ can be unified with its predecessor in the sequence:

    $G^{(0)}$ = $\Omega$
    $G^{(1)}$ = IF empty(l) THEN false ELSE IF equal(x, head(l)) THEN true ELSE $\Omega$
    $G^{(2)}$ = IF empty(l) THEN false ELSE IF equal(x, head(l)) THEN true ELSE IF empty(tail(l)) THEN false ELSE IF equal(x, head(tail(l))) THEN true ELSE $\Omega$
              = IF empty(l) THEN false ELSE IF equal(x, head(l)) THEN true ELSE $G^{(1)}_{[l/tail(l)]}$
    ...

$\Omega$ stands for "undefined". For the initial program "member(x,l)" the subterm tr is

    IF empty(l) THEN false ELSE IF equal(x, head(l)) THEN true ELSE [ ]

with the substitution [l/tail(l)], where [ ] marks the position at which the predecessor term is inserted. Now we can "fold" the initial program into an RPS:

    member(x,l) = IF empty(l) THEN false ELSE IF equal(x, head(l)) THEN true ELSE member(x, tail(l)).
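The folding condition can be checked mechanically on a term representation. The following Python sketch only illustrates the recurrence for the member example and is not the synthesis algorithm of [Wys83]; the encoding of terms as nested tuples and the names `substitute`, `tr`, `G` and `INITIAL` are assumptions. It builds the sequence by repeatedly inserting the predecessor term, with l replaced by tail(l), into the open position of tr, and compares the second element of the sequence with the initial program.

```python
OMEGA = "Ω"  # the undefined term

def substitute(term, var, replacement):
    """Replace every occurrence of the variable var in a nested-tuple term."""
    if term == var:
        return replacement
    if isinstance(term, tuple):
        return tuple(substitute(t, var, replacement) for t in term)
    return term

def tr(hole):
    """The subterm tr for member, with `hole` filling its open position."""
    return ("if", ("empty", "l"), False,
            ("if", ("equal", "x", ("head", "l")), True, hole))

def G(n):
    """G(0) = Ω, G(n) = tr(G(n-1) with l replaced by tail(l))."""
    if n == 0:
        return OMEGA
    return tr(substitute(G(n - 1), "l", ("tail", "l")))

# The initial program for member as a term (undefined branch written as Ω).
INITIAL = ("if", ("empty", "l"), False,
           ("if", ("equal", "x", ("head", "l")), True,
            ("if", ("empty", ("tail", "l")), False,
             ("if", ("equal", "x", ("head", ("tail", "l"))), True, OMEGA))))

def member(x, l):
    """The folded recursive program scheme, interpreted over Python lists."""
    if not l:
        return False
    if x == l[0]:
        return True
    return member(x, l[1:])
```

Here `G(2) == INITIAL` holds, which is the regularity that licenses the fold, and `member` then handles lists of arbitrary length.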
In contrast to a production system approach to skill acquisition (see [And83, And93, New91]), our approach goes beyond the chunking of rules and variabilization of constants. Indeed, this is only the first step of generalization. While in a production system approach cyclic processing of rules is controlled by the interpreter, we believe that people also acquire "knowledge units" representing not only sequences but loops of operations. This is achieved by our second generalization, the synthesis of recursive program schemes.

The inferred recursive program scheme can now be stored in memory and may be reused for dealing with structurally similar or identical problems. The RPS of the member function can, for example, be used for dealing with problems such as decomposing a tower until a certain block b is on top, extracting a given element e from a stack, or checking whether all elements of a list are greater or less than a given value x, and so on. For dealing with such structurally isomorphic problems, the operation symbols of "member" have to be newly interpreted (see table 2).

³ Inference of a recursive generalization for the "cleartop problem" is described in [SW96].
Table 2: Alternative interpretations for the symbols of "member"

  member   findblock   findelement   greaterall   lessall
  x        b           e             x            x
  l        tower       stack         l            l
  true     true        true          true         true
  false    false       false         false        false
  empty    notower     emptystack    empty        empty
  equal    equal       equal         greater      less
  head     topof       top           head         head
  tail     removetop   pop           tail         tail
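Reinterpreting the operation symbols of a scheme can be sketched in Python by passing the interpretations as functions. The factory `make_scheme` and the concrete data representations (a Python list for l, a list with the top block last for a tower) are assumptions for illustration; the symbol names for findblock are taken from table 2.

```python
def make_scheme(empty, equal, head, tail):
    """Return the member scheme under a given interpretation of its symbols."""
    def scheme(x, l):
        if empty(l):
            return False
        if equal(x, head(l)):
            return True
        return scheme(x, tail(l))
    return scheme

# Interpretation 1: member over lists.
member = make_scheme(
    empty=lambda l: not l,
    equal=lambda x, h: x == h,
    head=lambda l: l[0],
    tail=lambda l: l[1:],
)

# Interpretation 2: findblock over towers (assumed: list with the top block last).
findblock = make_scheme(
    empty=lambda tower: not tower,        # notower
    equal=lambda b, top: b == top,        # equal
    head=lambda tower: tower[-1],         # topof
    tail=lambda tower: tower[:-1],        # removetop
)
```

Both functions share one structure and differ only in how the symbols are interpreted, which is what makes transfer between isomorphic problems possible.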
4 Acquisition of Adaptive Control Behavior

In cognitive problem solving we have to deal with a finite set of discrete input states and construct a (minimal) operator sequence that transforms such states into a goal state. In motor and process control we have an infinite set of parameter vectors describing states of the environment or of a (technical) system. Here the aim is to generate control trajectories in the state space of a process for real time applications. Well known approaches to this domain in artificial intelligence are BOXES ([MC68]), CART ([CU87]) and ASE/ACE ([BSA83]).

We propose to treat control behavior as analogous to problem solving, employing problem solving and machine learning techniques. In a first step, a representative set of input states ("training data") has to be generated. Then, a problem solving algorithm determines the optimal control action for each state by means of an empirical evaluation function. Unlike in discrete problem spaces, the goal state here is usually not a point but an area in the parameter space (i.e. continuous problem space). Wysotzki and Müller ([MW94]) employ a decision tree technique to split the state space automatically into subareas with unique control actions (CAL5, see [UW81]). The resulting decision tree represents a control program: it can be read as a nested conditional expression which tests in which area of the parameter space a state falls and executes the appropriate control action, giving the next state and so on. That is, this program can be applied recursively, where the i-th input is the product of the application of a control operation to input i − 1.

In the following we describe only the first kind of generalization (over problem states). As an example we use the roll axis stabilization (as reaction to disturbances) of a communication satellite in orbit with respect to its position to the earth axis⁴.
The position manoeuvres of the satellite are carried out by thruster torques delivering a set of discrete control actions, the application of which consumes fuel. The control task is to hold the satellite axis in a certain small target interval of its attitude angle while consuming as little fuel as possible (i.e. performing a small number of control actions). The state space of the process is defined by the attitude angle $\varphi$ of the satellite and its rate $\dot{\varphi}$. The target area of control is some region around $\varphi = 0$ and $\dot{\varphi} = 0$ (some maximal value of $|\varphi|$ is given which must not be exceeded). There are additional variables describing oscillations of the two solar generators of the satellite which influence the process states; they are considered as noise.

In a first step a training set of n randomly selected points in the two dimensional state space X is constructed. Each input is a tuple $m_j = (\varphi, \dot{\varphi}) \in X$. Additionally there is a set of control actions $P_f = \{f_1, \ldots, f_q\}$ and an evaluation function $F: X \rightarrow \mathbb{R}$, which maps the state space into the set of real numbers. For each input the optimal control action is calculated: all control actions are applied to a given input state, and thereby a set of successor states is constructed. The action which leads to a successor state with minimal value of F is associated with the input state.

The pairs $(m_j, f_{opt})$ are now used to construct a control program. Here the classification algorithm CAL5 is used, which constructs decision trees with automatic calculation of optimal discretizations of continuous parameter values (see figure 3). The decision tree classifies system states $m_j$ with respect to the control action f which should be applied to each state.
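The labelling step, in which each sampled state is associated with the action whose successor state minimizes F, can be sketched as follows. Everything numerical here is a placeholder and not taken from the satellite application: the double integrator dynamics in `step`, the time step `DT`, the torque magnitude and the quadratic evaluation function `F` are assumptions chosen only to make the sketch runnable. The action codes 0, 1, 2 follow the legend of figure 3.

```python
import random

DT = 0.1                                # assumed time step
ACTIONS = {0: 0.0, 1: +1.0, 2: -1.0}    # 0 = no action, 1 = +Tc, 2 = -Tc (Tc = 1.0 assumed)

def step(state, torque):
    """Hypothetical dynamics: the torque changes the rate, the rate changes the angle."""
    phi, phi_dot = state
    phi_dot_next = phi_dot + torque * DT
    phi_next = phi + phi_dot_next * DT
    return (phi_next, phi_dot_next)

def F(state):
    """Hypothetical evaluation function: squared distance from the target point (0, 0)."""
    phi, phi_dot = state
    return phi ** 2 + phi_dot ** 2

def optimal_action(state):
    """Apply every action once and keep the one whose successor state minimizes F."""
    return min(ACTIONS, key=lambda a: F(step(state, ACTIONS[a])))

def training_set(n, seed=0):
    """Draw n random states m_j and pair each with its optimal action f_opt."""
    rng = random.Random(seed)
    points = [(rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)) for _ in range(n)]
    return [(m, optimal_action(m)) for m in points]
```

The resulting list of pairs is the kind of input a decision tree learner such as CAL5 would then generalize into regions of the state space; the tree construction itself is not part of this sketch.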
The decision tree can be read as a conditional program ("IF $\varphi$ is less than -1.7 THEN use thruster torque $T_c$ ..."). This program generalizes over the presented input states from which it was constructed, segmenting the whole state space into regions with associated optimal control actions. While the initial steps (problem solving and construction of the control program) are time consuming processes, the resulting program can be used for efficient real-time process control (see figure 4).

[Figure 3 diagram: decision tree with threshold tests on p0, p1 and p2 at the inner nodes and the control actions 0, 1, 2 at the leaves]

Figure 3: Program for controlling the behavior of a satellite represented as decision tree ($p_0 = \varphi$, $p_1 = \dot{\varphi}$, $p_2$ = oscillation of one solar generator; 0 = no control action, 1 = action $+T_c$, 2 = action $-T_c$)

[Figure 4 plot]

Figure 4: Plot of real time control for the satellite

⁴ This is an application conducted by Wolfgang Müller and Fritz Wysotzki of the Fraunhofer Institute for Information and Data Processing, Branch Lab for Process Optimization (EPO), Berlin, as part of the project WISCON, supported by the German Ministry of Science and Technology.
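Reading a decision tree as a nested conditional that is applied recursively can be sketched in Python. The tree below is a toy and not the tree of figure 3: only the first test (angle below -1.7 leads to action 1, i.e. $+T_c$) echoes the example in the text, while all other thresholds, the dynamics in `step` and the constants `T_C` and `DT` are assumptions for illustration.

```python
T_C = 1.0   # assumed torque magnitude
DT = 0.1    # assumed time step
TORQUE = {0: 0.0, 1: +T_C, 2: -T_C}   # action codes as in the legend of figure 3

def control_program(state):
    """Toy control program: nested conditional over regions of the state space."""
    p0, p1 = state              # p0 = attitude angle, p1 = its rate
    if p0 < -1.7:
        return 1                # use thruster torque +Tc
    elif p0 < 1.7:              # hypothetical threshold
        if p1 < -0.5:           # hypothetical threshold
            return 1
        elif p1 < 0.5:          # hypothetical threshold
            return 0
        else:
            return 2
    else:
        return 2

def step(state, action):
    """Hypothetical dynamics: the torque changes the rate, the rate changes the angle."""
    p0, p1 = state
    p1_next = p1 + TORQUE[action] * DT
    p0_next = p0 + p1_next * DT
    return (p0_next, p1_next)

def run(state, n_steps):
    """Apply the program repeatedly: the i-th input is the result of step i - 1."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state, control_program(state))
        trajectory.append(state)
    return trajectory
```

Each call to `control_program` is a handful of comparisons, which illustrates why the learned program is cheap at run time compared with the search that produced its training data.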
5 Discussion

We have proposed to look at skill acquisition from the viewpoint of program synthesis. Thereby we gain (1) a unifying view for the acquisition of cognitive as well as motor/process behavior and (2) a sound theoretical background in theoretical computer science. Other aspects of our work, not presented here, cover psychological experiments on the acquisition of programming skills (see [SK95]), the use of analogical reasoning (see [SWss]) and an exploration of the formal characteristics of different problem solving and program synthesis algorithms.

We hope that our approach will be useful as a framework for describing human skill acquisition in a cognitive science context. Furthermore, our work is a contribution to the area of knowledge based software engineering ([LD89]). Our system IPAL can be used to build (small) programs from input examples and to reuse program structures by analogical reasoning. Last but not least, the inference of control programs by problem solving and program synthesis can be applied in different areas of process control.
References

[And82] J. R. Anderson. Acquisition of cognitive skill. Psychological Review, 89:369-406, 1982.
[And83] J. R. Anderson. The Architecture of Cognition, volume 5 of Cognitive Science Series. Harvard University Press, Cambridge, MA, 1983.
[And93] J. R. Anderson. Rules of the Mind. Lawrence Erlbaum, Hillsdale, NJ, 1993.
[BGK84] A. W. Biermann, G. Guiho, and Y. Kodratoff, editors. Automatic Program Construction Techniques. Collier Macmillan, 1984.
[BSA83] A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man and Cybernetics, 13(5), 1983.
[BSW96] L. Briesemeister, T. Scheffer, and F. Wysotzki. A concept based algorithmic model for skill acquisition. In U. Schmid, J. Krems, and F. Wysotzki, editors, Proceedings of the First European Workshop on Cognitive Modeling (14.-16.11.96, TU Berlin), 1996.
[CU87] M. E. Connell and P. E. Utgoff. Learning to control a dynamic physical system, pages 456-460. Machine Learning and Knowledge Acquisition. 1987.
[DBS93] T. Dean, K. Basye, and J. Shewchuk. Reinforcement learning for planning and control. In S. Minton, editor, Machine Learning Methods for Planning, chapter 3, pages 67-92. Morgan Kaufmann, 1993.
[Gen83] D. Gentner. Structure-mapping: a theoretical framework for analogy. Cognitive Science, 7:155-170, 1983.
[Gre69] C. Green. Application of theorem proving to problem solving. Technical report, IJCAI 1, 1969.
[LD89] M. Lowry and R. Duran. Knowledge-based software engineering. In A. Barr, P. R. Cohen, and E. A. Feigenbaum, editors, Handbook of Artificial Intelligence, volume IV, pages 241-322. Addison-Wesley, Reading, MA, 1989.
[LNR87] J. E. Laird, A. Newell, and P. S. Rosenbloom. Soar: An architecture for general intelligence. Artificial Intelligence, 33:1-64, 1987.
[MC68] D. Michie and R. A. Chambers. Boxes: An experiment in adaptive control. In E. Dale and D. Michie, editors, Machine Intelligence, volume 2, pages 137-152. Oliver and Boyd, Edinburgh, 1968.
[MW94] W. Müller and F. Wysotzki. Automatic construction of decision trees for classification. In K. Moser and K. M. Schader, editors, Annals of Operations Research, volume 92, pages 231-247. J. C. Baltzer AG Science Pub., Wijdenes, The Netherlands, 1994.
[New91] A. Newell. Unified Theories of Cognition. Cambridge University Press, Cambridge, MA, 1991.
[Nil80] N. J. Nilsson. Principles of Artificial Intelligence. Springer, 1980.
[NW89] D. Nguyen and B. Widrow. The truck backer upper: an example of self-learning in neural networks. In Proc. IJCNN, volume 2, pages 357-363, 1989.
[SH76] H. A. Simon and J. R. Hayes. The understanding process: Problem isomorphs. Cognitive Psychology, 8:165-190, 1976.
[SK95] U. Schmid and B. Kaup. Analoges Lernen beim rekursiven Programmieren (Analogical learning in recursive programming). Kognitionswissenschaft, 5:31-41, 1995.
[SW96] U. Schmid and F. Wysotzki. Fertigkeitserwerb durch induktive Programmsynthese und generalisiertes Planen (Skill acquisition by inductive program synthesis and generalized planning). In W. Dilger, M. Schlosser, J. Zeidel, and A. Ittner, editors, Proceedings of FGML-96 (19.-21.8.96, TU Chemnitz), pages 106-111, 1996.
[SWss] U. Schmid and F. Wysotzki. Induktion von rekursiven Programmschemata und Analoges Lernen (Induction of recursive program schemes and analogical learning). In R. H. Kluwe, editor, Kognitionswissenschaft: Strukturen und Prozesse intelligenter Systeme. Westdeutscher Universitätsverlag, Wiesbaden, in press.
[UW81] S. Unger and F. Wysotzki. Lernfähige Klassifizierungssysteme. Akademie-Verlag, Berlin, 1981.
[Wys83] F. Wysotzki. Representation and induction of infinite concepts and recursive action sequences. In Proceedings of the 8th IJCAI, Karlsruhe, 1983.
[Wys87] F. Wysotzki. Program synthesis by hierarchical planning. In P. Jorrand and V. Sgurev, editors, Artificial Intelligence: Methodology, Systems, Applications, pages 3-11. Elsevier Science, Amsterdam, 1987.