Integrating Models of Knowledge and Machine Learning


Jean-Gabriel GANASCIA (2), Jérôme THOMAS (1,2), Philippe LAUBLET (1)

(1) ONERA, DMI, GIA, 29 av. de la division Leclerc, BP 72, 93433 Chatillon Cedex. e-mail: {laublet,thomas}@onera.fr

(2) LAFORIA-CNRS, Université Pierre et Marie Curie, Tour 46-0, 4 place Jussieu, 75252 Paris Cedex. e-mail: {ganascia,thomas}@laforia.ibp.fr

Abstract: We propose a theoretical framework allowing a real integration of Machine Learning and Knowledge Acquisition. This article shows that the inputs of a Machine Learning system can be mapped to the model of expertise as it is used in the KADS methodology. The notion of learning bias plays a central role: we shall see that parts of it can be identified with what are usually called the inference and task models, and that the description language and its structure, i.e., the background knowledge, can mainly be related to the domain model. By drawing this conceptual mapping, we give a status to most of the inputs of Machine Learning programs in terms of knowledge acquisition models. The ENIGME system, which implements this work, is presented in this paper; it shows one way to integrate a machine learning algorithm into a knowledge acquisition environment.

Keywords: knowledge acquisition, models of knowledge, learning bias, empirical learning.

1. INTRODUCTION

How are Machine Learning and Knowledge Acquisition related? This is not an idle question, since it has appeared obvious to both scientific communities that bridges and links between them are necessary. Nevertheless, there is no obvious answer; most researchers have no clear idea of how this integration could be achieved. Moreover, the two communities have different methodologies for dealing with the same matter, i.e., knowledge, and such cultural differences that dialogue between them is difficult. On the one hand, people coming from Machine Learning are used to building efficient and well-designed algorithms, so they do not understand the discussions revolving around the notion of model; considering the algorithms and programs built by the Knowledge Acquisition people, they see mainly editors or graphic displays without any "intelligence". On the other hand, people coming from the Knowledge Acquisition community think that Machine Learning can only be applied to trivial tasks where the knowledge representation has been


previously defined, i.e., where the Knowledge Acquisition process has almost been completed. Even if it is schematic, this view summarizes the general attitude of these two communities. In practice, it explains why, most of the time, attempts to integrate Machine Learning and Knowledge Acquisition depend on the community of origin: people coming from Machine Learning usually think that one only has to add some graphic environment and some editors to their algorithms [Webb 91, Buntine 90b, ...], while people coming from Knowledge Acquisition think that one can insert Machine Learning algorithms, considered as black boxes, into Knowledge Acquisition environments [Shadbolt 90] and tools [Gaines 89].

Our main claim is that Machine Learning algorithms have to be re-designed to be inserted into Knowledge Acquisition environments. This paper will provide some evidence for this point by showing that the inputs of a Machine Learning system can be mapped to the model of expertise as it is used in the KADS methodology. More precisely, the notion of learning bias will play a central role here: most Machine Learning systems have neglected its semantics, since their goal was only to automate knowledge generation. In the case of Knowledge Acquisition, it cannot be neglected because, when it is not trivial, it is part of the knowledge which has to be acquired. We shall see in the following that parts of the learning bias can be identified with what are usually called the inference and task models. We shall also see that the description language and its structure, i.e., the background knowledge, can mainly be related to the domain model. By drawing this conceptual mapping, we give a status to the inputs of Machine Learning programs in terms of knowledge acquisition models. We will see in this paper that this status enables a real integration of Machine Learning and Knowledge Acquisition, one which cannot be reduced to the mere insertion of Machine Learning programs into Knowledge Acquisition environments. To bring this conceptual work to light, we shall provide a practical example dedicated to automating the first bid of the card game of bridge.

The paper is divided into six further parts. The next section presents the notion of learning bias as it is classically understood in the Machine Learning community. Then we shall discuss how Machine Learning and Knowledge Acquisition can be integrated. In the fourth section, we shall provide some background concerning the notion of model of expertise in the KADS methodology and concerning the CHARADE learning system. We shall then present in detail the conceptual mapping between models and machine learning inputs in section five. Last, the sixth section presents the first results we have obtained with ENIGME.


2. LEARNING BIAS

2.1 Preliminaries

In a classical view, the goals of machine learning are to develop computational theories of learning and to build learning machines. From this last point of view, machine learning can be seen as a process transforming source data (observations, traces of computation, teacher remarks, ...) into organized and usable knowledge. This process may involve many different methodologies and operations, but our purpose here is not to describe them; it is to provide a general overview of the inputs of Machine Learning programs seen as transformation processes. Among these inputs, some are easily identified. For instance, the notion of source of data seems quite clear, even if it may be difficult to know whether training examples are derived from a database of past cases or from prototypical examples. But there are many other inputs which are not so easily identified. They correspond to the notion of learning bias, whose role has been emphasized by many authors in the past [Mitchell 82, Utgoff 86, Ganascia 88, Russel 90, ...]. Without going into details, since induction can be seen as a search process, and since the space of possible generalizations is huge, it is necessary to introduce many heuristics and constraints to guide the search. Following [Russel 90], one can distinguish three kinds of bias: the domain bias, the restriction-type bias and the preference-type bias. This section is dedicated to the description of these different kinds of bias. In the course of this paper, we will exemplify all of them on a practical example related to the first bid of the card game of bridge. Thus, as a preliminary step, let us explain the main features of this problem. The goal is, knowing the hand, to choose an opening colour (club, diamond, heart, spade or no trump). To simplify, we will only choose a colour and no level of opening. It is obviously a classification problem, but the expert uses several intermediate concepts (between the hand and the opening colour) to reach the solution, for instance opening, kind-of-distribution, conventional-opening and kind-of-hand. A resolution roughly follows this reasoning: first value the hand; if the valuation is insufficient, then pass; otherwise, abstract the kind-of-distribution from the distribution and the strong-colour to see whether the hand is a normal one or an extraordinary one. Indeed, there are two different ways to solve the problem: if it is an extraordinary hand, we follow a convention; otherwise we abstract the kind of hand from the observables and then compute the solution.
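To fix ideas, the description of one training example might look as follows. This is a sketch of ours, not ENIGME's actual input syntax; the attribute names are those used throughout this paper, and the values are borrowed from the hand solved in section 6.

# One training example for the first-bid problem: direct observations plus
# the intermediate concepts the expert uses on the way to the solution.
example = {
    # direct observations
    "number-of-spades": 6, "number-of-hearts": 3,
    "number-of-diamonds": 3, "number-of-clubs": 1,
    "honour-points": 14, "distribution-points": 16,
    # intermediate concepts (abstractions computed during the resolution)
    "opening": "yes",
    "kind-of-distribution": "two-colours",
    "kind-of-hand": "major-minor",
    # the solution to learn
    "opening-colour": "spade",
}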


2.2 Domain bias

Once a set of examples is given, it has to be described, so it is necessary to define and refine a description language. This language corresponds to the words and concepts used to describe the data. In addition, it includes the structure of the description language, which gives the properties of concepts and the implicit relationships among them. All this information corresponds to the first category of bias, which can be called the domain bias. As we shall see in the following, it corresponds to a part of the domain knowledge in the KADS terminology. Considering our current example, the domain bias corresponds to descriptors associated with direct observations (e.g. the numbers of spades, hearts, diamonds and clubs) or with intermediate concepts (e.g. opening, kind-of-distribution, conventional-opening, kind-of-hand, ...), and to logical formulas expressing implicit relations among these descriptors. For instance, these formulas can relate the valuation (i.e., the honour points or the distribution points) to the source data, i.e., the hand. They correspond to the notion of background knowledge in machine learning; they enable the introduction of procedural knowledge or, more generally, the use of partially known knowledge. Practically, the domain bias contains the attributes, their syntax, hierarchies of descriptors, axioms expressing properties of attributes (such as a total order among the possible values), and so on.

2.3 Restriction-type bias

The domain bias defines a structured hypothesis space, which is the set of candidate theories that the program could propose for any possible set of observation sentences. Furthermore, if the expert already has knowledge which allows some parts of this space to be pruned, it may be useful to provide the system with it. Following [Russel 90], we will call such knowledge the restriction-type bias. For instance, let us consider our example again, and suppose that we want to automatically build a knowledge base dedicated to solving this problem. We first have to provide the system with the domain bias (i.e., the description language and its structure). Once all this knowledge is given, we may provide the system with the reasoning that the produced rule system has to follow. Such a reasoning tells the learner that relations going from the opening colour back to the number of spades are useless; it may also point out which concepts are useful to conclude on another one. This information is implicit in most learning systems; making it explicit allows more flexibility in the use of an apprentice. A few ML tools already use a restriction-type bias to constrain the search space. For instance, CHARADE [Ganascia 87] uses as input of the learning phase an attribute structure which fixes the general direction of the rules: if the goal of the final expert system is to diagnose a complaint, all the rules could go from the symptoms through the syndromes to the diagnoses. In the same way, MOBAL [Morik 91] uses a topology of predicates. All these semantic constraints are examples of the restriction-type bias.


Our main claim is that this information has to be considered as part of the knowledge and that it has to be acquired. We shall provide, in the following, some evidence concerning this point.

2.4 Preference-type bias

The third kind of bias is what [Russel 90] calls the preference-type bias. This kind of bias is related to the goal of the machine learning tool. Indeed, if the goal of the tool is, as for the majority of learning systems, to find a set of relations which explains all the examples of the learning set, then among all the relations allowed by both the description language and the restriction-type bias, the learner may need only some of them to reach this goal. Thus, we have to provide the system with a preference ordering which allows the learner to choose among the possible relations.
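For illustration only (this is not ENIGME's actual criterion), such a preference ordering can be realized as a sort over candidate rules, e.g. preferring rules that cover many examples with few premise tests, in the spirit of Occam's razor:

def covers(premise, example):
    """A premise (attribute -> value tests) covers an example when all tests hold."""
    return all(example.get(attr) == val for attr, val in premise.items())

def prefer(candidate_rules, examples):
    """Order candidate rules by decreasing coverage, then by premise length:
    one possible preference-type bias among many."""
    def key(rule):
        premise, _conclusion = rule
        coverage = sum(covers(premise, e) for e in examples)
        return (-coverage, len(premise))
    return sorted(candidate_rules, key=key)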

We can summarize all this in the following diagram:

[Figure: the sources and the three kinds of bias (bias 1: definition and structure of the description language; bias 2: restriction-type bias; bias 3: preference-type bias) feed the learning procedure, which produces the learned knowledge.]

Figure 1: the different kinds of bias.

As we shall see in the following, our aim is to express all the learning bias as part of the knowledge and to acquire it. Therefore, we have to define the status of all the parts of the learning bias in terms of knowledge acquisition. Mapping the learning bias onto the knowledge acquisition models will provide a useful basis for the integration of machine learning and knowledge acquisition.

Historically, in designing the CHARADE system [Ganascia 87], our aim was to give a semantics to all the information required as input of a learning system. We therefore


related some of those inputs to the operationality of the induced knowledge, i.e., to the goal of the learned knowledge. It was an attempt to introduce declarative bias into a learning system. We now want to pursue this work by providing explicit links with the KADS methodology. The next section is dedicated to the general links relating knowledge acquisition and machine learning; it will make more understandable the correspondence we want to establish between the learning bias and the structure of knowledge in the KADS methodology.

3. INTEGRATING MACHINE LEARNING AND KNOWLEDGE ACQUISITION

We can now come back to the initial questions: how can we achieve a real integration of Machine Learning and Knowledge Acquisition, one which cannot be reduced to a mere juxtaposition of techniques? What is the exact role of machine learning tools in Knowledge Acquisition? Machine Learning people could say that it is to automate the knowledge acquisition process, but already in 1983, R.S. Michalski [Michalski 83] noted "that there does not seem to exist any universal measure for evaluating [the quality of inductive assertions] and that the importance given to each such measure depends on the ultimate purpose of constructing these assertions". Hence, the learning bias must depend on the domain at hand and must provide the system with the purpose of the rules it has to produce. Then, as [Buntine 90a] notes, empirical learning may no longer ignore one of the key lessons of AI in the 1970s, called the strong knowledge principle [Waterman 86]: "...to make a program intelligent, provide it with lots of high quality specific knowledge about some problem area".

Moreover, most researchers in Knowledge Acquisition insist on the fact that the Knowledge Acquisition process is iterative and has to involve interaction with experts: it proceeds by a succession of refinements. In machine learning, however, the notion of iterativity seems, at first sight, related to the notion of incrementality, which means that examples have to be given one by one to the learning procedure, which progressively refines the learned knowledge. Unfortunately, a glance at existing machine learning procedures shows that, most of the time, the efficient machine learning algorithms are not incremental and that the computational cost of incrementality is very high. Does this mean that machine learning is useless in knowledge acquisition? Certainly not. On the one hand, the knowledge which has to be acquired is not constituted only by the knowledge learned by the machine learning algorithm; therefore, the role of machine learning is not only to automatically build knowledge, but also to help define the knowledge which constitutes the learning bias. On the other hand, the iterativity of the knowledge acquisition process is not equivalent to the incrementality of the machine learning algorithm. Indeed, this iterativity means the possibility of refining the knowledge acquired so far thanks to the analysis of the expert


system's results. Thus, this refinement may be done not only on the examples but on all the knowledge that the learner knows. The acquisition process can then be seen as an interactive process with a machine learning program where the source data and the different parts of the bias are progressively refined. However, if we define this interaction as the acquisition of the learning bias, we cannot be satisfied with identifying it as a mere sequence of trials and errors; in order to facilitate this acquisition, it is necessary to give some meaning to this bias. In other words, as long as the learning system does not need to interact with humans, the internal way of learning and the reasoning done by the produced system do not really matter: the only important feature of the system is the performance of the learnt rules, not the reasoning that produces this result. Let us note that, if the system needs to correct its errors, it is useful for it to have an internal representation of its own functioning, but it is not necessary that humans be able to understand this functioning. On the other hand, if the system has to interact with an expert (or a user), it is the role of this expert to teach the learner when it makes a mistake and to give it a way to modify its behaviour. When this occasion arises, it is really useful for the apprentice to be able to explain the faulty reasoning in an understandable way, to allow the expert to isolate one or more faulty steps in it. Once a faulty step has been isolated, a first way to modify the behaviour of the system is to give the learner a counter-example of the faulty rule. But this can be tedious, especially if we do not have a great number of examples or if we have given too weak a restriction-type bias. A better way to modify the behaviour of the learnt rule system is to modify its exploration constraints. Thus, these constraints and their use by the learning tool must make sense for the expert.

To fulfil these objectives, the solution proposed in this paper is to constrain the apprentice during the induction phase with kinds of knowledge that come from the knowledge acquisition framework. We will mainly use two kinds of knowledge: the problem solving method that the final expert system has to follow and semantic networks linking concepts of the domain. We have implemented all these ideas in a new tool, named ENIGME, where we have chosen to use the exploration procedure of CHARADE [Ganascia 87]. The originality of CHARADE is that its exploration procedure does not include any built-in bias; hence, we are able to bias the search through knowledge alone. Furthermore, we will express this knowledge with the KADS methodology [Wielinga 92], which is one of the most popular current knowledge acquisition approaches. Thus, before explaining the way we will use this knowledge to restrict the search space, let us present these two tools.


4. BACKGROUND

4.1 CHARADE

The CHARADE system automatically generates knowledge bases from examples by detecting correlations in a training set. It is based on the use of two distributive lattices, one for the learning set, i.e., the set of examples, and one for the description space. The properties of these lattices are used in the learning process to facilitate the detection of similarities and to limit the search cost. Concretely, a similarity corresponds to a correlation empirically observed in the learning set: if all the examples that have a conjunction of descriptors D1 in their description also have the descriptor d2, it is possible to induce that D1 implies d2 in the learning set. The principle of induction used in symbolic learning consists in generalizing these relations to the whole description space. These regularities must therefore be detected. Two functions, C and D, are used. The first one, C, goes from the descriptor lattice to the example lattice: to each description (i.e., conjunction of descriptors), it associates the set of examples of the learning base covered by this description. The second one, D, relates each subset of the learning set (i.e., each element of the example lattice) to the set of descriptors present in the descriptions of all its examples. These two functions are represented below:

[Figure: the space of descriptors and the space of examples, linked by the two functions. C(d): set of examples covered by description d; D(E): descriptors common to all the examples of E.]

Figure 2: the C and D functions.

Example: let us imagine a learning set made of three examples E1, E2, E3:

E1 = d1 & d2 & d3 & d4
E2 = d1 & d2 & d4 & d5
E3 = d1 & d2 & d3 & d4 & d6

An empirical regularity can be detected between d1 & d2 and d4; indeed, all the examples described by d1 & d2 also have d4 in their description. This is obtained with C and D as follows: D∘C(d1 & d2) = D({E1, E2, E3}) = d1 & d2 & d4. The main difficulty lies in the large number of potential regularities, which are sometimes useless. The algorithm explores the search space from general to specific: the exploration goes from the bottom of the descriptor lattice to its top, respecting the partial order relation.
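The pair (C, D) behaves like a Galois connection between the two lattices, and a regularity is read off the closure D∘C. The detection step can be sketched as follows (a toy reconstruction of ours, not CHARADE's implementation):

# Toy reconstruction of the regularity detection: a description is a set of
# descriptors; an example is described by the set of its descriptors.
examples = {
    "E1": {"d1", "d2", "d3", "d4"},
    "E2": {"d1", "d2", "d4", "d5"},
    "E3": {"d1", "d2", "d3", "d4", "d6"},
}

def C(description):
    """Examples covered by a description (conjunction of descriptors)."""
    return {name for name, desc in examples.items() if description <= desc}

def D(names):
    """Descriptors common to all the examples of the given subset."""
    return set.intersection(*(examples[n] for n in names)) if names else set()

closure = D(C({"d1", "d2"}))       # {"d1", "d2", "d4"}
print(closure - {"d1", "d2"})      # {"d4"}: d1 & d2 -> d4 holds in the learning set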


It is important to note that the CHARADE system makes use of logical constraints to restrict the logical implications it generates to useful ones. As far as these constraints represent equivalences between rules, their implementation makes it possible to suppress useless rules. This avoids redundancies and, at the same time, limits the exploration of the description space. This paper generalizes the CHARADE approach and tries to draw a systematic parallel between the inputs of the learning component and the structure of knowledge as it is seen in the KADS methodology. To make this more understandable, the next section recalls some elements related to the structure of knowledge in the KADS methodology.

4.2 Model of expertise in the KADS methodology

KADS is today the best-known methodology in Europe for the development of knowledge-based systems. Since 1985, there have been several improvements of KADS; our reference will be the state of the methodology as presented in [Wielinga 92]. Among the different models that KADS recommends building, the model of expertise, produced during the analysis phase, is central. It specifies the problem solving expertise that the future system has to use to perform the problem-solving tasks. It consists of four layers: the domain layer, the inference layer, the task layer and the strategy layer. In this paper, we are only interested in the first three. First, we briefly present what kind of knowledge each layer includes.

• The domain layer describes the concepts, the relations and the different properties of the domain. For instance, in the bridge example, the domain layer includes all the concept descriptions (possible values, type, ...), the relations between these concepts, the semantic networks linking them, ...

The inference and task layers provide models that describe the problem solving process for a certain task. They are linked to some parts of the domain layer, thus defining the roles and functions of the static domain knowledge in this process.

• More precisely, the inference layer is the description of the inference steps that are considered as primitive at the abstraction level of the expertise model. An inference step (or fragment) consists of a knowledge source, which describes the type of inference (e.g. match, abstract, ...), and of metaclasses as inputs and outputs of the knowledge source. A metaclass (or a


role) describes the role of domain entities in the problem-solving process; it is linked to the domain concepts which fulfil this role. The inference structure network gives a visual representation of the dependencies between the different inference steps and is the main vector of communication of a problem solving method for KADS users (see figure 3 for the inference structure of the bridge example).

[Figure: the inference structure of the bridge example, linking the roles source data, valuation, solution classes, observables 1, syndrome 1, convention, observables 2, syndrome 2 and solution through the knowledge sources compute, match, decompose1, abstract1, select_conv, project, decompose2, abstract2 and classify.]

Figure 3: The inference structure of the bridge example.

• In the task layer, composite tasks can be defined recursively, and the ordering of the tasks (simple or composite) is described in a textual form with selection and iteration operators. This description is called the task structure (see figure 4 for the task structure of the example). This structure gives the way to go through the inference structure.


Task {
    compute (source-data) -> valuation
    match (valuation) -> solution-classes
    If opening = yes [1] Then {
        decompose1 (source-data) -> observables1
        abstract1 (observables1) -> syndrome1
        select_conv (syndrome1 valuation solution-classes) -> convention
        If conventional-opening = club Or conventional-opening = no-trump
        Then project (convention) -> solution
        Else {
            decompose2 (source-data) -> observables2
            abstract2 (syndrome1 observables2 convention) -> syndrome2
            classify (syndrome2 observables2) -> solution
        }
    }
}

Figure 4: the task structure of the bridge example.

The different layers are produced as results of the modelling activities. One can identify two principal axes of modelling: the modelling of the domain structures and the modelling of the problem solving processes. KADS is not very prescriptive about the sequencing of this process. In the model-driven approach, the problem-solving method is chosen in a library, or built, as soon as possible. This model can then be used to help acquire the domain knowledge, because it indicates what kind of knowledge is needed, according to the metaclasses and the knowledge sources of the models. We shall not discuss further the real difficulties of determining this model and the necessity of a more iterative process [O'Hara 91]. In the following sections of this paper, we suppose that it is possible to build this problem solving method by a process similar to those of [McDermott 88, Puerta 92, Wielinga 92, Steels 91]. We then emphasize how this model can be useful not only to elicit new domain knowledge, as in the previously mentioned approaches, but also to learn this knowledge inductively from examples.

5. USE OF KNOWLEDGE TO BIAS THE SEARCH

As we have seen in the KADS presentation, the knowledge is structured in four layers. In this part, we will try to draw a parallel between the knowledge acquired in the first three layers of a conceptual model and the inputs and outputs of a learning tool. All Similarity Based Learning tools learn relations between domain concepts; thus the output of these tools may be considered as a part of the domain layer of a conceptual model. In the same way, the domain bias is also a part of the domain layer. Semantic restriction-type biases, such as CHARADE's attribute structure or MOBAL's topology of predicates, are weaker forms of the inference structure. But this is not the case for all restriction-type biases.

[1] In this version of ENIGME, the tests in the instructions are expressed with domain concepts. It might be useful (and more in the KADS spirit) for this control to be expressed in more abstract terms.


For instance, a constraint forbidding the production of any rule which covers fewer than a minimum number of examples only has a meaning for a learning procedure; hence it is difficult to place it in a conceptual model. In the same way, universal preference-type biases, such as the entropy criteria of [Quinlan 86, 90] or Occam's razor [Michalski 83], seem to have no place in a conceptual model.

In ENIGME, our aim is to generalize the analogy between the learning bias and the knowledge of a conceptual model. Such a parallel has several interests. First, the learning bias is knowledge which comes from the knowledge acquisition framework; thus, its acquisition may be done by the tools developed in this area. Second, the same models, with the same semantics, are used by all the knowledge acquisition actors, i.e., the learning tool, the knowledge engineer, the expert and the inference engine. We think this point is really important because it allows the expert to foretell the effects of a constraint on the produced rule system. Furthermore, as the knowledge acquisition process has to be iterative, the analysis [2] of the outputs of the learning tool may provide some useful guidelines to refine the bias (i.e., a part of the acquired knowledge) and thus help the expert in the refinement cycle. So, in order to drive the induction phase, we will use the task layer, the inference layer and some parts of the domain layer.

[2] We do not know yet how much this analysis may be automated. This will be addressed in further research.

[Figure: the task, inference and domain layers of KADS feeding ENIGME.]

Figure 5: KADS and ENIGME.

5.1 The inference structure

The general idea is to learn only relations which follow the primitive inference steps of this structure. Indeed, as asserted before, this structure gives the dependencies between the different concepts involved in a resolution process. Since the purpose of the produced rules is to accomplish such a resolution, the only useful relations are those which link concepts following these dependencies. For instance, once the inference structure of the bridge example (see figure 3) has been acquired, if the apprentice has to learn the relations of the select_conv knowledge source:



[Figure: the select_conv fragment, with input roles syndrome 1, valuation and solution classes, and output role convention.]

Figure 6: the select_conv fragment.

It will learn the relations between the concepts involved in the roles valuation, syndrome1, solution classes and convention. Thus, from the search point of view, the inference structure allows the learner to consider only the parts of the search space whose elements are conjunctions of concepts involved in the input roles of a fragment. Since the dimension of this space is exponential in the number of attributes, if all the concepts are not involved in the same fragment, the gain is significant: on our example, the search space is divided by roughly 10⁹. Moreover, since it restricts the search space so that only rules following the reasoning given by the expert are considered, it reduces the set of possible generalizations; the role of the preference bias is thus strongly limited by this structure.

To learn the relations of a knowledge source, the apprentice needs to know which concepts are involved in each role of the corresponding fragment. This knowledge links the inference structure to the domain layer. For instance, on the example, the user must provide the system with:

Roles {
    source-data (hand)
    valuation (honour-points distribution-points)
    solution-classes (opening)
    observables1 (distribution strong-colour)
    syndrome1 (kind-of-distribution)
    syndrome2 (kind-of-hand)
    convention (conventional-opening)
    solution (opening-colour)
    observables2 (number-of-spades,hearts,diamonds,clubs strong-colour)
}

The first element of each line is the name of the role, and the second is the list of the concepts belonging to this role. ENIGME is an SBL tool; thus, it can only learn relations which empirically associate concepts. Hence, it will not be able to learn any kind of procedural knowledge.


For instance, it cannot learn the formulas which compute the honour points or the distribution points, i.e., the relations of the compute knowledge source (see figure 3). On the other hand, since ENIGME, like CHARADE, does not need disjoint classes, it can be used to learn the relations associated with knowledge sources [3] when they are heuristic ones. For example, the tool is able to learn abstract, select, match, specify, ... knowledge sources (see the typology of knowledge sources in [Wielinga 92]). But with only this structure, the apprentice does not know the context of the fragment in the resolution process. This context is composed of the place of the fragment in the resolution process (i.e., what has been done, what will be done) and the control on this process. In the KADS model, this context is described in the task layer, and its use by ENIGME is the subject of the next part.
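To make the restriction concrete, here is a small sketch of ours (using the role declarations above, an excerpt only) of how the role-to-concept mapping bounds the premises the learner has to consider for one fragment:

from itertools import chain, combinations

# Role -> concepts mapping for the bridge example (excerpt).
roles = {
    "valuation": ["honour-points", "distribution-points"],
    "syndrome1": ["kind-of-distribution"],
    "solution-classes": ["opening"],
}

def candidate_premises(input_roles):
    """Candidate premises for one fragment: non-empty subsets of the concepts
    of its input roles, rather than subsets of every concept of the domain."""
    concepts = list(chain.from_iterable(roles[r] for r in input_roles))
    return chain.from_iterable(
        combinations(concepts, k) for k in range(1, len(concepts) + 1))

# The select_conv fragment searches over 2**4 - 1 = 15 conjunctions only.
print(len(list(candidate_premises(["valuation", "syndrome1", "solution-classes"]))))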

5.2 The task structure

This structure describes the chaining of the steps in the resolution process and the control on this chaining. In the KADS formalism, the task structure is described with a small language. For instance, for the bridge example, we have provided the system with the task structure of figure 4. The language has one ground instruction: the fragment. A fragment matches one step of the problem solving method. For instance, the fragment select_conv (syndrome1 valuation solution-classes) -> convention (see figure 4) corresponds to the step of the resolution which decides whether this resolution will use a convention. As asserted before, ENIGME is not able to learn procedural knowledge; thus, it needs to know whether it has to learn the relations of the current fragment. Even when the learner does not have to learn these relations, they are a step of the whole resolution process; hence, ENIGME needs to know them in order to complete the examples' descriptions with the concepts involved in the output role of the fragment. The other instructions describe the control over the resolution process. This control is mainly expressed through two features: conditional instructions and loops. ENIGME uses this control to know which examples match which step of the resolution process.

[3] ENIGME considers that each of these steps can be solved with one inference step, so the decomposition of the method must be fine enough to fulfil this assumption.


interpret(Se, current_instruction)        ; Se is the set of examples
CASE the nature of current_instruction OF
    "fragment":
        IF the relations are already known
        THEN complete the examples of Se with the concepts involved in the output role
        ELSE apply the learning procedure described in the part on the inference structure
    "if expression then instruction1 else instruction2":
        LET Se1 = Se \ { the set of examples which do not satisfy expression }
        interpret(Se1, instruction1)
        LET Se2 = Se \ { the set of examples which satisfy expression }
        interpret(Se2, instruction2)
    "while expression do instruction":
        LET Se = Se \ { the set of examples which do not satisfy expression }
        interpret(Se, instruction)
END_CASE

Figure 7: the algorithm used by ENIGME to interpret the task structure.

The first consequence of the use of such a structure concerns the examples' descriptions. Indeed, the examples have to represent an experience of the resolution of the problem: they are described with the concepts involved in their resolution process. Thus, their description must include all the concepts involved in all the roles of the inference structure that their resolution goes through. If these examples come from a database, we may then have to complete them with all the intermediate concepts (the abstractions, ...). On the other hand, if there are several ways to solve the problem, the examples can have different descriptions. For instance, in the bridge example, there are two ways to reach the solution. If the hand is a normal one, we follow the left branch of the inference structure, and so an example of a normal hand is described with the concepts involved in the roles hand, syndrome1 & 2, solution-classes, convention and solution. On the other hand, if the hand is an extraordinary one, we follow the right branch of the inference structure, and so the example is described with the concepts involved in the roles hand, syndrome1, solution-classes, convention and solution [4]. So, for instance, when ENIGME learns the relations of the project fragment, it only uses the examples which meet the condition: opening = yes And (conventional-opening = club Or conventional-opening = no-trump).

Another main consequence of the use of the task structure is the possible reasoning done by the produced rules. Indeed, in most current machine learning tools, this reasoning is implicitly done in one inference step which directly associates the solution with the description of the case. In this tool, we explicitly give the learner the reasoning that the resolution process has to follow. In that way, the produced rules belong to a complete rule system.

[4] In this example, the second description language is included in the first one, but this need not be the case.
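In modern terms, the interpreter of figure 7 could be sketched as follows. This is an illustration of ours, assuming a task structure encoded as nested dictionaries and treating a task body as a sequence; learn_fragment and complete stand for the procedures discussed above.

def interpret(examples, instruction, known, learn_fragment, complete):
    """Walk the task structure, splitting the example set at each control
    instruction and learning (or applying) one fragment at a time."""
    kind = instruction["kind"]
    if kind == "fragment":
        if instruction["name"] in known:           # relations already known:
            complete(examples, instruction)        # add the output-role concepts
        else:                                      # otherwise, induce the rules
            learn_fragment(examples, instruction)
    elif kind == "if":
        test = instruction["test"]
        interpret([e for e in examples if test(e)],
                  instruction["then"], known, learn_fragment, complete)
        interpret([e for e in examples if not test(e)],
                  instruction["else"], known, learn_fragment, complete)
    elif kind == "while":
        test = instruction["test"]
        interpret([e for e in examples if test(e)],
                  instruction["body"], known, learn_fragment, complete)
    elif kind == "seq":                            # a sequence of instructions
        for sub in instruction["body"]:
            interpret(examples, sub, known, learn_fragment, complete)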


For instance, with the bridge example, a resolution process may have up to four inference steps. Moreover, since the reasoning can include loops, a resolution with the rule system may have any number of inference steps.

5.3 Semantic networks

In the KADS methodology, the domain layer may contain several kinds of semantic networks. ENIGME, like CHARADE, uses the generality relations implied by taxonomies (i.e., the is-a link) to structure the search space. But there are many other kinds of links that the expert may be able to provide the system with. If these networks are easy for the expert to give (or at least easier than heuristic knowledge), and if the learner does not produce suitable rules without further constraints, it may be interesting for the learner to take advantage of this knowledge. In this section, we present a possible use of some kinds of links: we first present the use the learner makes of such a network, and then explain what kinds of links can be used in that way. A network can be associated with each fragment that the user chooses to acquire with ENIGME. This network links the concepts involved in the input and output roles. Since the learner can only produce rules linking the concepts involved in the input and output roles in one inference step, this network needs to be flat [5]. For instance, for the classify knowledge source of the example, we have provided the system with the following network:

[Figure: a two-level network. The top level holds the events opening = club, opening = diamond, opening = heart and opening = spade; the bottom level holds their potential explanations kind-of-hand = minor, kind-of-hand = two-minor-colour, kind-of-hand = major-minor, kind-of-hand = two-major-colour and kind-of-hand = major-fifth.]

Figure 8: the semantic network of the classify fragment.

The system interprets the network in the following way: the concepts of the top level are the events we want to conclude on at this stage of the resolution process (i.e., those involved in the output role of the fragment at hand), and the concepts of the bottom level are considered as possible explanations of these events.

[5] If there are several networks associated with fragments which follow each other, the combination of these networks could be considered as a network with as many levels as the number of fragments.


Furthermore, the system uses an exhaustivity assumption, as in MOLE [Eshelman 88]:

- If an event e1 has at least one potential explanation, the premise part of the produced rules concluding on e1 must include at least one of these potential explanations.

So, the produced rules concluding on opening = club will include in their premise part either kind-of-hand = minor, kind-of-hand = two-minor-colour or kind-of-hand = major-minor. This condition can be specialized with any conjunction of the concepts involved in the input roles. This kind of network can derive from several domain models, according to the description language. For example, if the language describes causes and symptoms, a causal model may be interpreted in that way; interpreting the exhaustivity assumption causally then amounts to saying that every symptom has a cause. Used in this way, such a network is a part of the restriction-type bias: it allows the learner to consider only the hypotheses which include, in their premise part, at least one of the concepts of the bottom level. Thus, it fixes some forced points in the reasoning of the produced rule system. For instance, in the example, the reasoning has to conclude on the kind-of-hand and to use this kind-of-hand to reach the solution.
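As a sketch (our reading of figure 8; only the opening = club entry is shown, as discussed in the text), the exhaustivity assumption amounts to a filter on candidate rules:

# Potential explanations of each event, read off the network of figure 8.
explanations = {
    ("opening", "club"): [("kind-of-hand", "minor"),
                          ("kind-of-hand", "two-minor-colour"),
                          ("kind-of-hand", "major-minor")],
}

def satisfies_exhaustivity(premise, conclusion):
    """Keep a candidate rule only if its premise mentions at least one
    potential explanation of its conclusion (when the network lists some)."""
    potential = explanations.get(conclusion)
    if not potential:
        return True                      # no constraint on this event
    return any(premise.get(attr) == val for attr, val in potential)

assert satisfies_exhaustivity({"kind-of-hand": "minor"}, ("opening", "club"))
assert not satisfies_exhaustivity({"strong-colour": "spade"}, ("opening", "club"))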

6. EXPERIMENTAL RESULTS

To learn the first bid of the bridge card game, we have provided ENIGME with the following bias, and only with it:

- Restriction-type bias: the inference and task structures (see figures 3 and 4) and two semantic networks, for the fragments abstract2 and classify (see figure 8 for the one of the classify fragment).
- Domain bias: the descriptions of all the concepts involved in the resolution, i.e., their name, type and possible values. Furthermore, we provide the system with the procedural knowledge corresponding to the compute and decompose1 & 2 knowledge sources (see figure 3).
- Preference bias: Occam's razor.

With this bias and 500 examples, ENIGME produces about one hundred rules in less than one minute on a Sun SPARCstation 1. These rules constitute a real rule system. For instance, the hand:

- spades: A, J, 10, 9, 5, 3
- hearts: K, 8, 6
- diamonds: A, Q, 2
- clubs: 10

is solved by the following reasoning. First, the two facts distribution-points = 16 and honour-points = 14 are computed by the procedural knowledge of the compute fragment. Then, the next inference step is done thanks to the rule produced for the match knowledge source:

IF distribution-points ≥ 14 THEN opening = yes

Then, according to the task structure (see figure 4), ENIGME applies the procedural knowledge of the decompose1 fragment to deduce the facts strong-colour = spade and distribution = 6-3-3-1. The following rule (produced for the abstract1 fragment) is then fired:

IF distribution = 6-3-3-1 THEN kind-of-distribution = two-colours

Since no rule fires to find a convention, ENIGME, according to the task structure, uses the procedural knowledge of the decompose2 fragment to compute the facts number-of-spades = 6, number-of-hearts = 3, number-of-diamonds = 3 and number-of-clubs = 1, and fires successively a rule produced for the abstract2 knowledge source:

IF kind-of-distribution = two-colours and number-of-hearts ≤ 3 and strong-colour = spade THEN kind-of-hand = major-minor

and a rule [6] produced for the classify knowledge source:

IF kind-of-hand = major-minor and number-of-spades ≥ 6 THEN opening-colour = spade

which is the expected result.

[6] The produced rules are local to a fragment. Thus, this rule is only valid in the context of the classify fragment, i.e., when opening = yes and there is no convention.

Last but not least, the expert is much more confident in this kind of rule system than in a single rule which directly associates the observables with the solution.
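The four inference steps of this trace can be replayed with a naive forward chainer. This is a sketch of ours: the four rules are transcribed from the trace above, the procedural compute/decompose steps are pre-filled as facts, and the task-structure control that scopes each rule to its fragment is ignored.

# Facts after the procedural compute and decompose1/2 steps.
facts = {"distribution-points": 16, "honour-points": 14,
         "strong-colour": "spade", "distribution": "6-3-3-1",
         "number-of-spades": 6, "number-of-hearts": 3}

rules = [  # (premise test, conclusion), transcribed from the trace above
    (lambda f: f["distribution-points"] >= 14, ("opening", "yes")),
    (lambda f: f.get("distribution") == "6-3-3-1",
     ("kind-of-distribution", "two-colours")),
    (lambda f: f.get("kind-of-distribution") == "two-colours"
               and f["number-of-hearts"] <= 3 and f["strong-colour"] == "spade",
     ("kind-of-hand", "major-minor")),
    (lambda f: f.get("kind-of-hand") == "major-minor"
               and f["number-of-spades"] >= 6,
     ("opening-colour", "spade")),
]

changed = True
while changed:                 # naive forward chaining to a fixed point
    changed = False
    for test, (attr, value) in rules:
        if facts.get(attr) != value and test(facts):
            facts[attr] = value
            changed = True

print(facts["opening-colour"])  # -> spade, the expected opening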

7. CONCLUSION

The aim of this paper is to provide a basis for a real integration of Machine Learning and Knowledge Acquisition which cannot be reduced to a mere juxtaposition. To achieve this integration, our claim was that it is possible to draw correspondences between the two fields. More precisely, we developed a parallel between the Knowledge Acquisition models and the inputs of Machine Learning tools. It led us to establish that some parts of the learning bias can take place in the structured model of expertise as presented in the KADS methodology. As a conclusion, we want to draw attention to three points.

In the first place, the positive result of this work is the achievement of an integrated system named ENIGME which strongly couples Knowledge Acquisition and Machine Learning. It was empirically tested on a non-trivial problem which is presented in the paper. We shall pursue its development by inserting it in an existing Knowledge Acquisition environment.

In the second place, we note that this strong coupling is based on the fact that most of the inputs of the learning algorithms can be interpreted both in terms of their role in Machine Learning and in terms of problem solving models. This result is very significant by itself.

Lastly, the positive results we obtained need to be tempered, since we did not succeed in assimilating all of the learning bias to components of problem solving models: the preference-type bias has no interpretation in terms of Knowledge Acquisition models. Up to now, it only has some sense in terms of Machine Learning algorithms, its main role being to order the generalization space. In the future, we shall examine more deeply the exact role of the preference-type bias and its meaning. We shall try to express it as a function of the domain knowledge or, more generally, as part of the knowledge. If this is not possible, we shall try to limit its role by providing the learning system with many other constraints, i.e., by increasing the restriction-type bias.

8. ACKNOWLEDGEMENTS

We thank the whole machine learning and knowledge acquisition team at LAFORIA. We have particularly benefited from many discussions with Karine Causse and Bernard Le Roux, who, furthermore, provided us with the bridge learning set. The research reported here was carried out in the course of the VITAL project. This project is partially funded by the ESPRIT II programme of the Commission of the European Communities as project number P5365.

9. REFERENCES

[Buntine 90a] Wray Buntine. Myths and legends in learning classification rules. AAAI 90.
[Buntine 90b] Wray Buntine and D. Stirling. Interactive induction. Machine Intelligence 12.
[Eshelman 88] Larry Eshelman. MOLE: a knowledge acquisition tool for cover-and-differentiate systems. In Automating Knowledge Acquisition for Expert Systems, ed. by S. Marcus.
[Gaines 89] Brian R. Gaines. Knowledge acquisition: the continuum linking machine learning and expertise transfer. EKAW 89.
[Ganascia 87] Jean-Gabriel Ganascia. AGAPE et CHARADE : deux techniques d'apprentissage symbolique appliquées à la construction de bases de connaissances. Thèse d'état, Université d'Orsay.
[Ganascia 88] Jean-Gabriel Ganascia. Improvement and refinement of the learning bias semantic. ECAI 88.
[Michalski 83] Ryszard S. Michalski. A theory and methodology of inductive learning. In Machine Learning: An Artificial Intelligence Approach, Vol. 1.
[Mitchell 82] Tom M. Mitchell. Generalization as search. Artificial Intelligence, Vol. 18, No. 2, March 1982.
[McDermott 88] John McDermott. Preliminary steps toward a taxonomy of problem-solving methods. In Automating Knowledge Acquisition for Expert Systems, ed. by S. Marcus.
[Morik 91] Katharina Morik and Edgar Sommer. MOBAL 1.0: user guide. MLT ESPRIT project deliverable GMD/P2154/21/1.
[O'Hara 91] Kieron O'Hara and Nigel Shadbolt. Generalized directive models supporting the knowledge acquisition process. VITAL internal deliverable NOTT/T212/W/5.
[Puerta 92] A.R. Puerta, J.W. Egar, S.W. Tu and M.A. Musen. A multiple-method knowledge-acquisition shell for the automatic generation of knowledge-acquisition tools. Knowledge Acquisition, Vol. 4, No. 2, June 1992.
[Quinlan 86] John R. Quinlan. Induction of decision trees. Machine Learning, Vol. 1, No. 1.
[Quinlan 90] John R. Quinlan. Learning logical definitions from relations. Machine Learning, Vol. 5, No. 3.
[Russel 90] Stuart J. Russell and Benjamin N. Grosof. Declarative bias: an overview. In Change of Representation and Inductive Bias, ed. by P. Benjamin.
[Shadbolt 90] Nigel Shadbolt and B.J. Wielinga. Knowledge-based knowledge acquisition: the next generation of tools. In Current Trends in Knowledge Acquisition, ed. by B. Wielinga et al.
[Steels 91] Luc Steels. COMMET: a componential methodology for knowledge engineering. ESPRIT project CONSTRUCT, deliverable WP 2/3/4 (synthesis).
[Utgoff 86] Paul E. Utgoff. Shift of bias for inductive concept learning. In Machine Learning: An Artificial Intelligence Approach, Vol. 2.
[Waterman 86] D.A. Waterman. A Guide to Expert Systems. Addison-Wesley.
[Webb 91] Geoffrey I. Webb. Einstein: an interactive inductive knowledge acquisition tool. 6th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, Canada, 1991.
[Wielinga 92] B.J. Wielinga, A.Th. Schreiber and J.A. Breuker. KADS: a modelling approach to knowledge engineering. Knowledge Acquisition, Vol. 4, No. 1, March 1992.
