Dlab: A Declarative Language Bias Formalism

Luc Dehaspe and Luc De Raedt

Katholieke Universiteit Leuven, Department of Computer Science,
Celestijnenlaan 200A, B-3001 Heverlee, Belgium
email: {Luc.Dehaspe, Luc.DeRaedt}@cs.kuleuven.ac.be
fax: +32 16 32 79 96; telephone: +32 16 32 75 50

Abstract. We describe the principles and functionalities of Dlab (Declarative LAnguage Bias). Dlab can be used in inductive learning systems to define syntactically, and traverse efficiently, finite subspaces of first order clausal logic, be it a set of propositional formulae, association rules, Horn clauses, or full clauses. A Prolog implementation of Dlab is available by ftp access.

Keywords: declarative language bias, concept learning, knowledge discovery

1 Introduction

The notion bias, generally circumscribed as "a tendency to show prejudice against one group and favouritism towards another" (Collins Cobuild, 1987), has been adapted to the field of computational inductive reasoning to become a generic term for "any basis for choosing one generalization over another, other than strict consistency with the instances" (Mitchell [14]). We borrow a more fine-tuned definition of inductive bias from Utgoff [20].

Definition 1 (inductive bias). Except for the presented examples and counterexamples of the concept being learned, all factors that influence hypothesis selection constitute bias. These factors include the following:
1. the language in which hypotheses are described;
2. the space of hypotheses that the program can consider;
3. the procedures that define in what order hypotheses are to be considered;
4. the acceptance criteria that define whether a search procedure may stop with a given hypothesis or should continue searching for a better choice.

Utgoff's definition of bias has further developed into a typology which distinguishes three different categories [17]: language bias roughly combines Utgoff's factors 1 and 2, and search bias and validation bias roughly correspond to items 3 and 4 respectively (see footnote 1). As the factors that influence hypothesis selection were further charted, the idea grew to take them out of the hands of programmers, promote them to parameters in learning systems, and thus make way for the specification and modification of previously unexploited a priori knowledge.

Footnote 1: An alternative framework [9] divides bias into representational (cf. items 1 and 2) and procedural (cf. items 2 and 3) components.

For this type of explicit input parameter, Russell and Grosof [18] introduced the concept of declarative bias. In this paper, we present a new formalism for the declarative representation of language bias. The formalism is called, somewhat opportunistically, Dlab (Declarative LAnguage Bias). A Dlab grammar intensionally defines the syntax of a finite subspace of first order clausal logic (see footnote 2), be it a set of propositional formulae, association rules, Horn clauses, or full clauses. With the design of Dlab we have attempted to balance the following conflicting requirements:
1. ease of use: the formalism should be declarative and have a clear semantics;
2. expressive power: it should allow the full exploitation of prior syntactic knowledge to maximally reduce the search space;
3. ease of navigation: it should suggest a strategy for exploring the search space.
The latter concern is inspired by the classical machine learning view of induction as a search process through a partially ordered space induced by the generalization relation, cf. [15]. Machine learning systems typically search the space specific-to-general or general-to-specific. The features of Dlab make it especially compatible with the latter class of systems. We will introduce a so-called refinement operator for Dlab that calculates the maximally general specializations of any clause in the hypothesis space.

We present an overview of Dlab in two stages. First (Section 2), we discuss syntax, semantics, and a refinement operator for Dlab⁻, a subset of Dlab. We then extend Dlab⁻ to full Dlab (Section 3). All examples of Dlab at work have been concentrated in Section 4, which we recommend as a reader's refuge. Finally, in Section 5 we relate Dlab to earlier work on declarative language bias.

2 Dlab⁻

A Dlab⁻ grammar is a set of templates to which the clauses in the hypothesis space conform. We first give a recursive syntactic definition of the Dlab⁻ formalism.

Footnote 2: We assume familiarity with first order logic (see [12, 8] for an introduction), but briefly review the basic relevant concepts. A first order alphabet is a set of predicate symbols, constant symbols and functor symbols. A clause is a formula of the form A1, ..., Am ← B1, ..., Bn, where the Ai and Bi are logical atoms. An atom p(t1, ..., tn) is a predicate symbol p followed by a bracketed n-tuple of terms ti. A term t is a variable V or a function symbol f(t1, ..., tk) immediately followed by a bracketed k-tuple of terms ti. Constants are function symbols of arity 0. The above clause can be read as "A1 or ... or Am if B1 and ... and Bn". All variables in clauses are universally quantified, although this is not explicitly written. Extending the usual convention for definite clauses (where m = 1), we call A1, ..., Am the head of the clause and B1, ..., Bn the body of the clause.

Definition 2 (Dlab⁻ syntax).
1. a Dlab⁻ atom is either a logical atom, or of the form Min-Max : L, with Min and Max integers such that 0 ≤ Min ≤ Max ≤ length(L), and with L a list of Dlab⁻ atoms;
2. a Dlab⁻ template is of the form A ← B, where A and B are Dlab⁻ atoms;
3. a Dlab⁻ grammar is a set of Dlab⁻ templates.

The hypothesis space that corresponds to a Dlab⁻ grammar is then constructed via the (recursive) selection, from each Dlab⁻ atom Min-Max : L, of all subsets of L with length within range Min ... Max. This idea can be elegantly formalised and implemented using the Definite Clause Grammar (DCG) notation, an extension of Prolog (cf. [3, 19]) (see footnote 3).

Footnote 3: To simplify our definition of a generation function we here introduce (and will continue to use) a special list notation in which the head and the body of clauses are written as lists: [A1, ..., Am] ← [B1, ..., Bn].

Definition 3 (Dlab⁻ semantics). Let G be a Dlab⁻ grammar, then

  dlab_generate(G) = { dlab_dcg(A) ← dlab_dcg(B) | (A ← B) ∈ G }

generates all clauses in the corresponding hypothesis space, where dlab_dcg(E) is a list of logical atoms generated by dlab_dcg:

  dlab_dcg(E)               --> [E], { E ≠ Min-Max : L }.            (1)
  dlab_dcg(Min-Max : [])    --> { Min ≤ 0 }, [].                     (2)
  dlab_dcg(Min-Max : [_|L]) --> dlab_dcg(Min-Max : L).               (3)
  dlab_dcg(Min-Max : [E|L]) --> { Max > 0 }, dlab_dcg(E),
                                dlab_dcg((Min−1)-(Max−1) : L).       (4)

From the semantics of a Dlab⁻ grammar we derive a formula for calculating the size of its hypothesis space.
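The four rules above are executable almost verbatim. The following is a minimal SWI-Prolog sketch (ours, not the distributed implementation): a Dlab⁻ atom Min-Max : L is encoded as the Prolog term Min-Max:L using the standard '-' and ':' operators, and len must be written out as a concrete integer.

    % Minimal executable sketch of Definition 3 (ours, not the
    % distributed Dlab implementation).
    dlab_dcg(E) --> [E], { E \= _-_ : _ }.             % rule (1)
    dlab_dcg(Min-_Max : []) --> { Min =< 0 }, [].      % rule (2)
    dlab_dcg(Min-Max : [_|L]) -->                      % rule (3): skip
        dlab_dcg(Min-Max : L).
    dlab_dcg(Min-Max : [E|L]) -->                      % rule (4): include
        { Max > 0 },
        dlab_dcg(E),
        { Min1 is Min - 1, Max1 is Max - 1 },
        dlab_dcg(Min1-Max1 : L).

    % One template A <- B is enumerated by backtracking over:
    template_clause(A, B, Head, Body) :-
        phrase(dlab_dcg(A), Head),
        phrase(dlab_dcg(B), Body).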

Definition 4 (dlab_size). Let G = {A1 ← B1, ..., Am ← Bm} be a Dlab⁻ grammar.

  dlab_size(G) = Σ_{i=1}^{m} ds(Ai) · ds(Bi)
  ds(E) = 1, where E is a logical atom
  ds(Min-Max : [L1, ..., Ln]) = Σ_{k=Min}^{Max} e_k(ds(L1), ..., ds(Ln))
  e_0(s1, ..., sn) = 1
  e_n(s1, ..., sn) = Π_{i=1}^{n} s_i
  e_k(s1, s2, ..., sn) = e_k(s2, ..., sn) + s1 · e_{k−1}(s2, ..., sn), with k < n

Proof. The first rule states that the size of the language defined by a Dlab⁻ grammar equals the sum of the sizes of the languages defined by its individual Dlab⁻ templates. The latter size can be found by multiplying the number of head lists and the number of body lists covered by the head and body Dlab⁻ atoms. A Dlab⁻ atom which is not of the form Min-Max : L has a coverage of exactly one, as is expressed in the second rule. Some more intricate combinatorics underlies the third rule. Basically, we select k objects from {L1, ..., Ln}, for each k in range Min ... Max; hence the summation Σ_{k=Min}^{Max}. Inside this summation we would have the standard formula n!/(k!(n−k)!) if our case had been an instance of the prototypical problem of finding all combinations, without replacement, of k marbles out of an urn with n marbles. This formula does not apply because we rather have n urns ({L1, ..., Ln}), each holding one or more marbles (ds(Li) ≥ 1), and only combinations that use at most one marble from each urn should be counted. Therefore we need e_k(s1, ..., sn), where e_k is the elementary symmetric function [13] of degree k and the s_i are the numbers of marbles in each urn. The first base case of this recursive function accounts for the fact that there is only one way to select 0 objects. In the second base case, where k = n, one has to take an object from each urn. As for each urn there are s_i choices, the number of combinations equals the product of all s_i. The final recursive case applies if k < n. It is an addition of two terms, one for each possible operation on urn 1 (represented by s1). Either we skip this urn, and then we still have to select k elements from urns 2 to n; the number of such combinations is given by e_k(s2, ..., sn). Or else we do take a marble from the first urn. We then have to multiply s1, the number of choices for the first urn, by e_{k−1}(s2, ..., sn), the number of (k−1)-order combinations of elements from urns 2 to n. □
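Definition 4 also translates directly into Prolog. The sketch below is again ours, reusing the Min-Max:L encoding of the previous sketch; es/3 implements the elementary symmetric function e_k.

    % ds(+DlabAtom, -Size): number of lists covered by a Dlab- atom.
    ds(E, 1) :- E \= _-_ : _.        % a logical atom covers exactly one list
    ds(Min-Max : L, Size) :-
        maplist(ds, L, Sizes),       % ds(L1), ..., ds(Ln)
        length(Sizes, N),
        Hi is min(Max, N),
        findall(EK, ( between(Min, Hi, K), es(K, Sizes, EK) ), EKs),
        sum_list(EKs, Size).

    % es(+K, +Sizes, -EK): elementary symmetric function of degree K.
    es(0, _, 1) :- !.                % one way to select nothing
    es(K, Sizes, EK) :-
        length(Sizes, N),
        (   K > N   -> EK = 0                 % more selections than urns
        ;   K =:= N -> product(Sizes, EK)     % one marble from every urn
        ;   Sizes = [S|Ss],                   % skip urn 1, or take from it
            es(K, Ss, Skip),
            K1 is K - 1,
            es(K1, Ss, Take),
            EK is Skip + S * Take
        ).

    product([], 1).
    product([S|Ss], P) :- product(Ss, Q), P is S * Q.

For instance, ds(0-3:[a, b, c], S) yields S = 8, the number of subsets of a three-element list.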

A refinement operator for Dlab⁻ is based on the observation that every clause c in the hypothesis space is defined by a sequence of subset selections, in this case a string of bits, which we will call the Dlab path dp(c): each 0 marks an application of dlab_dcg Rule 3, and each 1 an application of Rule 4 (see footnote 4). If we enlarge one of the subsets, then the clause c′ ⊇ c defined by the new sequence is a specialization of c under θ-subsumption. Every 0 in the Dlab path dp(c) marks an occasion for extending c, in the sense that it points at a Dlab⁻ atom E which has been skipped during the generation of c. We only have to switch this bit to 1 to include the corresponding E during the generation of supersets of c. If we enlarge one subset in a minimal way, then c′ will be a refinement, i.e. a maximally general specialization, of c (see footnote 5). In terms of the Dlab path: we have to expand exactly one 0 bit, and then only in a minimal way. Additional constraints on the enlargement of subsets, such that each subset is generated at most once, yield a refinement operator that is optimal, in the sense that it will generate each node (= clause) in the search space at most once.

Footnote 4: The Dlab path is not only a basis for the refinement of clauses but also for their storage in a compressed bitstring format. This feature makes Dlab especially valuable as an encoding tool in learning systems that manipulate large queues of candidate clauses (e.g. when doing a best-first search), which in uncompressed format would quickly exhaust memory resources.

Footnote 5: Depending on the Dlab grammar, this refinement (under θ-subsumption) can be proper or not.
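To make the Dlab path concrete, the dlab_dcg sketch above extends naturally with an argument that records one bit per application of Rules 3 and 4. This instrumentation is hypothetical (the name and bit ordering are ours), but it shows how a clause and its path can be produced in one pass:

    % dlab_path(+DlabAtom, -Path): like dlab_dcg, but also collects the
    % Dlab path, one 0 per rule-(3) skip and one 1 per rule-(4) inclusion.
    dlab_path(E, []) --> [E], { E \= _-_ : _ }.
    dlab_path(Min-_Max : [], []) --> { Min =< 0 }, [].
    dlab_path(Min-Max : [_|L], [0|P]) -->            % skip: emit a 0
        dlab_path(Min-Max : L, P).
    dlab_path(Min-Max : [E|L], [1|P]) -->            % include: emit a 1
        { Max > 0 },
        dlab_path(E, PE),
        { Min1 is Min - 1, Max1 is Max - 1 },
        dlab_path(Min1-Max1 : L, PL),
        { append(PE, PL, P) }.

In this reading, a refinement step flips a single 0 bit of dp(c) to 1 (in a minimal way), and the bit list doubles as the compressed clause encoding mentioned in footnote 4.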

When using the Dlab path we can achieve optimality, for instance, by never expanding 0's to the left of already expanded 0's. As fully described in [6], the Dlab⁻ refinement operator sketched above can be implemented in seventeen DCG rules. A more sophisticated Prolog implementation (as well as [6]) is available by anonymous ftp access to ftp.cs.kuleuven.ac.be. The relevant directory is pub/logic-prgm/ilp/dlab.

3 Dlab⁻ Extended: Dlab

In the extended version Dlab, mainly two features have been added to improve the readability of more complex grammars: second order variables, and subsets on the term level.

Definition 5 (Dlab syntax).
1. a Dlab term is either
   (a) a variable symbol, or
   (b) of the form f(t1, ..., tn), where f is a function symbol followed by a bracketed n-tuple (n ≥ 0) of Dlab terms ti, or
   (c) of the form Min-Max : L, where Min and Max are integers with 0 ≤ Min ≤ Max ≤ length(L), and with L a list of Dlab terms;
2. a Dlab atom is either
   (a) of the form p(t1, ..., tn), where p is a predicate symbol followed by a bracketed n-tuple (n ≥ 0) of Dlab terms ti, or
   (b) of the form Min-Max : L, where Min and Max are integers with 0 ≤ Min ≤ Max ≤ length(L), and with L a list of Dlab atoms;
3. a Dlab template is of the form A ← B, where A and B are Dlab atoms;
4. a Dlab variable is of the form dlab_var(p0, Min-Max, [p1, ..., pn]), where Min and Max are integers with 0 ≤ Min ≤ Max ≤ n, and with each pi a predicate symbol or a function symbol;
5. a Dlab grammar is a pair (T, V), where T is a set of Dlab templates, and V a set of Dlab variables.

We will now define the conversion of Dlab grammars (T, V) to the Dlab⁻ format such that the above definitions of semantics, size, and a refinement operator remain valid for the enriched formalism. First, to remove the second order variables V, we recursively replace all Dlab terms and atoms p(t1, ..., tn) in T such that dlab_var(p, Min-Max, [p1, ..., pm]) ∈ V with

  Min-Max : [p1(t1, ..., tn), ..., pm(t1, ..., tn)].

Next, we recursively remove subsets on the term level by replacing, from left to right, all Dlab terms

  p(t1, ..., ti, Min-Max : [L1, ..., Ln], t_{i+2}, ..., tm)

with

  Min-Max : [p(t1, ..., ti, L1, t_{i+2}, ..., tm), ..., p(t1, ..., ti, Ln, t_{i+2}, ..., tm)].

When applied subsequently, these two algorithms transform a Dlab grammar G = (T, V) into (G′, ∅), where G′ is an equivalent Dlab⁻ grammar.
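As an illustration of the first conversion step, a single non-nested occurrence of a second order variable can be eliminated as follows; expand_var is our hypothetical name, and the recursive traversal of nested terms as well as the subsequent term-level subset rewriting are omitted:

    % Eliminate one second-order variable occurrence: p(t1,...,tn) with
    % dlab_var(p, Min-Max, [p1,...,pm]) in V becomes
    % Min-Max : [p1(t1,...,tn), ..., pm(t1,...,tn)].
    expand_var(Atom, Vars, Min-Max : Alternatives) :-
        Atom =.. [P|Args],
        member(dlab_var(P, Min-Max, Preds), Vars),
        apply_preds(Preds, Args, Alternatives).

    apply_preds([], _, []).
    apply_preds([P|Ps], Args, [Alt|Alts]) :-
        Alt =.. [P|Args],
        apply_preds(Ps, Args, Alts).

    % ?- expand_var(p1(C), [dlab_var(p1, 0-1, [hearts, diamonds])], T).
    % T = 0-1:[hearts(C), diamonds(C)].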

4 Dlab at Work

This section is entirely devoted to the illustration of the expressive power of Dlab. We begin with some elementary grammars and end with a study of Dlab at work in the finite element mesh design domain. Given a Dlab atom Min-Max : L, with len denoting length(L), four choices of values for Min and Max determine the following cases of special interest:
1. all subsets: Min = 0, Max = len, e.g. G1 = ({h ← 0-len : [a, b, c]}, ∅)
2. all non-empty subsets: Min = 1, Max = len, e.g. G2 = ({h ← 1-len : [a, b, c]}, ∅)
3. exclusive or: Min = Max = 1, e.g. G3 = ({h ← 1-1 : [a, b, c]}, ∅)
4. combined occurrence: Min = Max = len, e.g. G4 = ({h ← len-len : [a, b, c]}, ∅)
These special cases can be nested to construct more complex grammars, as exemplified below. Table 1 gives the corresponding hypothesis spaces for grammars G1–G8. A ✓ in the column of grammar Gi marks the clauses of the first column that are in the corresponding hypothesis space.

G5 = ({h ← 1-len : [a, 1-1 : [b, c]]}, ∅)
G6 = ({h ← 1-len : [a, len-len : [b, c]]}, ∅)
G7 = ({h ← len-len : [a, 1-1 : [b, c]]}, ∅)
G8 = ({h ← 0-len : [len-len : [a, 0-len : [len-len : [b, 0-len : [c]]]]]}, ∅)

Table 1. The semantics of some sample Dlab grammars

                       G1   G2   G3   G4   G5   G6   G7   G8
    [h] ← []           ✓    .    .    .    .    .    .    ✓
    [h] ← [a]          ✓    ✓    ✓    .    ✓    ✓    .    ✓
    [h] ← [b]          ✓    ✓    ✓    .    ✓    .    .    .
    [h] ← [c]          ✓    ✓    ✓    .    ✓    .    .    .
    [h] ← [a, b]       ✓    ✓    .    .    ✓    .    ✓    ✓
    [h] ← [a, c]       ✓    ✓    .    .    ✓    .    ✓    .
    [h] ← [b, c]       ✓    ✓    .    .    .    ✓    .    .
    [h] ← [a, b, c]    ✓    ✓    .    ✓    .    ✓    .    ✓
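As a sanity check, the dlab_dcg sketch of Section 2 reproduces the G5 column of Table 1. The query is ours, with len written concretely as the list length 2; the answer order merely reflects the clause order of the sketch (skip before include):

    ?- phrase(dlab_dcg(1-2:[a, 1-1:[b, c]]), Body).
    Body = [c] ;
    Body = [b] ;
    Body = [a] ;
    Body = [a, c] ;
    Body = [a, b] ;
    false.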

Grammar G8 illustrates how taxonomies can be encoded, such that each atomic formula necessarily co-occurs with all its ancestors and never combines with other nodes. A more elaborate example of how Dlab handles this type of background knowledge is grammar G9, which encodes the taxonomy of suits of playing cards (see Figure 1).

G9 = (T9, V9)
T9 = { ok(C) ← len-len : [card(C),
                          0-1 : [len-len : [red(C), p1(C)],
                                 len-len : [black(C), p2(C)]]] }
V9 = { dlab_var(p1, 0-1, [hearts, diamonds]),
       dlab_var(p2, 0-1, [clubs, spades]) }

dlab_generate(G9) = { [ok(C)] ← [card(C)]
                      [ok(C)] ← [card(C), red(C)]
                      [ok(C)] ← [card(C), red(C), hearts(C)]
                      [ok(C)] ← [card(C), red(C), diamonds(C)]
                      [ok(C)] ← [card(C), black(C)]
                      [ok(C)] ← [card(C), black(C), clubs(C)]
                      [ok(C)] ← [card(C), black(C), spades(C)] }

Fig. 1. Encoding taxonomies: Dlab grammar G9
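Applying the conversion of Section 3 to G9 by hand, and writing len out as the concrete list lengths, gives a Dlab⁻ template that the dlab_dcg sketch of Section 2 can execute directly; the predicate name g9_template is ours:

    % G9 after second-order variable elimination, in the encoding of the
    % earlier sketches.
    g9_template(ok(C),
                2-2:[card(C),
                     0-1:[ 2-2:[red(C),   0-1:[hearts(C), diamonds(C)]],
                           2-2:[black(C), 0-1:[clubs(C),  spades(C)]] ]]).

    % ?- g9_template(_Head, B), phrase(dlab_dcg(B), Body).
    % backtracks through exactly the seven bodies of Figure 1.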

We now step through the development of a Dlab grammar for the finite element mesh design application frequently used to evaluate relational learning algorithms (see e.g. [7, 11]). The training examples in this domain are "hand-constructed" approximations of physical structures. Such an approximation is a set of finite elements called a mesh model, divided into a collection of edges. The relation to be learned is mesh(E, N), where E identifies the edge and N is the recommended number of finite elements along this edge. The background knowledge contains nineteen unary predicates describing individual edges, and three binary predicates describing topological relations between edges belonging to the same structure. G10 is a first Dlab grammar for discovering rules in which only the unary descriptions are taken into account (see Figure 2).

G10 = (T10, V10)
T10 = { 1-1 : [mesh(E, 1), mesh(E, 2), mesh(E, 3), mesh(E, 4), mesh(E, 5),
               mesh(E, 6), mesh(E, 7), mesh(E, 8), mesh(E, 9), mesh(E, 10),
               mesh(E, 11), mesh(E, 12), mesh(E, 17)]
        ← 0-len : [long(E), usual(E), short(E), circuit(E), half_circuit(E),
                   quarter_circuit(E), short_for_hole(E), long_for_hole(E),
                   circuit_hole(E), half_circuit_hole(E), not_important(E),
                   free(E), one_side_fixed(E), two_side_fixed(E), fixed(E),
                   not_loaded(E), one_side_loaded(E), two_side_loaded(E),
                   cont_loaded(E)] }
V10 = ∅
dlab_size(G10) = 6.82 × 10⁶

Fig. 2. Finite element mesh design: Dlab grammar G10

Grammar G10 generates many a priori invalid or uninteresting clauses, such as mesh(E, 1) ← long(E), short(E). In fact three groups of mutually exclusive attributes can be constructed: edge types, boundary conditions, and loading. In G11 we make use of this additional knowledge to reduce the search space significantly. We also use subsets on the term level and predicate variables to improve readability (see Figure 3).

G11 = (T11, V11)
T11 = { mesh(E, resolution) ← 1-len : [type(E), boundary(E), loading(E)] }
V11 = { dlab_var(resolution, 1-1, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 17]),
        dlab_var(type, 1-1, [long, usual, short, circuit, half_circuit,
                             quarter_circuit, short_for_hole, long_for_hole,
                             circuit_hole, half_circuit_hole, not_important]),
        dlab_var(boundary, 1-1, [free, one_side_fixed, two_side_fixed, fixed]),
        dlab_var(loading, 1-1, [not_loaded, one_side_loaded, two_side_loaded,
                                cont_loaded]) }
dlab_size(G11) = 3887

Fig. 3. Finite element mesh design: Dlab grammar G11
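As a consistency check on Definition 4 (the arithmetic is ours, the sizes are as stated): after conversion, the head of T11 covers the 13 resolution values, and the body is a 1-3 selection over three groups with ds-values 11, 4 and 4, so that

  e_1 = 11 + 4 + 4 = 19
  e_2 = 11·4 + 11·4 + 4·4 = 104
  e_3 = 11·4·4 = 176

and hence ds(body) = 19 + 104 + 176 = 299 and dlab_size(G11) = 13 × 299 = 3887. The same computation for G10 gives 13 × 2^19 = 6,815,744 ≈ 6.82 × 10⁶.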

With grammar G12 (see Figure 4) the hypothesis space is again extended to include clauses with a second edge in the antecedent. To describe this second edge the former three groups of nineteen attributes can be used, but also the predicate mesh(E2, N), and a binary predicate specifying the topological relation between both edges.

G12 = (T12, V12)
T12 = { mesh(E, resolution) ← 1-len : [type(E), boundary(E), loading(E),
                                       type(E2), boundary(E2), loading(E2),
                                       mesh(E2, resolution), topology(E, E2)] }
V12 = V11 ∪ { dlab_var(topology, 1-1, [opposite, neighbour, equal]) }
dlab_size(G12) = 4.91 × 10⁷

Fig. 4. Finite element mesh design: Dlab grammar G12

Finally, domain experts may complain that G12 generates rules that are not useful from a practical point of view. Suppose they object to clauses that violate one of the following conditions. Firstly, each antecedent should contain at least one attribute description of the edges that occur in the rule (this excludes e.g. mesh(E, 1) ← short(E2) and mesh(E, 1) ← long(E), opposite(E, E2)). Secondly, if edge E2 occurs, then the rule should specify its topological relation with E (this excludes e.g. mesh(E, 1) ← long(E), short(E2)). These constraints can be formulated in the Dlab formalism as shown in our final example grammar G13 (see Figure 5).

G13 = (T13, V12)
T13 = { mesh(E, resolution) ←
          len-len : [1-len : [type(E), boundary(E), loading(E)],
                     0-len : [len-len : [topology(E, E2),
                                         1-len : [mesh(E2, resolution), type(E2),
                                                  boundary(E2), loading(E2)]]]] }
dlab_size(G13) = 3.26 × 10⁷

Fig. 5. Finite element mesh design: Dlab grammar G13

5 Conclusion

We have described and illustrated the principles and functionalities of Dlab, a new formalism for the declarative representation of language bias. We conclude by briefly situating Dlab among some alternative declarative (see footnote 6) language bias formalisms. Closest to Dlab are the clausemodels proposed in [1]. Generally speaking, clausemodels are special cases of Dlab templates in which nesting and the choice of Min and Max are restricted. As discussed in [1], schemata [10] and predicate sets [2], as used in MOBAL and the FILP system respectively, are special cases of clausemodels, and thus indirectly of Dlab templates. An antecedent description grammar, as used by Cohen in GRENDEL [4], is in essence a definite clause grammar that generates the antecedents of clauses in the hypothesis space. In general, a conversion of antecedent description grammars to Dlab is not always possible (see footnote 7). Roughly speaking, Dlab contains a hardwired antecedent description grammar, dlab_dcg, that takes the Dlab grammar as its single argument.

Footnote 6: More procedural approaches to syntactic bias specification use parameters such as the maximal variable depth or term level to control the complexity of the concept language, cf. [5, 16]. Parametrized languages should be considered complementary to Dlab, in the sense that the same parameters trivially define (a series of) Dlab grammars.

Footnote 7: A clear case where this conversion is impossible occurs when the antecedent description grammar generates an infinite language.

Acknowledgements. This work is part of the Esprit Basic Research projects no. 6020 and 20237 on Inductive Logic Programming. Luc De Raedt is supported by the Belgian National Fund for Scientific Research. The authors would like to thank Wim Van Laer and Timothy Chow for their elegant solutions of the dlab_size combinatorics problem, and Hendrik Blockeel for his helpful comments on this paper.

References

1. H. Adé, L. De Raedt, and M. Bruynooghe. Declarative bias for specific-to-general ILP systems. Machine Learning, 20(1/2):119-154, 1995.
2. F. Bergadano and D. Gunetti. An interactive system to learn functional logic programs. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1044-1049. Morgan Kaufmann, 1993.
3. W.F. Clocksin and C.S. Mellish. Programming in Prolog. Springer-Verlag, Berlin, 1981.
4. W.W. Cohen. Grammatically biased learning: learning logic programs using an explicit antecedent description language. Artificial Intelligence, 68:303-366, 1994.
5. L. De Raedt. Interactive Theory Revision: an Inductive Logic Programming Approach. Academic Press, 1992.
6. L. Dehaspe and L. De Raedt. DLAB: a declarative language bias for concept learning and knowledge discovery engines. Technical Report CW-214, Department of Computer Science, Katholieke Universiteit Leuven, October 1995.
7. B. Dolsak, I. Bratko, and A. Jezernik. Finite element mesh design: an engineering domain for ILP application. In S. Wrobel, editor, Proceedings of the 4th International Workshop on Inductive Logic Programming, volume 237 of GMD-Studien, Sankt Augustin, Germany, 1994. Gesellschaft für Mathematik und Datenverarbeitung MBH.
8. M. Genesereth and N. Nilsson. Logical Foundations of Artificial Intelligence. Morgan Kaufmann, 1987.
9. D. Gordon and M. desJardins. Evaluation and selection of biases in machine learning. Machine Learning, 20(1/2):5-22, 1995.
10. J-U. Kietz and S. Wrobel. Controlling the complexity of learning in logic through syntactic and task-oriented models. In S. Muggleton, editor, Inductive Logic Programming, pages 335-359. Academic Press, 1992.
11. N. Lavrac and S. Dzeroski. Inductive Logic Programming: Techniques and Applications. Ellis Horwood, 1994.
12. J.W. Lloyd. Foundations of Logic Programming. Springer-Verlag, 2nd edition, 1987.
13. I.G. Macdonald. Symmetric Functions and Hall Polynomials. Clarendon Press, Oxford, 1979.
14. T.M. Mitchell. The need for biases in learning generalizations. Technical Report CBM-TR-117, Department of Computer Science, Rutgers University, 1980.
15. T.M. Mitchell. Generalization as search. Artificial Intelligence, 18:203-226, 1982.
16. S. Muggleton and C. Feng. Efficient induction of logic programs. In Proceedings of the 1st Conference on Algorithmic Learning Theory, pages 368-381. Ohmsma, Tokyo, Japan, 1990.
17. C. Nédellec, H. Adé, F. Bergadano, and B. Tausend. Declarative bias in ILP. In L. De Raedt, editor, Advances in Inductive Logic Programming, volume 32 of Frontiers in Artificial Intelligence and Applications, pages 82-103. IOS Press, 1996.
18. S. Russell and B. Grosof. A declarative approach to bias in concept learning. In Proceedings of the Sixth National Conference on Artificial Intelligence (AAAI-87), pages 505-510, 1987.
19. L. Sterling and E. Shapiro. The Art of Prolog. The MIT Press, 1986.
20. P.E. Utgoff. Shift of bias for inductive concept-learning. In R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, pages 107-148. Morgan Kaufmann, 1986.

This article was processed using the LaTeX macro package with LLNCS style.