A Hybrid Abductive Inductive Proof Procedure

OLIVER RAY, KRYSIA BRODA, ALESSANDRA RUSSO, Department of Computing, Imperial College London, 180 Queen’s Gate, South Kensington Campus, SW7 2AZ, UK. E-mail: {or,kb,ar3}@doc.ic.ac.uk

Abstract

This paper introduces a proof procedure that integrates Abductive Logic Programming (ALP) and Inductive Logic Programming (ILP) to automate the learning of first order Horn clause theories from examples and background knowledge. The work builds upon a recent approach called Hybrid Abductive Inductive Learning (HAIL) by showing how language bias can be practically and usefully incorporated into the learning process. A proof procedure for HAIL is proposed that utilises a set of user specified mode declarations to learn hypotheses that satisfy a given language bias. A semantics is presented that accurately characterises the intended hypothesis space and includes the hypotheses derivable by the proof procedure. An implementation is described that combines an extension of the Kakas-Mancarella ALP procedure within an ILP procedure that generalises the Progol system of Muggleton. The explicit integration of abduction and induction is shown to allow the derivation of multiple clause hypotheses in response to a single seed example and to enable the inference of missing type information in a way not previously possible.

Keywords: Abductive Logic Programming, Inductive Logic Programming, Machine Learning.

1 Introduction

Inductive Logic Programming (ILP) [17] is the sub-field of Machine Learning (ML) [14] that formalises and automates the task of learning first order theories from examples and prior background knowledge. The benefits of relational learning methods that use expressive logical representations are well illustrated by the ILP system Progol [15], which has been successfully deployed in several applications, most notably in the area of computational bioinformatics [1, 11, 27]. Underlying Progol is an inference method known as Bottom Generalisation (BG) [15, 29] that is used to construct hypotheses incrementally one clause at a time. Given a background theory, B, and a single seed example, e, this inference method generalises a clause, called a Bottom Clause [15], to return a hypothesis, h, that logically entails e relative to B. The role played by the Bottom Clause is of great theoretical and practical importance as it serves to bound a hypothesis space that would otherwise be intractable to search. The success of Progol is to a large extent due to the efficient use that is made of both language bias and search bias during the construction and generalisation of the Bottom Clause. In ML, language bias is the name given to any syntactic constraints on hypothesised clauses, while search bias refers to any procedural preferences that result in certain hypotheses being preferred over others [14, 18]. Progol’s language bias is determined by a set of so-called mode declarations [15] that allow the user to focus the search on interesting and potentially useful subsets of the hypothesis space. Progol’s search bias comprises a heuristic called compression [15] that favours hypotheses containing few literals and covering many examples. Informally, compression embodies the philosophical and scientific principle of Occam’s Razor [14], which advocates preferring the ‘simplest’ hypothesis that fits the data.

Despite its success, Progol suffers from a serious limitation that it inherits from the underlying inference method of BG. Namely, by definition it can only hypothesise a single clause in response to a given example. Moreover, those clauses that can be inferred by BG are drawn from a restricted hypothesis space [29] that is characterised by Plotkin’s notion of relative subsumption [19]. Although complete methods of hypothesis finding have been proposed [7, 31] that do not suffer from the theoretical limitations of BG, these methods have yet to achieve a practical significance comparable to BG. Consequently, there is still a real need for improved semantics and proof procedures that build upon the strengths of practically proven ILP systems. This research aims to develop a new approach that extends the core principles of Progol and BG in order to enlarge the class of problems soluble in practice.

The semantics of Kernel Set Subsumption (KSS) [23] is a recent generalisation of BG that overcomes the limitation described above. Whereas BG uses a Bottom Clause to bound its search space, KSS uses a set of clauses called a Kernel Set. This allows KSS to derive multiple clause hypotheses outside the semantics of BG. So far, a generic methodology for KSS has been sketched in [23], called Hybrid Abductive Inductive Learning (HAIL), that combines abduction and induction in a cycle of learning that generalises Progol. This cycle is based on two key insights: the first is that the head atoms of a Kernel Set can be computed by Abductive Logic Programming (ALP) [9], which constructs ground explanations for given observations; and the second is that the abduced atoms can be turned into inductive hypotheses using a generalisation of the techniques developed for Progol. The fact that KSS considerably enlarges the hypothesis space searched by BG means that HAIL is able to overcome several limitations of Progol. Indeed, it was shown in [23], by worked examples, how HAIL can construct hypotheses that are not computable by Progol.

Of course, any practical implementation of HAIL must also make use of language and search bias at least to the same extent as existing systems such as Progol. Intuitively, because Kernel Sets can be seen as generalising Bottom Clauses from single to multiple clauses, a plausible approach would be to generalise the techniques used so successfully by Progol for incorporating language and search bias within the construction and generalisation of Bottom Clauses. A major aim of this work is to show how mode declarations can be incorporated semantically and procedurally into the HAIL approach.

This paper presents a proof procedure for HAIL that computes inductive hypotheses drawn from a hypothesis space specified by the user with a set of mode declarations. The hypothesis space of this procedure is characterised by refining the semantics of KSS in a way that accurately reflects the intended meaning of the mode declarations. A prototype HAIL implementation is described that extends the well known ALP procedure of Kakas and Mancarella [10] and integrates it within a generalisation of Muggleton’s widely applied ILP approach Progol [15].
The explicit integration of abduction and induction is shown to allow the derivation of multiple clause hypotheses in response to a single seed example, and to enable the inference of type information in a way not previously possible.

The paper is structured as follows. Section 2 summarises the key notation and terminology and reviews the relevant background material. Section 3 provides a detailed treatment of mode declarations and presents the semantics of Mode-KSS. Section 4 defines and illustrates the HAIL proof procedure, proves its soundness and completeness with respect to Mode-KSS, and describes a prototype implementation. Section 5 compares this approach with related work: in particular with the practically proven system Progol [15] and the theoretically complete approach of CF-Induction [7]. The paper concludes with a summary and some directions for future work.

2 Background

This section summarises the basic notation and terminology and recalls the relevant background material. The tasks of ILP and ALP are reviewed together with the appropriate forms of language and search bias. In particular, the preference criterion of compression is defined, and the standard view of mode declarations is discussed. Finally, the general HAIL approach is described and the underlying semantics of KSS is defined. A running example is used to illustrate the main ideas.

2.1 Notation and Terminology

A familiarity with classical first order logic [3] and logic programming [12] is assumed throughout this paper. Formulae will be written in a first order language, L, with quantifiers ∀, ∃, connectives ∧, ∨, ¬, ←, logical constants ⊤, ⊥, and variables x, y, . . .. As usual, a term, t, is a constant or an n-ary function, f, followed by an n-tuple of terms. An atom, a, is a proposition or an n-ary predicate, p, followed by an n-tuple of terms. A positive literal is an atom, a. A negative literal is a negated atom, written ¬a or ← a. A Horn clause is either a rule, a₀ ← a₁, . . . , aₙ, a fact, a₀, a denial, ← a₁, . . . , aₙ, or the empty clause, □. The atoms a₀ and a₁, . . . , aₙ are called head and body atoms, respectively. A definite clause is a rule or a fact. A negative clause is a denial or the empty clause. A Horn clause a₀ ← a₁, . . . , aₙ is often treated as a set of literals {a₀, ¬a₁, . . . , ¬aₙ} and is read logically as the formula ∀(a₀ ← a₁ ∧ · · · ∧ aₙ). A Horn theory, T, is a set of Horn clauses {C₁, . . . , Cₙ} and is read logically as the formula C₁ ∧ · · · ∧ Cₙ. A clause C is said to θ-subsume a clause D, written C ≽ D, whenever Cθ ⊆ D for some substitution θ. A theory S is said to θ-subsume a theory T, written S ⊒ T, whenever each clause in T is θ-subsumed by at least one clause in S. The symbol |= denotes classical logical entailment. Thus, if E and F are formulae, then E |= F means that F is satisfied in every model of E. The symbol |=∗ denotes satisfaction in the Least Herbrand Model (LHM) [12]. Thus, E |=∗ F means that F is satisfied in the LHM of E.
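Since θ-subsumption is central to everything that follows, an executable rendering may be helpful. The sketch below (assuming SWI-Prolog; clauses are represented as lists of literals, with a hypothetical functor neg/1 marking negated atoms, and D is assumed ground, which suffices for the Kernel Sets used later) checks whether Cθ ⊆ D by letting unification search for the substitution θ.

% theta_subsumes(+C, +D): C θ-subsumes the ground clause D (C ≽ D).
theta_subsumes(C, D) :-
    \+ \+ literals_into(C, D).     % double negation discards the bindings of θ

literals_into([], _).
literals_into([L|Ls], D) :-
    member(L, D),                  % unify L with some literal of D
    literals_into(Ls, D).

% ?- theta_subsumes([uncle(X,Y), neg(father(Z,Y))],
%        [uncle(pete,olley), neg(brother(pete,cliff)), neg(father(cliff,olley))]).
% true, with θ = {x/pete, y/olley, z/cliff}.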

2.2 Inductive Logic Programming (ILP)

Inductive Logic Programming (ILP) [17] is concerned with the task of generalising positive and negative examples, E⁺ and E⁻, with respect to prior background knowledge, B. Formally, given three theories B, E⁺ and E⁻, the task of ILP is to find a theory, H, called a hypothesis, such that B ∪ H |= E⁺ and B ∪ H ∪ E⁻ ⊭ □. The induced hypothesis H is said to cover the positive examples E⁺ and to be consistent

with the negative examples E⁻. The theory H is said to inductively generalise E⁺ and E⁻ with respect to B. For computational purposes, the theories B, E⁺, E⁻ and H are usually restricted to some predefined subsets of clausal form logic. This paper will employ a standard ILP setting in which B is a Horn theory, H is a definite theory, and E⁺ and E⁻ are sets of positive and negative ground literals, respectively. This setting is illustrated in Example 2.1 below.

Example 2.1
Let B, E⁺, and E⁻ denote the three theories shown below. The background knowledge contains one rule, which states that x and y are brothers if they have the same father, and four facts, which state that cliff is the father of olley and barney, and that sam is the father of cliff and pete. The two positive examples state that pete is the uncle of olley and barney, and the two negative examples state that pete is not the uncle of sam or cliff.

    B  = { brother(x, y) ← father(w, x), father(w, y)
           father(cliff, olley)    father(cliff, barney)
           father(sam, cliff)      father(sam, pete) }

    E⁺ = { uncle(pete, olley)    uncle(pete, barney) }

    E⁻ = { ← uncle(pete, sam)    ← uncle(pete, cliff) }

Now, consider the two theories H and H′ below, which give two simple definitions of the uncle predicate. Then H and H′ are inductive generalisations of E⁺ and E⁻ with respect to B. This is because, together with B, both hypotheses cover the two positive examples (i.e. both entail uncle(pete, olley) and uncle(pete, barney)), and both hypotheses are consistent with the negative examples (i.e. neither entails uncle(pete, sam) or uncle(pete, cliff)).

    H  = { uncle(x, y) ← brother(x, z), father(z, y) }

    H′ = { uncle(x, y) ← father(z, y), father(w, x), father(w, z) }
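For concreteness, this setting can be replayed directly in a Prolog system (a minimal sketch assuming SWI-Prolog, with entailment approximated by Prolog's own query evaluation):

% Background knowledge B
brother(X, Y) :- father(W, X), father(W, Y).
father(cliff, olley).
father(cliff, barney).
father(sam, cliff).
father(sam, pete).

% Hypothesis H
uncle(X, Y) :- brother(X, Z), father(Z, Y).

% ?- uncle(pete, olley), uncle(pete, barney).   % both positive examples: succeeds
% ?- uncle(pete, sam) ; uncle(pete, cliff).     % negative examples: fails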

In most ILP problems, the search space is so large that it is necessary to restrict the hypotheses by imposing upon them additional syntactic constraints. In Machine Learning (ML) [14] parlance, these constraints are called language bias. This paper employs a well known form of language bias, called mode declarations, which provides a convenient mechanism for specifying such constraints. In particular, every set of mode declarations, M, defines a hypothesis space, LM, called the language of M, within which hypothesised clauses are required to fall. A formal account of mode declarations and of the language they define is provided in the next subsection, where the standard definitions are recalled, and is further elaborated in Section 3, where the standard view is refined. First, the task of ILP is made precise in Definition 2.2, which formalises the notions of inductive context and inductive generalisation.

Definition 2.2 (Inductive Context, Inductive Generalisation)
An inductive context, X, is a four-tuple ⟨B, E⁺, E⁻, M⟩ where B is a Horn theory, E⁺ is a set of positive ground literals, E⁻ is a set of negative ground literals, and M is a set of mode declarations, such that (i) B ⊭ E⁺ and (ii) B ∪ E⁺ ∪ E⁻ ⊭ □. Now, let LM denote the language of M. Then an inductive generalisation of X is a definite theory H ⊆ LM such that (i′) B ∪ H |= E⁺ and (ii′) B ∪ H ∪ E⁻ ⊭ □.

As formalised above, an inductive context simply contains the four inputs of the ILP task. In addition to the background knowledge and positive and negative examples described earlier, a set of mode declarations is also included among the inputs. Two requirements are also stated. First, B should not entail E⁺ (as otherwise there would be no need to learn any hypothesis). Second, B should be consistent with E⁺ and E⁻ (or else there could be no possibility of learning a hypothesis). The output of the ILP task is called an inductive generalisation. In addition to the coverage and consistency conditions discussed above, it is also required that the hypothesis fall within the language of M. Note that conditions (i), (ii), (i′) and (ii′) are sometimes referred to as prior necessity, prior satisfiability, posterior sufficiency and posterior satisfiability, respectively [17].

In order to select between competing hypotheses it is necessary, in practice, to introduce additional preference criteria called search bias. This paper adopts a widely used form of search bias, called compression, that attempts to minimise the number of literals in the hypothesised clauses. Definition 2.3 gives a simple formulation of compression applicable in the setting above, where hypotheses are consistent with all of the negative examples. The complexity of a hypothesis is simply the number of literals it contains, the coverage is the number of positive examples it entails, and its compression is the difference between the two. It should be noted, however, that in order to allow for noisy or misclassified examples, some approaches do not strictly enforce the consistency of H, but use a compression metric modified as in [16] by subtracting the number of negative examples violated by H.

Definition 2.3 (Complexity, Coverage, Compression)
Let B be a Horn theory, E⁺ a set of ground atoms, and H = {h₁, . . . , hₙ} a definite theory. Then the complexity of H is the integer x = Σⁿᵢ₌₁ |hᵢ|, the coverage of H (with respect to B and E⁺) is the integer y = |{e ∈ E⁺ | B ∪ H |= e}|, and the compression of H (with respect to B and E⁺) is the integer z = y − x.

Example 2.4
Let B, E⁺, E⁻, H and H′ be the theories defined in Example 2.1 above. Hypothesis H has complexity 3 as it contains three literals, coverage 2 as it covers two positive examples, and hence compression −1. Similarly, hypothesis H′ has complexity 4, coverage 2, and compression −2.
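Definition 2.3 is easy to operationalise. The sketch below (hypothetical helper names, assuming SWI-Prolog; hypothesis clauses are represented as Head-Body pairs with Body a list of atoms, and B and H are assumed already loaded so that entailment of an example can be approximated by querying it) computes the three quantities.

clause_size(_Head-Body, Size) :-          % literal count of one clause
    length(Body, N),
    Size is N + 1.

complexity(H, X) :-                       % x in Definition 2.3
    maplist(clause_size, H, Sizes),
    sum_list(Sizes, X).

coverage(Positives, Y) :-                 % y: covered positive examples
    include([E]>>catch(E, _, fail), Positives, Covered),
    length(Covered, Y).

compression(H, Positives, Z) :-           % z = y - x
    complexity(H, X),
    coverage(Positives, Y),
    Z is Y - X.

% With Example 2.1 loaded, H has complexity 3 and coverage 2:
% ?- compression([uncle(X,Y)-[brother(X,Z), father(Z,Y)]],
%                [uncle(pete,olley), uncle(pete,barney)], Z).
% Z = -1.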

2.3 Mode Declarations

This subsection reviews and illustrates the notion of mode declarations. To begin, the accepted view of mode declarations, which is recalled from [15], is formalised in Definitions 2.5 and 2.6 and explained below. In brief, every set, M, of mode declarations is composed of two subsets, M⁺ and M⁻, called head and body declarations, respectively. Each head and body declaration consists of a scheme and a recall. The recall is either a ‘∗’ or an integer, and the scheme is a ground atom that may contain special terms called placemarkers. Intuitively, a scheme can be thought of as a ‘template’ with placemarkers as its ‘slots’. Every scheme is identified with a set of atoms, called instances, which are obtained by replacing placemarkers with ordinary (i.e. non-placemarker) terms subject to a few simple conditions determined by the mode of each placemarker. As formalised in Definition 2.5, the mode of a placemarker is either input (+), output (−), or ground (#). Intuitively, input and output placemarkers ‘stand for’ variables, while ground placemarkers ‘stand for’ ground terms. The distinction between inputs and outputs is formalised in Definition 2.6, where the notion of instance is extended from single mode declarations, m, to sets of mode declarations, M. Specifically, an instance of M is a definite clause C whose head atom is an instance of a head declaration, and whose body atoms are instances of some body declarations. The only other requirement is that each variable replacing a +placemarker must either replace a +placemarker in the head or a −placemarker in some previous body atom. Informally, this means that each input variable is ‘linked’ to an input variable in the head through some ‘chain’ of intervening body atoms.

Definition 2.5 (Mode declaration)
A mode declaration, m, is either a head declaration, written modeh(r, s), or a body declaration, written modeb(r, s), where r is a positive integer called the recall of m, and s is a scheme. A scheme is a ground atom, possibly containing placemarkers. A placemarker, k, is a unary predicate, p, called the type of k, preceded by a mode. A mode is one of the three symbols ‘+’, ‘−’, or ‘#’ (called input, output, and ground modes, respectively). A predicate ‘any’ is assumed that is always true, and the symbol ‘∗’ is used to denote an arbitrary recall. Let M be a set of mode declarations. Then M⁺ and M⁻ denote, respectively, the sets of head and body declarations in M.

Definition 2.6 (Mode Language)
Let m be a mode declaration with scheme s. Then an instance of m is any atom obtained from s by replacing all # placemarkers by ground terms, and all + and − placemarkers by variables. Each occurrence of a variable v that replaces a + (resp. −) placemarker is called an input (output) occurrence of v. Let M be a set of mode declarations. Then an instance of M is any definite clause a₀ ← a₁, . . . , aₙ for which there is a head declaration m₀ ∈ M⁺ and body declarations m₁, . . . , mₙ ∈ M⁻ such that aᵢ is an instance of mᵢ for all 0 ≤ i ≤ n, and every variable v with an input occurrence in aᵢ has an input occurrence in a₀ or an output occurrence in aⱼ for some 0 < j < i. The set of all instances of M is denoted LM and called the language of M.

The concepts formalised above are now illustrated in Example 2.7 below, which shows how mode declarations provide a convenient and intuitive mechanism for stating which predicates may appear in the head and body atoms of hypothesised clauses, for determining the position of variables in those atoms, and for constraining their relative order. As shown in the example, the ordering constraints require that all input variables must be ‘available’ either as inputs in the head or as outputs in previous body atoms.
This mirrors the traditional use of mode declarations in logic programming, in which input variables identify which arguments of a predicate must be instantiated before that predicate is called, and output variables denote which arguments of a predicate will become instantiated after that predicate is called.

Example 2.7
Let M denote the set of mode declarations shown below, and let H and H′ be the hypotheses defined in Example 2.1 above. Then H ⊆ LM and H′ ⊆ LM.

    M = { modeh(∗, uncle(+male, +person))
          modeb(∗, brother(+male, −person))
          modeb(1, father(−male, +person)) }
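For orientation, these declarations correspond closely to the concrete syntax of mode-directed ILP systems. The snippet below shows how M might be written in an input file for Aleph, a Progol-style system (a sketch only: Aleph also expects examples and background sections, and Progol's own syntax differs in minor details).

:- modeh(*, uncle(+male, +person)).
:- modeb(*, brother(+male, -person)).
:- modeb(1, father(-male, +person)).

% Aleph additionally requires the head predicate to be linked to the
% body predicates it may use:
:- determination(uncle/2, brother/2).
:- determination(uncle/2, father/2).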

First consider the theory H, which contains the single clause h, shown below. As required, every atom in h is the instance of some mode declaration in M. Moreover, because each atom corresponds to exactly one mode declaration, the input and output occurrences of each variable are uniquely determined (and, for convenience, they have been annotated below with + and − superscripts). Since x and y are the only variables with input occurrences, and both of these have input occurrences in the head of h, it follows that the constraints in Definition 2.6 are trivially satisfied.

    h = uncle(x⁺, y⁺) ← brother(x⁺, z⁻), father(z⁻, y⁺)

Next consider the theory H′, which contains the single clause h′, shown below. Once again, every atom in h′ is the instance of some mode declaration in M, and the input and output occurrences of each variable are uniquely determined. This time, the variable z also has an input occurrence (third body atom). However, because z also has an output occurrence in a previous body atom (the first), the constraints are satisfied and h′ is an instance of M. Note, however, that if the first and third body atoms of h′ were interchanged, then this would no longer be the case.

    h′ = uncle(x⁺, y⁺) ← father(z⁻, y⁺), father(w⁻, x⁺), father(w⁻, z⁺)

At this point the reader may be wondering how the mode declaration recalls and placemarker types affect the hypothesis space. It is worth pointing out, therefore, that in practice the role of recall and type information is very clear. In particular, as shown in [15], the recall of a given mode declaration is used to limit the number of instances with the same input terms, and placemarker types ensure that every variable can be replaced by a well-typed constant. Surprisingly, however, the standard definition of mode language, recalled from [15] in Definitions 2.5 and 2.6 above, does not take this information into account. In order to provide a formal account of mode declarations suitable for proving soundness and completeness results, this deficiency clearly needs to be addressed. This will be done presently in Section 3, but first the background material is concluded, beginning with the task of ALP.

2.4 Abductive Logic Programming (ALP) Abductive Logic Programming (ALP) [9] is a branch of Artificial Intelligence (AI) [25] concerned with the construction of explanations, ∆, for sets of goals, G, with respect to a theory, T , and integrity constraints, IC. ALP explanations are very similar to ILP generalisations in the sense that, relative to the given theory, they must ‘cover’ the goals G, while being ‘consistent’ with the integrity constraints IC. However, in contrast to ILP hypotheses, which are sets of clauses, ALP hypotheses are usually restricted to ground literals of some predefined set of predicates, A, called abducible

predicates. Furthermore, because of the close connection that exists between ALP and Negation As Failure (NAF) [2], the abductive notion of entailment is not classical, as is generally the case in ILP, but is typically based on some canonical model semantics such as the stable [5], well-founded [4], or perfect [20] semantics. In general, given a logic program T, a set of ground literals G, a set of formulae IC, and a set of predicates A, the task of ALP is to find a set of ground literals ∆, of predicates in A, such that G and IC are satisfied in the canonical model of T ∪ ∆. In this paper a special case of ALP will suffice, in which T is a definite theory, IC is a Horn theory, and G and ∆ are sets of ground atoms. Since these restrictions guarantee that T ∪ ∆ is definite, all of the canonical model semantics coincide with the Least Herbrand Model (LHM), and the coverage and consistency conditions simply become T ∪ ∆ |=∗ G and T ∪ ∆ |=∗ IC. This ALP setting is formalised in Definition 2.8 below, which defines the notions of abductive context and abductive explanation analogously to their inductive counterparts above. Once again, it is initially assumed that the goal is not covered, but that the integrity constraints are satisfied.

Definition 2.8 (Abductive Context, Abductive Explanation)
An abductive context, Y, is a four-tuple ⟨T, G, IC, A⟩ where T is a definite theory, G is a set of ground atoms, IC is a set of Horn clauses, and A is a set of predicates, such that T ⊭∗ G and T |=∗ IC. Now, let LA denote the set of all ground atoms with predicates in A. Then an abductive explanation of Y is a set ∆ ⊆ LA of ground atoms such that T ∪ ∆ |=∗ G and T ∪ ∆ |=∗ IC.

Example 2.9
Let Y = ⟨T, G, IC, A⟩ be the abductive context shown below, where B is the theory defined in Example 2.1 above. In addition to the father and brother relationships contained in B, the theory states that every male or female is a person, and that olley, barney, cliff, pete, and sam are male. The goal states that olley is the brother of jane and sue. The first two integrity constraints state that if x is known to be the father of y, then x should be male and y should be a person. The last constraint states that nobody is both male and female. The predicates father and female are abducible.

    T  = { person(x) ← male(x)
           person(x) ← female(x)
           male(olley)    male(barney)    male(cliff)
           male(pete)     male(sam) } ∪ B

    G  = { brother(olley, jane)    brother(olley, sue) }

    IC = { male(x) ← father(x, y)
           person(y) ← father(x, y)
           ← male(x), female(x) }

    A  = { father, female }

Then, the theory ∆ shown below is an abductive explanation of Y. The first two atoms in ∆ are needed to explain the goal G, which follows from the rule in B and the fact that cliff is the father of olley. The last two atoms in ∆ are needed to satisfy the integrity constraints. Since father(cliff, jane) and father(cliff, sue) are true in the LHM of T ∪ ∆, the second integrity constraint requires that person(jane) and person(sue) should also be satisfied in the LHM. As the only abducible predicates are father and female, this requires the atoms female(jane) and female(sue).

    ∆  = { father(cliff, jane)    father(cliff, sue)
           female(jane)           female(sue) }
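ALP procedures compute such explanations by extending SLD resolution with abduction. As a rough illustration only (a textbook-style abductive meta-interpreter, not the Kakas-Mancarella procedure; integrity constraints are ignored, so under the epistemic view it recovers only the first two atoms of ∆), the goal G of Example 2.9 can be explained in SWI-Prolog as follows.

:- dynamic father/2.

brother(X, Y) :- father(W, X), father(W, Y).
father(cliff, olley).   father(cliff, barney).
father(sam, cliff).     father(sam, pete).

abducible(father(_, _)).
abducible(female(_)).

% solve(+Goal, +Delta0, -Delta): prove Goal from the program plus a
% growing set Delta of abduced ground atoms.
solve(true, Delta, Delta) :- !.
solve((A, B), Delta0, Delta) :- !,
    solve(A, Delta0, Delta1),
    solve(B, Delta1, Delta).
solve(A, Delta0, Delta) :-           % resolve against a program clause
    clause(A, Body),
    solve(Body, Delta0, Delta).
solve(A, Delta, Delta) :-            % reuse an atom already abduced
    abducible(A),
    member(A, Delta).
solve(A, Delta, [A|Delta]) :-        % abduce a new ground atom
    abducible(A),
    ground(A),
    \+ member(A, Delta).

% ?- solve((brother(olley, jane), brother(olley, sue)), [], Delta).
% Delta = [father(cliff, sue), father(cliff, jane)]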

In many ways, the ALP task of Definition 2.8 is just a special case of the ILP task in Definition 2.2. The theory plays the role of background knowledge, the goal corresponds to the positive examples, and the abducibles comprise a primitive form of language bias. But ALP and ILP differ markedly in their view of integrity [9]. On the one hand, ILP employs a so-called consistency view, which requires that the integrity constraints be satisfied in some arbitrary (i.e. any) model of B ∪ H. On the other hand, ALP adopts what is called an epistemic view, which requires that the integrity constraints be satisfied in the canonical model (i.e. the LHM) of T ∪ ∆. To see the difference, observe that if a consistency view of integrity were used in Example 2.9, then the last two atoms in ∆ would be unnecessary: although the ICs would not be satisfied in the LHM, they would be satisfied in at least one other model.

2.5 Hybrid Abductive Inductive Learning (HAIL)

Hybrid Abductive Inductive Learning (HAIL) [23] is a general approach proposed recently for integrating ALP within the task of ILP. Like most ILP techniques, HAIL exploits the fact that it is usually easier to construct hypotheses incrementally, as a succession of small theories each covering a few examples at a time, than to construct one large theory covering all of the examples in one go. To exploit this observation, HAIL uses a so-called cover-set loop [13], which operates as follows (a procedural sketch is given below). While any positive examples remain, one of them is selected as a seed example. A compressive hypothesis is then constructed that covers at least this seed example. The hypothesis is asserted into the background knowledge, and any covered positive examples are removed. The cycle is repeated until all positive examples have been covered.

In order to generalise each seed example, HAIL employs the inference method of Kernel Set Subsumption (KSS), introduced in [23]. The key notion, called a Kernel Set, is formalised in Definition 2.10 (note that this definition has been simplified in accordance with the ILP setting used in this paper). Informally, if B is a theory and e is a ground atom, then a Kernel Set of B and e is a Horn theory, K, whose head atoms, αᵢ, collectively entail e with respect to B, and whose body atoms, δᵢʲ, are individually entailed by B. As shown in Definition 2.11 below, any theory H that θ-subsumes K is said to be derivable by KSS from B and e. An important soundness result, which is proved in [23], shows that B ∪ H |= e for all hypotheses H derivable by KSS from B and e.
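The cover-set loop just described can be rendered procedurally as follows (a sketch with hypothetical predicate names: generalise/3 stands for the entire abductive-deductive-inductive cycle applied to one seed example, and covered/2 for an entailment test between the augmented background and an example).

% cover_loop(+B, +Positives, +H0, -H): repeatedly pick a seed example,
% learn a hypothesis for it, absorb the hypothesis into the background,
% and discard the positive examples that are now covered.
cover_loop(_B, [], H, H).
cover_loop(B, [Seed|Rest], H0, H) :-
    generalise(B, Seed, Clauses),              % hypothetical: one HAIL cycle
    append(H0, Clauses, H1),
    append(B, Clauses, B1),                    % assert into the background
    exclude(covered(B1), Rest, Remaining),     % hypothetical entailment test
    cover_loop(B1, Remaining, H1, H).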

Definition 2.10 (Kernel Set)
Let B be a Horn theory, let e be a ground atom, and let K be a definite theory written in the form shown below. Then K is called a Kernel Set of B and e iff it holds that B ∪ {α₁, . . . , αₙ} |= e and B |= δᵢʲ for all 1 ≤ i ≤ n and 1 ≤ j ≤ mᵢ.

    K = { α₁ ← δ₁¹, . . . , δ₁^{m₁}
          . . .
          αₙ ← δₙ¹, . . . , δₙ^{mₙ} }

Definition 2.11 (KSS)
Let B be a Horn theory, let e be a ground atom, and let H be a definite theory. Then H is said to be derivable by KSS from B and e iff it is the case that H ⊒ K for some Kernel Set K of B and e.

Example 2.12
Let B and H be the theories defined in Example 2.1 above. Then the hypothesis H, recalled below, is derivable by KSS from B and the seed example e = uncle(pete, olley), since it θ-subsumes the Kernel Set K of B and e shown below.

    H = { uncle(x, y) ← brother(x, z), father(z, y) }

    K = { uncle(pete, olley) ← brother(pete, cliff), father(cliff, olley) }

If, as in the example above, the Kernel Set contains a single clause, then that clause is called a Bottom Clause [15] of B and e. If the hypothesis is also restricted to a single clause, then the semantics of KSS reduces to the more established semantics of Bottom Generalisation (BG) [15, 29]. In essence, Bottom Clauses and Kernel Sets play identical roles in their corresponding semantics: they both serve to define a bounded hypothesis space with a lattice structure induced by the θ-subsumption ordering. However, because KSS is not restricted to single clauses, many potentially useful hypotheses are derivable by KSS that are simply not derivable by BG. A simple illustration of this point is now provided in Example 2.13 below, and a more complex example is presented in Section 4.

Example 2.13
Let B = {p ← q(a), q(b)} and e = p. Then the three theories {q(x)}, {q(a), q(b)} and {p} are all derivable by KSS from B and e, but only the last (and most trivial) one is derivable by BG from B and e.

So far, a generic methodology for KSS has been sketched in [23], consisting of the following three phases. First, an abductive phase returns an initial hypothesis ∆ consisting of a set of ground atoms {α₁, . . . , αₙ}. Second, a deductive phase returns a Kernel Set K by adding a set of body atoms {δᵢ¹, . . . , δᵢ^{mᵢ}} to each head atom αᵢ ∈ ∆. Third, an inductive phase returns a hypothesis H by picking one clause from the θ-subsumption lattice of each clause αᵢ ← δᵢ¹, . . . , δᵢ^{mᵢ} ∈ K. But, in order to turn the HAIL framework into a practical proof procedure, the underlying methodology must be made to exploit language bias at least to the same extent as existing practical ILP systems such as Progol. Therefore, the rest of this paper shows how language bias, in the form of mode declarations, can be incorporated semantically (Section 3) and procedurally (Section 4) into the HAIL approach.
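Before moving on, note that the subsumption test at the heart of Definition 2.11 is easy to prototype on top of the clause-level sketch from Section 2.1: a theory S θ-subsumes a theory T when every clause of T is θ-subsumed by some clause of S. Reusing the hypothetical theta_subsumes/2 predicate:

% theory_subsumes(+S, +T): S ⊒ T, with clauses given as literal lists.
theory_subsumes(S, T) :-
    forall(member(D, T),
           ( member(C, S), theta_subsumes(C, D) )).

% Example 2.13: the hypothesis {q(x)} subsumes the Kernel Set {q(a), q(b)},
% so it is derivable by KSS:
% ?- theory_subsumes([[q(X)]], [[q(a)], [q(b)]]).
% true.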

3 Semantics of Mode-KSS

This section presents a refinement of KSS, called Mode-KSS, that is better suited to the task of inductive generalisation. For a given inductive context, the refined semantics ensures that hypotheses are consistent with examples other than the seed, and that they fall within the intended hypothesis space. However, to provide an accurate characterisation of this hypothesis space, one that reflects the recall and type information in the mode declarations, the notion of mode-subsumption is first introduced and used to refine the theory of mode declarations presented earlier.

3.1 Mode-Subsumption

Mode declarations were formalised in Definition 2.5 and shown to provide a useful mechanism for specifying language bias. The language LM associated with a set M of mode declarations was formalised in Definition 2.6, but ignored all the recall and type information in M. The purpose of this section is to refine LM in order to reflect the way recall and types are used in practice to constrain the hypothesis space. This is achieved in two steps: first the notion of grounding characterises the well-typed ground instances of M; then the notion of mode-subsumption lifts this concept to the general case. The result is a language LᴾM that refines LM with respect to a theory P, called a type theory, that is used to determine the types of the terms in the underlying language L. In practice, the type theory is just the background theory B, or, more precisely, that part of B which defines the type predicates appearing in M.

To begin, the notion of a grounding is formalised in Definition 3.1 below. A brief comparison with Definition 2.6 shows that groundings and instances are closely related, but with some important differences. Firstly, placemarkers are now replaced by ground terms that are ‘well-typed’ with respect to the chosen type theory P. This means a placemarker of type p may only be replaced by ground terms, t, for which the atom p(t) follows from the theory P. Secondly, while the input and output constraints are essentially the same as before, they now apply to ground terms instead of variables. Thirdly, the recall rᵢ of a mode declaration mᵢ is now used to limit the number of instances of mᵢ that have exactly the same input terms. In this way, the set of groundings GᴾM of M with respect to P represents the set of ground instances of clauses in LM that meet the type and recall requirements contained in M.

Definition 3.1 (Grounding)
Let m be a mode declaration with scheme s, and let P be a theory. Then a grounding of m with respect to P is a ground atom, α, obtained from s by replacing all placemarkers of type p by terms t such that P |= p(t). Each occurrence of t that replaces a + (resp. −) placemarker is called an input (output) occurrence of t. Now, let M be a set of mode declarations. Then a grounding of M with respect to P is a definite ground clause α₀ ← α₁, . . . , αₙ for which there is a head declaration m₀ ∈ M⁺ and body declarations m₁, . . . , mₙ ∈ M⁻ such that for all 0 ≤ i ≤ n it holds: (i) αᵢ is a grounding of mᵢ, (ii) every term with an input occurrence in αᵢ has an input occurrence in α₀ or an output occurrence in αⱼ for some 0 < j < i, and (iii) rᵢ ≥ |Aᵢ| where rᵢ is the recall of mᵢ and Aᵢ is the set of atoms αⱼ such that (a) mⱼ = mᵢ, and (b) αⱼ and αᵢ agree on the replacement of all +placemarkers. The set of all groundings of M with respect to P is denoted GᴾM.
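To make the type requirement of Definition 3.1 concrete, the following small sketch (hypothetical predicate name grounding_father/1; a Prolog database of type facts stands in for the type theory P, and the recall bound of condition (iii) is not enforced) enumerates groundings of the scheme father(−male, +person) from Example 2.7.

% The type theory P: type predicates as defined in Example 2.9.
male(olley).  male(barney).  male(cliff).  male(pete).  male(sam).
person(X) :- male(X).

% A grounding of the scheme father(-male, +person) replaces each
% placemarker by a ground term t of its declared type p, i.e. P |= p(t).
grounding_father(father(M, P)) :-
    male(M),        % the '-male' placemarker
    person(P).      % the '+person' placemarker

% ?- grounding_father(G).
% G = father(olley, olley) ;
% G = father(olley, barney) ; ...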

The type requirement is very simple. To obtain a grounding of the mode declaration modeb(1, father(−male, +person)), it simply requires that the output placemarker be replaced by a ground term of type male, and the input placemarker be replaced by a ground term of type person. The recall requirement is also very simple. If a mode declaration m has recall r, it means at most r groundings may be obtained from m in which each +placemarker is replaced by the same term in all cases. In other words, r can be seen as bounding the cardinality of the equivalence classes of an equivalence relation defined on the groundings of m whereby two groundings are equivalent whenever they agree on the replacement of all +placemarkers. Thus, in any given clause the mode declaration modeb(1, father(−male, +person)) may be used to infer only one father atom for each individual appearing in the second argument. The notion of mode-subsumption is introduced in Definition 3.2, and used to define the non-ground instances of M with respect to P. Intuitively, mode-subsumption is just like θ-subsumption in that it allows atoms to be dropped from a clause, and terms to be replaced by variables. However, mode-subsumption acts in strict accordance with M and P. Informally, a clause C mode-subsumes a grounding D, written C