
Evolutionary Concept Learning in First Order Logic: An Overview

Federico Divina
Computational Linguistics and AI Section, Tilburg University, Tilburg, The Netherlands
e-mail: [email protected]

Abstract. This paper presents an overview of evolutionary approaches to Inductive Logic Programming (ILP). After a short description of the two popular ILP systems FOIL and Progol, we focus on methods based on evolutionary algorithms (EAs). Six systems are described and compared by means of the following aspects: search strategy, representation, hypothesis evaluation, search operators and biases adopted for limiting the hypothesis space. We discuss possible advantages and drawbacks related to the specific features of the systems along these aspects. Issues concerning the relative performance and efficiency of the systems are addressed.

Keywords: Evolutionary Computation, Inductive Concept Learning, First Order Logic, Inductive Logic Programming.

1. Introduction

The ability to learn from examples is a typical characteristic of all natural systems. Machine Learning was conceived, in part, with the aim of developing algorithms capable of learning from examples. In particular, in Inductive Concept Learning (ICL) a finite set of positive and negative examples, called the training set, is used to induce a concept description. The process can be seen as a search in a space of candidate hypotheses [34] expressed in a given representation language. Starting from an initial hypothesis, generalization and specialization operators may be applied to direct the search towards good hypotheses that cover many positive examples and few negative ones (a hypothesis covers an example if the hypothesis is true on that example).

AI Communications, ISSN 0921-7126, IOS Press. All rights reserved.

The aim of this paper is to provide an overview of recent inductive learning methods based on evolutionary algorithms which use a fragment of First Order Logic (FOL) as the hypothesis language. FOL provides a formal framework for describing and reasoning about objects, their parts, and relations among the objects and/or the parts. This area is known as Inductive Logic Programming (ILP) [40,43,16]. ILP constitutes a central topic in Machine Learning, with relevant applications to problems in complex domains, like natural language and computational biology [38], where problems cannot reasonably be represented by a set of attributes. The approach used in the majority of first-order learning systems is to adopt a specific search strategy, like the general-to-specific (hill climbing) search [44] or the inverse resolution mechanism [39]. However, the greedy selection strategies adopted for reducing the computational effort often render these techniques incapable of escaping from local optima. Recently, various systems based on Evolutionary Algorithms (EAs) for ILP have been shown to be effective alternatives to standard ILP methods. This approach is motivated by two major characteristics of EAs: their good exploration power, which gives them the possibility of escaping from local optima, and their ability to cope well when there is interaction among arguments and when arguments are of different types. Another appealing feature of EAs is their intrinsic parallelism. Moreover, EAs provide a learning method motivated by an analogy to biological evolution, which is known to be a successful, robust method for adaptation within biological systems.

In this paper we first describe two popular ILP systems, FOIL and Progol, and then focus on six recent EA-based ILP systems. Although the aim of this paper is to provide an overview of evolutionary approaches to ILP, it is nevertheless interesting to briefly describe the two non-evolutionary systems. We have chosen to present FOIL because it is probably the most popular system for ILP, and Progol because of its effectiveness in solving a number of ILP problems and also because EA variants of it have been proposed. The following features of the EA-based systems are considered: search strategy, representation, hypothesis evaluation, search operators and search biases. We discuss possible advantages and drawbacks of the systems related to these specific features. Moreover, issues concerning the relative performance and efficiency of the systems are addressed. The order in which the systems are presented follows the complexity of the encoding they adopt. First REGAL, DOGMA and G-NET are introduced, which adopt a standard bit string encoding; next SIA01 and ECL, which adopt a higher level representation; and finally GLPS, where each individual encodes a whole logic program.

The paper is organized as follows. The basic notions of evolutionary computation (EC) and of FOL are given in section 2. In section 3 ICL and ILP are introduced. Section 4 describes FOIL and Progol, while in section 5 two evolutionary variants of Progol and six EA-based systems are presented. In section 6 we compare the six evolutionary systems with respect to the above mentioned features. Finally, in section 7 the most promising aspects of the systems are highlighted, which could be used as a possible guide in the design of a new EA-based ILP system, and in section 8 some conclusions are given.

2. Basic Notions of EC and FOL

In this section the basic notions of EC and FOL needed in this paper are given. For a detailed introduction to EC, the reader can refer to, e.g., [17,5,57], while for FOL the reader can refer to, e.g., [50,6]. The following notation is used throughout the paper:
– E denotes a set of examples, E⁺ and E⁻ denote the positive and negative example sets, and e is a single example;
– B indicates the background knowledge;

– C denotes a clause and H a hypothesis;
– l stands for a literal, P for a predicate symbol and X, Y, Z, . . . for variables;
– px and nx denote the positive and negative examples covered by x respectively, where x is either a clause or a hypothesis.

2.1. Evolutionary Computation

EC is a population-based stochastic iterative optimization technique based on the Darwinian theory of evolution. Inspired by its principles, like survival of the fittest and selective pressure, EC tackles difficult problems by evolving approximate solutions of an optimization problem inside a computer. An algorithm based on EC is called an evolutionary algorithm. Given an optimization problem, an EA typically starts from a set, called the population, of random candidate solutions. These solutions are evolved by the repeated selection and variation of the fitter solutions, following the principle of the survival of the fittest. We refer to the elements of the population as individuals or chromosomes, which encode candidate solutions. Solutions can be encoded in many different ways. A typical example is binary string encoding, where each bit of the string has a particular meaning. In general, with the term phenotype we refer to an object forming a possible solution within the original context, while its encoding is called the genotype. To each genotype must correspond at most one phenotype, so that the chosen encoding can be inverted, i.e., genotypes can be decoded. If the genotype is equal (or very similar) to the phenotype, the encoding is referred to as a high level encoding. Individuals are typically selected according to the quality of the solution they represent. To measure the quality of a solution, a fitness function assigns a fitness value to each individual of the population.
Hence, the better the fitness of an individual, the higher its chances of being selected for reproduction and the more of its genetic material will be passed on to the next generations. The selected individuals reproduce by means of crossover and mutation. In simple terms, crossover swaps some genetic material between two or more individuals, while mutation changes a small part of the genetic material of an individual to a new random value. The reproduction phase generates new offspring. Offspring compete with the old individuals for a place in the next generation. In this way EAs can efficiently explore the space of the possible solutions of an optimization problem. This space is called the search space, and it contains all the possible solutions that can be encoded. In ICL and ILP the search space is often referred to as the hypothesis space, since it consists of all possible hypotheses that can be considered. EAs have been shown to be efficient in searching huge spaces [23] (e.g., [54,15,4,21,20]). The stochastic operators used allow EAs to search for possible solutions in an efficient way. For these reasons, EAs represent a valid alternative to greedy heuristics.

An important aspect that has to be addressed in EAs is the maintenance of diversity in the population. Maintaining diversity keeps individuals spread across the hypothesis space, so that all areas of the hypothesis space can be searched and no region becomes overcrowded. This can be seen as having different species occupying different niches in the search space, in the same way as, in a successful natural system, different species can survive in different niches of the environment. Moreover, if diversity is maintained, computational resources are exploited more effectively by avoiding useless replications and redundancies. Different methods for achieving species and niche formation, as well as for maintaining diversity in the population, have been proposed. Among these, crowding [9] and sharing functions [24] are two popular methods.

2.2. First Order Logic

The basic components of FOL are called terms. Terms can be constants, variables or functions. A constant denotes a particular object in some domain, e.g., “4” is a constant denoting the number four in the domain of natural numbers.
A variable is a name that can denote any object in a domain, and a function symbol denotes a function taking n arguments from a domain and returning one object of the domain. In addition to terms we have predicate symbols. A predicate symbol stands for the name of a relationship between objects. A predicate symbol applied to a tuple of terms is called a literal. For instance,


far(X, Y, 2) is a literal, where far is a predicate symbol, X, Y are variables and 2 is a constant. Literals can be positive or negative. For example, far(X, Y, 2) is a positive literal. ¬far(X, Y, 3) is a negative literal, which is true if far(X, Y, 3) is false. We refer to positive literals also as atoms. We can now define Horn clauses: a Horn clause is a clause of the form A ← L1, . . . , Ln, where A is an atom and L1, . . . , Ln are literals. A is also called the head of the clause, while we refer to L1, . . . , Ln as the body of the clause. The above clause can be read as if (L1, . . . , Ln) is true then A is true, or more formally ∀Xi (L1 ∧ . . . ∧ Ln) → A. A clause containing no variables is called a ground clause, and a clause consisting of only the head is called a fact. In ILP, a clause C is said to cover an example e if the theory formed by C and a given background knowledge B logically entails e. For example, given
E⁺ = {grandparent(tom,bill), grandparent(eve,cliff)};
E⁻ = {grandparent(tom,cliff), grandparent(eve,tom)};
B = {parent(tom,jack), parent(jack,bill), parent(eve,peter), parent(peter,cliff)};
then the clause C = grandparent(X,Y) ← parent(X,Z), parent(Z,Y) covers both positive examples and none of the negative ones. The theory formed by C and B is a logic program, i.e., a logic program is a set of Horn clauses. In order to verify whether a clause covers an example, the variables contained in the clause need to be bound to the constants belonging to the example. To this aim, substitutions are used. A substitution θ = {X1/t1, . . . , Xn/tn} is a finite mapping from variables to terms that assigns to each variable Xi a term ti, with ti ≠ Xi, 1 ≤ i ≤ n. Applying a substitution θ to a term t results in the simultaneous replacement of each occurrence in t of a variable appearing in θ with the corresponding term.
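As an illustration of coverage by grounding, the following sketch (plain Python, not taken from any of the systems discussed) applies substitutions to the grandparent clause and checks which of the examples above it covers against the background knowledge:

```python
# Background knowledge: ground parent/2 facts.
background = {
    ("parent", "tom", "jack"), ("parent", "jack", "bill"),
    ("parent", "eve", "peter"), ("parent", "peter", "cliff"),
}
constants = {c for (_, a, b) in background for c in (a, b)}

# Clause C: grandparent(X, Y) <- parent(X, Z), parent(Z, Y),
# represented with variables as uppercase strings.
body = [("parent", "X", "Z"), ("parent", "Z", "Y")]

def apply_subst(literal, theta):
    """Replace each variable in the literal by its binding in theta."""
    return (literal[0],) + tuple(theta.get(arg, arg) for arg in literal[1:])

def covers(example):
    """C covers e if some substitution grounds the head to e and makes
    every body literal true in the background knowledge."""
    theta = {"X": example[1], "Y": example[2]}   # bind the head variables
    for z in constants:                           # enumerate bindings for Z
        full = dict(theta, Z=z)
        if all(apply_subst(l, full) in background for l in body):
            return True
    return False

print(covers(("grandparent", "tom", "bill")))    # positive example
print(covers(("grandparent", "tom", "cliff")))   # negative example
```

Enumerating all bindings is exponential in the number of variables; real ILP systems use far more refined coverage tests, but the logic is the same.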

3. Inductive Concept Learning

The objective of inductive learning is to find a hypothesis that explains the classifications of the examples, given their descriptions. More formally: given E⁺, E⁻ and B, the aim of inductive learning is to find a hypothesis H such that H covers every e in E⁺ and H does not cover any e in


[Figure 1: a diagram grouping the features of a system for ICL into an effectiveness box (search strategy, 3.1; representation, 3.2; evaluation, 3.3; operators, 3.4; biases, 3.5) and an efficiency box (sampling mechanisms; parallelization).]

Fig. 1. Some features of a system for ICL. Next to each effectiveness box the section where the relative feature is first described is reported.

E⁻. In the example given in the previous section, C = grandparent(X,Y) ← parent(X,Z), parent(Z,Y) is an example of such a hypothesis, because, given the sets of examples E⁺ and E⁻ and background knowledge B, C covers all the positive examples and none of the negative ones.

Figure 1 illustrates important features for effectiveness (left box) and efficiency (right box) of a system for ICL. This division is not always so well defined. For instance, parallelization may in some cases also influence the effectiveness of the system. We briefly explain the features in the effectiveness box. Sampling and parallelization will be briefly addressed in the description of the REGAL and ECL systems.

3.1. Search Strategies

The search space can be structured with a general-to-specific ordering of hypotheses. We say that a hypothesis H1 is more general than a hypothesis H2, and H2 is more specific than H1, if all the examples covered by H2 are also covered by H1. With this ordering, the search space can be seen as a lattice. Many systems for ICL exploit this ordering of hypotheses in the operators they use for moving in the search space and for deciding the direction in which the search is performed. Some systems start the search from a specific hypothesis, which is then generalized during the learning process. This approach is called bottom-up. Alternatively, a top-down approach can be used. In this case the learning process starts with a general hypothesis which is specialized to fit the training examples. Within these two approaches a search strategy is used. Sequential covering and hill climbing (for an explanation see, e.g., [48]) are two popular search strategies. Systems adopting sequential covering iteratively learn a set of rules that represent the target concept. To do this they first learn a rule. The learned rule is added to the emerging target concept. All the positive examples covered by the learned rule are removed from the training set. Another rule is learned using the current set of training examples, and added to the emerging target concept. The process is iterated until all positive examples are covered. Hill climbing algorithms refine an initial hypothesis. Several variations of the current hypothesis are built. Among these the best one is chosen, according to some criterion. The process is iterated until a sufficiently good hypothesis is found. The process is called hill climbing because it proceeds with optimization steps toward a locally best hypothesis. Hill climbing can be used inside sequential covering for learning rules.

3.2. Representation Language and Encoding

When we want to solve a problem with a computer, the first thing to be done is to translate the problem into computational terms. In our case this means choosing a representation language and an encoding.
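The sequential covering loop of section 3.1 can be sketched as follows. The toy instantiation (integer examples, interval rules, a hypothetical `learn_rule` that greedily generalizes a seed positive until blocked by a negative) is illustrative only and stands in for whatever rule learner and coverage test a concrete system uses:

```python
def sequential_covering(positives, negatives, learn_rule, covers):
    """Repeatedly learn one rule, add it to the emerging concept,
    and remove the positive examples it covers."""
    rules, remaining = [], list(positives)
    while remaining:
        rule = learn_rule(remaining, negatives)
        if not any(covers(rule, e) for e in remaining):
            break                                   # safety: no progress
        rules.append(rule)
        remaining = [e for e in remaining if not covers(rule, e)]
    return rules

# Toy instantiation: examples are integers, a rule is an interval (lo, hi).
def covers(rule, e):
    lo, hi = rule
    return lo <= e <= hi

def make_learner(all_examples):
    lo_b, hi_b = min(all_examples), max(all_examples)
    def learn_rule(positives, negatives):
        lo = hi = positives[0]                      # seed positive example
        while lo - 1 >= lo_b and lo - 1 not in negatives:
            lo -= 1                                 # generalize downwards
        while hi + 1 <= hi_b and hi + 1 not in negatives:
            hi += 1                                 # generalize upwards
        return (lo, hi)
    return learn_rule

pos, neg = [1, 2, 3, 8, 9], [5, 6]
learner = make_learner(pos + neg)
print(sequential_covering(pos, neg, learner, covers))  # [(1, 4), (7, 9)]
```

The loop terminates because each learned rule covers at least its seed example, so the set of remaining positives shrinks at every iteration.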


The choice of a representation language may vary from a fragment of propositional calculus to second order logic. While the former has low expressive power, the latter is rather complex, and for this reason it is seldom used. FOL is used in various successful systems for ICL. Usually Horn clauses are employed. Every system using FOL usually adopts some restrictions. For example, function symbols may not be allowed to appear as arguments of a literal, only variables appearing in the head of the clause may be allowed to appear in the literals of the body, etc. Another limitation could be the exclusion of recursion. A clause is recursive if it is defined in terms of itself. For instance, ancestor(X, Y) ← parent(X, Z), ancestor(Z, Y) is a recursive clause. These restrictions are adopted in order to reduce the size of the search space, and can be seen as language biases, see section 3.5.

The systems described in this paper learn a set of rules. Once a representation language has been chosen, rules need to be encoded in some way in order to be processed. Examples of encodings are binary encoding and tree encoding. In section 5 we will see some examples of how rules can be encoded. The encoding can represent either a single rule or a set of rules. Notice that sometimes the term representation is used instead of the term encoding.

3.3. Evaluating a Hypothesis

In simple terms, what characterizes a hypothesis as good is how well it performs on the training examples and a prediction of how well it will behave on unseen examples. For instance, a hypothesis covering several positive examples and no negative examples could be considered a good hypothesis. A scoring function (or fitness function in EAs) is used to measure the goodness of a hypothesis. Several properties can be used for defining a scoring function, such as completeness, consistency and simplicity. A hypothesis H is complete iff H covers all the positive examples.
H is consistent iff H does not cover any negative example. Completeness and consistency are two properties that almost all inductive learning systems incorporate in the scoring function they adopt.
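As a minimal illustration (not a formula used by any of the surveyed systems), a scoring function rewarding completeness and consistency can be written in terms of p_H, n_H and the example set sizes:

```python
def score(p_H, n_H, num_pos, num_neg):
    """Fraction of correctly handled examples: covered positives plus
    rejected negatives, over all examples. score == 1.0 iff the
    hypothesis is both complete (p_H == num_pos) and
    consistent (n_H == 0)."""
    return (p_H + (num_neg - n_H)) / (num_pos + num_neg)

print(score(p_H=4, n_H=0, num_pos=4, num_neg=6))  # complete, consistent: 1.0
print(score(p_H=3, n_H=1, num_pos=4, num_neg=6))  # 0.8
```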


Simplicity is a concept often used following Occam’s razor [8], which states that one should prefer the simplest hypothesis that fits the data. One explanation for this is that there are fewer short hypotheses than long ones, and so it is less likely that a short hypothesis coincidentally fits the data. There are many ways of defining simplicity, e.g.:
– Short rules [8]. Prefer shorter rules over longer rules. The length of a rule depends on the representation used, and so the same rule could be considered short by one learner and long by another.
– MDL [47]. This is a more general concept, since it uses a notion of length that does not depend on the particular representation used. According to the Minimum Description Length (MDL) principle, the best model for describing some data is the one that minimizes the sum of the length of the model and the length of the data given the model. Here by length we mean the number of bits needed for encoding the model or the data.
– Information gain [45]. Information gain is a measure of how a change in a hypothesis affects its classification of the examples. This principle, when incorporated in the search strategy of a method such as decision tree learning, may bias the search toward shorter rules, as it aims to minimize the number of tests needed for the classification of a new object. However, it is mostly up to the search strategy adopted to prefer short rules.

3.4. Operators

Operators are used for moving in the search space. These operators vary from system to system, depending on the approach used, the problem to solve, the ideas of the authors and so on. An operator basically receives a hypothesis, changes it in some way and returns the changed hypothesis. Systems not relying on evolutionary techniques, e.g., Progol, employ so-called refinement operators. An example of a refinement operator is inverse resolution.
We describe here inverse resolution in its propositional form; for details about inverse resolution in FOL the reader can refer to [36]. This method simply inverts the resolution rule. Given rules C1 and


C2, the resolution operator constructs a clause C which is derived from C1 and C2. For example, if C1 is going_out ∨ staying_home and C2 is ¬staying_home ∨ study, then C will be going_out ∨ study. The inverse resolution operator then produces C2 starting from C1 and C. The inverse resolution operator is not deterministic: in general there are multiple choices for C2. A way of limiting the number of choices is to restrict the representation language to Horn clauses and to use inverse entailment. The idea behind inverse entailment is to change the entailment constraint B ∧ H |= e into the equivalent form B ∧ ¬e |= ¬H. This modified constraint says that from the background knowledge and the negation of the classification of an example, the negation of a hypothesis explaining the example can be derived. Thus one can use a process similar to deduction to derive a hypothesis H. Systems based on EAs employ the crossover and mutation operators explained in section 2.1.

3.5. Biasing the Hypothesis Space

If we want to ensure that a solution is found, the unknown target concept must obviously be contained in the portion of the hypothesis space that is searched. Using a hypothesis space capable of representing every learnable concept could seem the solution, but this would lead to a very large search space. To illustrate this, consider a learner that uses examples described by a set of attributes. In general, in this setting, an unbiased hypothesis space contains 2^|E| possible concepts, where |E| is the cardinality of the example set. For instance, if a set of attributes can describe 90 different examples of the concept to be learned, then there are 2^90 distinct target concepts that a learner might be called upon to learn. This is a huge space to search, and for this reason some biases have to be used in order to limit the search to a portion of the hypothesis space [41].
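For concreteness, the size of that unbiased space for 90 describable examples:

```python
# Each concept is a subset of the describable examples, so an unbiased
# hypothesis space over 90 examples has 2**90 members.
print(2 ** 90)  # 1237940039285380274899124224, roughly 1.2e27
```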
To limit the size of the search space, two main kinds of biases are used [18]:
– Search bias. This kind of bias imposes a direct limitation on the search performed by the learner, limiting the hypothesis space by means of some bound;

– Language bias. This kind of bias imposes a limitation on what kind of hypotheses can be represented by the algorithm. The hypothesis space is limited to the set of representable hypotheses.

4. Two Popular ILP Systems

In this section we describe two of the most popular ILP algorithms. The first one is FOIL [44]. FOIL has proved to solve a wide variety of problems, and for this reason its results are often taken as a reference measure for other systems. The second system described in this section is Progol [36,37], which employs inverse entailment for solving its learning task.

4.1. FOIL

FOIL searches the hypothesis space using a top-down search approach and adopts an AQ-like sequential covering algorithm [32]. AQ uses a sequential covering algorithm to build its concept description. It starts from an empty set of rules. A positive example e is selected, and a general-to-specific search is conducted in order to find a rule covering e (and possibly more positive examples) and no negative examples. Among the constructed rules the “best” one is selected and added to the emerging set of rules forming the concept description. All the positive examples covered by the found rule are removed and the process is repeated until all positive examples are covered. The “best” rule is usually some compromise between the desire to cover as many positive examples as possible and the desire to have as compact and readable a representation as possible. In the same way FOIL first induces a consistent clause and stores it. All the positive examples covered by the learned clause are removed from the training set, and the process is repeated until all positive examples are covered. When a clause needs to be induced, the system employs a hill climbing strategy. It starts with the most general clause, consisting of a clause with an empty body and head equal to the target predicate. All the arguments of the head are distinct variables. In this way this initial clause classifies all examples as positive. The clause is then specialized by adding literals to its body. Several literals are considered for this purpose. The literal yielding the best improvement is added to the body. If the clause is not consistent, another literal is added. In figure 2 a scheme of the algorithm adopted by FOIL is presented. In lines 2 and 3 the hill climbing phase is performed.

Algorithm(FOIL)
1 Initialize the clause
2 while the clause covers negative examples
3   do Find a “good” literal to be added to
4      the clause body;
5 Remove all examples covered by the clause;
6 Add the clause to the emerging concept definition;
7 If there are any uncovered positive examples
8   then go to 1;

Fig. 2. The scheme of the algorithm adopted by FOIL.

The representation language of FOIL is Datalog, a restricted form of FOL that omits disjunctive descriptions and function symbols. Negated literals are allowed, where the negation is interpreted in a limited way (negation as failure). The scoring function used by FOIL to estimate the utility of adding a new literal is based on the number of positive and negative examples covered before and after adding the new literal. More precisely, let C be the clause to which a new literal l has to be added and C′ the clause created by adding l to C. The information gain function used is then the following:

Info_gain = t · ( log2( p_C′ / (p_C′ + n_C′) ) − log2( p_C / (p_C + n_C) ) )
where t is the number of positive examples covered by C that are still covered after adding l to C. The add operator considers literals of the following forms:
– P(X1, X2, . . . , Xk) and ¬P(X1, X2, . . . , Xk), where the Xi are existing variables of the clause or new variables;
– Xi = Xj or Xi ≠ Xj, for variables of the clause;
– Xi = c and Xi ≠ c, where Xi is a variable of the clause and c is an appropriate constant;
– Xi ≤ Xj, Xi > Xj, Xi ≤ v and Xi > v, where Xi and Xj are variables of the clause that can assume numeric values and v is a threshold chosen by FOIL.
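The gain function above translates directly into code. This sketch uses the definitions as stated (p and n counts before and after adding the literal, t the positives that survive); the example numbers are invented for illustration:

```python
from math import log2

def foil_gain(t, p_before, n_before, p_after, n_after):
    """FOIL information gain for adding a literal to clause C:
    t times the increase in log2 of the positive coverage rate."""
    info_before = log2(p_before / (p_before + n_before))
    info_after = log2(p_after / (p_after + n_after))
    return t * (info_after - info_before)

# Adding a literal that keeps 6 positives and narrows coverage
# from (8 pos, 8 neg) to (6 pos, 2 neg):
print(foil_gain(t=6, p_before=8, n_before=8, p_after=6, n_after=2))  # ~3.51
```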


Algorithm(Progol)
1 If E = ∅ return B;
2 Let e be the first example in E;
3 Construct a most specific clause ⊥ for e
4   using inverse entailment;
5 Construct a good clause C from ⊥;
6 Add C to B;
7 Remove from E all the examples that are now covered;
8 Go to 1;

Fig. 3. Covering algorithm adopted by Progol. The emerging hypotheses are added to the background knowledge and the algorithm is repeated until all the positive examples are covered.

There is a constraint on the literals that can be introduced in a clause: at least one variable appearing in the literal to be added must already be present in the clause. Another restriction adopted by FOIL is motivated by Occam’s razor: when a clause becomes longer (according to some metric) than the total number of positive examples that the clause explains, that clause is no longer considered a potential part of the hypothesis. There is also another bias on the hypothesis space: the upper bound represented by the most general clause initially generated. In fact, all the clauses that are generated are more specific than the initial one.

4.2. Progol

Progol uses inverse entailment to generate a single most specific clause (usually called the “bottom clause” and denoted as ⊥) that, together with the background knowledge, entails the observed data. ⊥ can then be used to bound a top-down search through the hypothesis space with the constraint that the only clauses considered are those more general than this initial bound. Progol uses a sequential covering algorithm to carry out its learning task, illustrated in figure 3. For each positive example e that is not yet covered, it first searches for ⊥, which covers e (line 3). For doing this it applies inverse entailment i times, where i is a parameter specified by the user. In line 5 an A* strategy is adopted for finding a good clause, starting from the most general clause. Progol uses θ-subsumption for ordering the hypothesis space. A clause C1 θ-subsumes a clause C2 iff there exists a substitution θ such that C1θ ⊆ C2, where C1, C2 are described by the sets of literals in their disjunctive form (C1 is more general than C2, written also C1 ⪰ C2). The refinement operator maintains the relationship □ ⪰ C ⪰ ⊥ for every considered clause C, where □ is the empty clause. Thus the search is limited to the bounded sub-lattice □ ⪰ C ⪰ ⊥. Since C ⪰ ⊥, there exists a substitution θ such that Cθ ⊆ ⊥. So for each literal l in C, there exists a literal l′ in ⊥ such that lθ = l′. The refinement operator has to keep track of θ and of a list of those literals l′ in ⊥ that have a corresponding literal l in C. Any clause C that subsumes ⊥ corresponds to a subset of literals in ⊥ with substitutions applied. The scoring function used to measure the goodness of a candidate clause C is:

f(C) = p_C − (n_C + lgh_C + h_C)

where lgh_C is the length of C minus 1 and h_C is the expected number of further atoms that have to be added in order to complete the clause. h_C is calculated by inspecting the output variables in the clause and determining whether they have been defined. The output variables are given by a user-supplied mode declaration. A first bias on the hypothesis space is represented by the upper bound □ and by the lower bound ⊥. A second constraint is the use of the head and body mode declarations, together with other settings, to build the most specific clause. With a mode declaration the user specifies, for each atom used, the modality in which an argument can be used. This mode information is also used for computing the value of h_C in the scoring function. So, for example, it can be specified that a particular argument is an input variable, an output variable, or a particular constant. Progol imposes a restriction upon the placement of input variables: every input variable in any atom has to be either an input variable in the head of the clause or an output variable in some atom that appears earlier in the clause. This imposes a quasi-order on the body atoms and ensures that the clause is logically consistent in its use of input and output variables.
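θ-subsumption as defined above can be checked by brute force over candidate substitutions. The following sketch is illustrative only (clauses as sets of literal tuples, variables as uppercase strings) and does not reflect Progol's actual implementation:

```python
from itertools import product

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def apply_theta(literal, theta):
    return (literal[0],) + tuple(theta.get(a, a) for a in literal[1:])

def theta_subsumes(c1, c2):
    """True iff some substitution theta maps every literal of c1 into c2,
    i.e. c1 theta-subsumes c2 (c1 is at least as general as c2)."""
    vars1 = {a for lit in c1 for a in lit[1:] if is_var(a)}
    terms2 = {a for lit in c2 for a in lit[1:]}
    for binding in product(terms2, repeat=len(vars1)):
        theta = dict(zip(sorted(vars1), binding))
        if all(apply_theta(lit, theta) in c2 for lit in c1):
            return True
    return False

# p(X, Y) subsumes the clause containing p(a, b):
c1 = {("p", "X", "Y")}
c2 = {("p", "a", "b"), ("q", "b")}
print(theta_subsumes(c1, c2))  # True
print(theta_subsumes(c2, c1))  # False
```

Note that θ-subsumption is decidable but NP-complete in general, which is one reason refinement operators track θ incrementally rather than recomputing it.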

5. The Evolutionary Approach

EAs have proved to be successful in solving comparatively hard optimization problems, as well as problems like ICL. EAs have an intrinsic parallelism and can therefore exploit parallel machines much more easily than classical search algorithms. Furthermore, EAs are capable of escaping from local optima, while greedy algorithms may not show this ability. Finally, EAs tend to cope better than greedy rule induction algorithms when there is interaction among arguments [18]. Depending on the representation used, two major approaches exist: the Pittsburgh and the Michigan approach, so called because they were first introduced by research groups at the universities of Pittsburgh and Michigan, respectively. In the former each individual encodes a whole solution, while in the latter an individual encodes a part of the solution. Both approaches present advantages and drawbacks. The Pittsburgh approach allows an easier control of the genetic search, but introduces a large redundancy that can lead to hard-to-manage populations and to individuals of enormous size. The Michigan approach, on the other hand, allows for cooperation and competition between different individuals, and hence reduces redundancy, but requires sophisticated strategies, like co-evolution, for coping with the presence of super individuals in the population.

5.1. Two Evolutionary Variants of Progol

In [52] a GA is used inside Progol for exploring the bounded hypothesis space in search of a good clause (step 5 of figure 3). A slightly modified version of the Simple Genetic Algorithm [23] is used for this purpose. The GA adopts a Michigan approach: each clause is represented by a binary string. Generalization and specialization crossover and a standard mutation are used as genetic operators. The GA evolves a population of clauses which all subsume the most specific clause computed by Progol with the application of inverse entailment.

Another GA-based system using Progol is EVIL 1 [46]. This algorithm adopts a Pittsburgh approach: every individual represents a set of rules (a logic program) encoded as a tree structure. Each node of the tree represents a single clause.
In this way a whole logic program can be represented inside a single individual, by means of a tree. A tree representation also makes it easy to define crossover and mutation operators that act on a logic program.

Federico Divina / Evolutionary Concept Learning in First Order Logic

At each generation individuals induce new rules and add them to the logic program they have already induced. Progol is used for inducing rules. The reproduction phase uses a crossover operator that acts on trees, randomly exchanging subtrees between the two parents.

5.2. REGAL

REGAL (RElational Genetic Algorithm Learner) [19,42] exploits the explicit parallelism of GAs. It consists of a network of fully interconnected genetic nodes that exchange individuals at each generation. Each genetic node performs a GA on an assigned set of training examples. A supervisor node coordinates these subpopulations. The system adopts a Michigan approach: each individual encodes a partial solution, i.e., a clause.

The representation language used by REGAL is intermediate between VL2 and VL21 [33,32], in which terms can be variables or disjunctions of constants, and negation occurs in a restricted form. An atomic formula of arity n has the form P(X1, ..., Xn, K), where X1, ..., Xn are variables and K is a disjunction of constant terms, denoted by [v1, ..., vm], or the negation of such a disjunction. For example, these are well formed formulas: shape(X, [square, triangle]), far(X, Y, [1, 2, 3]), color(X, ¬[red, blue]). The first formula states that the shape of X is either a square or a triangle, and corresponds to the disjunction of the two literals shape(X, square) and shape(X, triangle).

Before introducing how individuals are actually encoded by REGAL, we first have to introduce the concept of language template used by REGAL. Informally, the template is a formula Λ belonging to the language, such that every admissible conjunctive concept description can be obtained from Λ by deleting some constants from the internal disjunctions occurring in it. The predicates in the template can be divided into predicates in completed form and those not in completed form. A predicate is in completed form if the set [v1, ..., vm] constituting its internal disjunction is such that the predicate can be satisfied for any binding of the variables X1, ..., Xn. In other words, a predicate containing a disjunctive term in completed form is true on every instance in the learning set.


For instance, in figure 4 color(X, [red, blue, ∗]) is in completed form, while weight(X, [3, 4, 5]) is not. The symbol ∗ means "everything which does not appear in the internal disjunction". The predicate color(X, [red, blue, ∗]) is in completed form because [red, blue, ∗] is the set of all possible colors. Thus a predicate is in completed form if its internal disjunction lists all the constants that belong to the domain. A language template Λ must contain at least one predicate in completed form. Indeed, given a language template, the search space explored by REGAL is restricted to the set H(Λ) of formulas that can be obtained by deleting some constants from the completed terms occurring in Λ. This is because predicates not in completed form have the role of constraints and must be satisfied by the specific binding chosen for the variables in Λ, while predicates in completed form are used to define the search space. Deleting a constant from a completed or non-completed term makes the term more specific.

Since the search space is limited to H(Λ), only predicates in completed form need to be processed, and hence encoded. REGAL uses bit strings for this purpose, where a string is divided into substrings. Each substring corresponds to a literal, in the same order as they appear in the language template. Each bit corresponds to a term. If the bit corresponding to a given term vi in a predicate P is set to 1, then vi belongs to the current internal disjunction, whereas if it is set to 0 it does not. An example of a language template and of the representation of formulas is given in figure 4. In the figure, ϕ1 corresponds to the rule weight(X, [3, 4, 5]) ∧ color(X, [red]) ∧ shape(X, ¬[square, circle]) ∧ far(X, Y, [1, 2]). Notice that the predicate weight is not encoded since it is not in completed form.

The first substring of ϕ1 corresponds to the predicate color, and it means that only the constant red belongs to the internal disjunction of this predicate, i.e., red is the only constant that can appear as argument. The second substring corresponds to the predicate shape, and it means that triangle and ∗ belong to the internal disjunction, which corresponds to ¬[square, circle].

The language template used by REGAL is a propositionalisation method, i.e., a method for reformulating a FOL learning problem into an attribute-value problem. It is strongly related to



Λ  = weight(X, [3, 4, 5]) ∧ color(X, [red, blue, ∗]) ∧ shape(X, [square, triangle, circle, ∗]) ∧ far(X, Y, [1, 2, 3, 4, 5, ∗])
Λs = color(X, [red, blue, ∗]) ∧ shape(X, [square, triangle, circle, ∗]) ∧ far(X, Y, [1, 2, 3, 4, 5, ∗])
ϕ1 = weight(X, [3, 4, 5]) ∧ color(X, [red]) ∧ shape(X, ¬[square, circle]) ∧ far(X, Y, [1, 2])
ϕ2 = weight(X, [3, 4, 5]) ∧ color(X, [blue]) ∧ shape(X, [square]) ∧ far(X, Y, [2])

s(Λs) → 111 1111 111111
ϕ1    → 100 0101 110000
ϕ2    → 010 1000 010000

Fig. 4. Λs is the subset of Λ consisting of the predicates in completed form. The bit strings are divided into substrings, each of them corresponding to a predicate in completed form, appearing in the same order as in Λ. weight is not encoded because it is not in completed form, so the first substring of ϕ1 corresponds to the predicate color, the second substring to shape and the third substring to far.
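The template-based encoding of figure 4 can be sketched as follows. The `decode` helper and the Python data layout are our own illustration, not REGAL's actual implementation; only the predicates in completed form are encoded, exactly as in the figure.

```python
# Sketch of REGAL's bit-string decoding against the template of figure 4.
# Only predicates in completed form are encoded; the data layout and the
# helper name are illustrative, not REGAL's actual code.

TEMPLATE = [                      # (predicate, completed internal disjunction)
    ("color", ["red", "blue", "*"]),
    ("shape", ["square", "triangle", "circle", "*"]),
    ("far",   ["1", "2", "3", "4", "5", "*"]),
]

def decode(bits):
    """Keep, for each predicate, the constants whose bit is set to 1."""
    formula, i = [], 0
    for pred, values in TEMPLATE:
        kept = [v for v, b in zip(values, bits[i:i + len(values)]) if b == "1"]
        formula.append((pred, kept))
        i += len(values)
    return formula

# The bit string of phi_1 from figure 4:
phi1 = decode("1000101110000")
# color -> [red]; shape -> [triangle, *], i.e. ¬[square, circle]; far -> [1, 2]
```

Note that the deleted constants (bits set to 0) make each internal disjunction, and hence the clause, more specific, as described above.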

some of the propositionalisation methods proposed in [1]. We refer the reader to [29] for a comparison of various propositionalisation methods.

When the system evaluates a formula on an example, each variable in the formula has to be bound to some object in the description of the example. Then the predicates occurring in the formula are evaluated on the basis of the attributes of the objects bound to their variables. A formula is said to be true on an example iff there exists at least one choice such that all the predicates occurring in the formula are true. The user has to specify how to evaluate the semantics of the predicates before starting to run REGAL on a specific application.

The fitness of an individual ϕ is given by the function f(ϕ) = f(z, nϕ) = (1 + Az)e^(−nϕ), where z is a measure of the simplicity of the formula (namely, the number of 1s in the string divided by the length of the string), nϕ is the number of negative examples covered by the formula, and A is a user-tunable parameter with default value 0.1.

Individuals are selected for reproduction by means of the Universal Suffrage (US) selection mechanism. This selection mechanism works as follows:

1. the operator randomly selects n positive examples ei, 1 ≤ i ≤ n;

2. for each ei an individual is selected: a roulette wheel tournament is performed among the individuals covering ei, where the dimension of the sector associated to each individual is proportional to its fitness. The winners of the tournaments are selected for reproduction. If an example ei is not covered, then a new individual covering ei is created using a seed operator.

REGAL adopts four crossover operators: the classical two-point and uniform crossovers, and the generalizing and specializing crossovers. The generalizing crossover works by OR-ing some selected substrings of the parents, while the specializing crossover works by AND-ing them. The probability of applying the two classical crossovers is higher when the two selected individuals have a low fitness. Conversely, the higher the fitness, the more likely it is that the other two crossovers are applied. This choice is justified by the observation that two-point and uniform crossovers have a high exploration power, while the generalizing and specializing crossovers can be used for refining individuals that are already good. The mutation operator is a classical bit mutation operator, and can affect all the bits of the string.

A first bias for limiting the hypothesis space is represented by the language template. The set of examples that is assigned to a particular node represents another bias: a node will develop individuals that belong to the species determined by the examples assigned to it.
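The fitness function and the two-step US selection described above can be sketched as follows. The data structures (individuals as bit strings, a coverage map, a precomputed fitness map) and function names are our own simplification, not REGAL's code, and the seed operator is only indicated by a comment.

```python
import math
import random

# Sketch of REGAL's fitness f(phi) = (1 + A*z) * e^(-n_phi) and of the
# Universal Suffrage selection. Data structures are illustrative.

A = 0.1  # user-tunable parameter, default value

def regal_fitness(bits, n_neg_covered):
    z = bits.count("1") / len(bits)          # simplicity measure z
    return (1 + A * z) * math.exp(-n_neg_covered)

def universal_suffrage(population, covers, fit, pos_examples, n, rng):
    """covers[ind]: set of positive examples covered by individual ind."""
    selected = []
    for e in rng.sample(pos_examples, n):    # step 1: draw n positive examples
        voters = [ind for ind in population if e in covers[ind]]
        if not voters:
            continue  # here REGAL would create a covering individual (seed operator)
        total = sum(fit[ind] for ind in voters)
        r, acc = rng.uniform(0, total), 0.0  # step 2: fitness-proportional roulette
        for ind in voters:
            acc += fit[ind]
            if acc >= r:
                selected.append(ind)
                break
    return selected
```

Since only the individuals covering the drawn example vote, complete individuals are rewarded by selection itself, which is why, as noted below in section 6.3, completeness need not appear in the fitness function.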

Nodal Genetic Algorithm( Node ν )
 1  Initialize the population Popν and evaluate it;
 2  while not solve
 3  do receive µ·|Popν| individuals from the network
 4     and store them in Popnet;
 5     Select Bν from Popν ∪ Popnet with the US;
 6     Recombine Bν using crossover and mutation;
 7     Update Popν and Popnet with the
 8     new individuals in Bν;
 9     Send Popnet on the network;
10     Send the status to the supervisor;
11     Check for messages from the supervisor;

Fig. 5. Genetic algorithm used by a node ν in the distributed version of REGAL. The algorithm is repeated until the node receives a solve signal from the supervisor. In the algorithm µ is a migration parameter.

In figure 5 a scheme of the genetic algorithm for a node ν is presented. In line 3 the node receives a number of individuals from the network; these individuals are used for avoiding the lethal mating problem (lethal matings are matings bound to produce bad offspring [55]). The execution ends when the node receives a solve signal from the supervisor.

During the learning process the supervisor periodically receives and stores the best solution found by each genetic node. From these rules a solution is extracted. For this purpose, first the set E⁺_H is constructed, as the union of all positive examples covered by the received clauses. The clauses are then sorted in decreasing order according to π(C) = pC · f(C), where pC is the number of positive examples covered by C. The first n best clauses able to cover E⁺_H represent the solution.

5.3. G-NET

G-NET (Genetic Network) [2] is a descendant of REGAL. Like its predecessor, G-NET is a distributed system, with a collection of genetic nodes and a supervisor. However, G-NET differs from REGAL in many aspects. First, G-NET adopts a co-evolution strategy by means of two algorithms. The first algorithm computes a global concept description out of the best hypotheses emerged in the various genetic nodes. The second algorithm computes the assignment of the positive concept instances to the genetic nodes. The strategy consists of addressing the search on the concept instances that are covered by poor hypotheses, while continuing the refinement of the other hypotheses.


G-NET is based on the same theory of species and niche formation and on the same representation language adopted by REGAL. The fitness function used by G-NET differs from the one employed in REGAL. In fact G-NET uses two functions: the first is used at a global level, while the second is used at a local level, for evaluating a clause in a genetic node. A global hypothesis H is evaluated in the following way:

fG(H) = MDL_MAX − MDL(¬pH + nH) − MDL(H)

where MDL_MAX is the MDL of the whole learning set and ¬pH is the number of positive examples not covered by H. The formula used for evaluating an individual ϕ at the local level is the following:

fL(ϕ) = MDL_MAX − MDL(ϕ) − MDL(¬pϕ) + (fG(H′) − fG(H))

In the above formula H is the current global hypothesis and H′ is the hypothesis obtained by adding ϕ to H.

Another difference between REGAL and G-NET lies in the selection operator. G-NET does not use the universal suffrage operator; instead, individuals are selected with a fitness proportional selection. G-NET adopts three kinds of mutation operators: one is used to generalize an individual, another for specialization, and a third for creating new clauses, so that it can also be seen as a seeding operator. The crossover is a combination of the two-point crossover with a variant of the uniform crossover, modified in order to perform either generalization or specialization of individuals. Both crossover and mutation operators enforce diversity, so it is ensured that within a genetic node there are no equal clauses.

5.4. DOGMA

DOGMA (Domain Oriented Genetic MAchine) [27,26] employs two distinct levels. On the lower level the Michigan approach is adopted, while on the higher level the Pittsburgh approach is used. On the lowest level the system uses fixed length chromosomes, which are manipulated by crossover and mutation operators. On the higher level



chromosomes are combined into genetic families, through some special operators that can merge and break families. The representation language and encoding adopted by DOGMA are equal to those used by REGAL.

The fitness function combines two different functions, one based on the MDL principle and the other based on the information gain measure. The total description length of a hypothesis H consists of the hypothesis cost, i.e., the length of the encoding of H, and the exception cost, which is the encoding of the data that is erroneously classified by H. The unit of length used is a binary digit. To turn the MDL principle (see section 3.3) into a fitness function, the MDL of the current hypothesis H is compared against the total exception length with the following function:

fMDL(H, E) = 1 − MDL(H, E) / (W∅ × MDL(H∅, E))

In the above formula, MDL(H∅, E) stands for the total exception length, i.e., the description length of an empty hypothesis that covers no examples. W∅ is a weight factor that is used to guarantee that even fairly bad hypotheses have a positive fitness.

This function alone cannot be used as a fitness function, because it underrates hypotheses that are almost consistent but very incomplete. The fitness function would prefer large and incomplete clauses to fairly small but almost consistent clauses, and the population would become overly general and very inconsistent. For this reason, a function based on information gain is used as well, which promotes small and almost consistent clauses. The information gain of a hypothesis H compared to another hypothesis Hdef measures how much information is gained in the distribution of correctly and incorrectly classified positive examples of H compared to the distribution of Hdef:

Gain(Hdef, H, E) = log_b(pH + 1) × (Info(Hdef, E) − Info(H, E))

where b ≥ 1 (default value 1.2) and Info(H, E) = −log(pH / (pH + nH)). The hypothesis Hdef is a default hypothesis that classifies all examples as positive. The fitness function based on the information gain is then defined as follows:

fG(H, E) = WG × Gain(Hdef, H, E) / Gain(Hdef, Hmax, E)

where WG > 0 is a tunable parameter and Hmax is a hypothetical hypothesis that correctly classifies all examples. Finally, the fitness function used by DOGMA is the following:

fMG = min(fMDL(H, E), fG(H, E))

which chooses the minimum between fMDL and fG.

To enhance diversity and to separate different kinds of clauses, the system uses speciation of chromosomes. This can be done randomly or by dividing individuals into species according to which parts of the background knowledge they may use. Speciation has three applications in the system. First, it is used for controlling the mating of chromosomes of different species. Secondly, speciation can control what part of the background knowledge individuals can use. Finally, speciation is used when merging chromosomes into families: chromosomes of the same species cannot be merged into the same family.

DOGMA uses the crossover operators used by REGAL, a classical mutation operator, and a seeding operator which, given a randomly selected example, randomly creates a bit string and then adjusts it in order to cover that example. The remaining operators work on the family level. A break operator randomly splits a family into two separate families. In opposition to the break operator, a join operator joins two families into one; if the two families contain two chromosomes of the same species, then one of them is deleted. In addition to these operators, a make-family operator is used for forming families by selecting useful chromosomes of different species from the population. The order in which the operators are applied is given in figure 6.

Make-next-generation( Pop )
 1  PopS ← Select families in Pop
 2  PopS′ ← Mate chromosomes PopS
 3  PopX ← Crossover PopS′
 4  PopM ← Mutate PopX
 5  PopB ← Break families PopM
 6  PopJ ← Join families PopB
 7  PopU ← PopJ ∪ Make families Pop
 8  PopE ← Evaluate PopU
 9  Pop′ ← Replace families (Pop, PopE)
10  return Pop′

Fig. 6. Algorithm used by DOGMA for the creation of a new population P op0 starting from an old population P op.
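DOGMA's combined fitness fMG = min(fMDL, fG) described above can be sketched numerically. In this sketch the MDL quantities and the normalising gain are passed in as plain numbers, since computing them from an actual hypothesis encoding is outside its scope; function names and default parameter values are our own assumptions.

```python
import math

# Sketch of DOGMA's combined fitness f_MG = min(f_MDL, f_G). MDL values
# and the normalising maximum gain are supplied as numbers; names and
# defaults are illustrative assumptions, not DOGMA's actual code.

def f_mdl(mdl_h, mdl_empty, w_empty=2.0):
    # 1 - MDL(H,E) / (W0 * MDL(H0,E)); W0 keeps fairly bad hypotheses positive
    return 1 - mdl_h / (w_empty * mdl_empty)

def info(p, n):
    # Info(H,E) = -log(p_H / (p_H + n_H))
    return -math.log(p / (p + n))

def gain(p_def, n_def, p_h, n_h, b=1.2):
    # Gain(Hdef, H, E) = log_b(p_H + 1) * (Info(Hdef,E) - Info(H,E))
    return math.log(p_h + 1, b) * (info(p_def, n_def) - info(p_h, n_h))

def f_gain(p_def, n_def, p_h, n_h, gain_max, w_g=1.0):
    return w_g * gain(p_def, n_def, p_h, n_h) / gain_max

def f_mg(mdl_h, mdl_empty, p_def, n_def, p_h, n_h, gain_max):
    return min(f_mdl(mdl_h, mdl_empty),
               f_gain(p_def, n_def, p_h, n_h, gain_max))
```

Taking the minimum means a clause scores well only when it is both cheap to encode (fMDL) and small and almost consistent (fG), which is exactly the balance discussed above.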


DOGMA follows the metaphor of competing families by keeping genetic operators, such as crossover and mutation, working on the lower level, building good blocks of chromosomes, while lifting selection and replacement to the family level. Fitness is also lifted to the higher level.

5.5. SIA01

SIA01 (Supervised Inductive Algorithm version 01) [3] uses the sequential covering principle developed in AQ [32]. SIA01 adopts a bottom-up approach. In order to create the initial clause, SIA01 randomly chooses an uncovered positive example and uses it as a seed. Then it finds the best generalization of this clause according to some evaluation criterion. This is done by means of a GA. To obtain a new generation the algorithm applies a genetic operator to each individual in the population, and the newly created individual may then be inserted in the population. The size of the population can grow in this way until a certain bound is reached. A scheme of the GA used for searching the best clause is given in figure 7.

Differently from REGAL, which adopts a bit string representation for encoding clauses, SIA01 adopts a high level encoding. SIA01 directly uses a FOL notation, with predicates and their arguments as genes. For instance the clause Obj(X) ← color(X, blue), shape(X, square), far(X, Y, 2) will be encoded in the following individual:

Obj X color X blue shape X square far X Y 2

The fitness function takes into consideration the consistency of the clause, its completeness, its syntactic generality and some user preferences:

f(C) = (1 − α − β)CM + αS + βA   if CN > 1 − N
f(C) = 0                         otherwise

In the above formula CM is the absolute completeness, defined as pϕ / |E⁺|, where |E⁺| is the total number of positive examples. CN is the absolute consistency, defined as (|E⁻| − nϕ) / |E⁻|, where |E⁻| is the total number of negative examples. N is the maximum noise tolerated, S is the syntactic generality of ϕ and A is the clause's appropriateness to the user's preferences. N, A, α and β are user-tunable parameters.
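The piecewise fitness above can be sketched directly. S and A are supplied as plain numbers here, and the default parameter values are our own choice for illustration, not SIA01's.

```python
# Sketch of SIA01's fitness. S (syntactic generality) and A (appropriateness
# to the user's preferences) are supplied as numbers; the default parameter
# values are illustrative assumptions.

def sia01_fitness(p, n, total_pos, total_neg, S, A,
                  alpha=0.2, beta=0.1, noise=0.1):
    cm = p / total_pos                    # absolute completeness CM
    cn = (total_neg - n) / total_neg      # absolute consistency CN
    if cn > 1 - noise:
        return (1 - alpha - beta) * cm + alpha * S + beta * A
    return 0.0                            # too inconsistent: rejected
```

The hard threshold on CN makes consistency a gate rather than a weighted term: a clause covering more than the tolerated fraction of negatives scores zero regardless of its completeness.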


GA(SIA01)
 1  Pop = Seed
 2  repeat
 3    for ∀ ϕ ∈ Pop
 4      do if ϕ has not already produced offspring
 5        then create 1 offspring by mutation of ϕ
 6             create 2 offspring by crossover with ϕ′
 7             put the offspring in Pop′
 8    for ∀ ϕ ∈ Pop′
 9      do if ϕ ∉ Pop
10        then if size(Pop) < max
11             or fitness ϕ is better
12             than the worst fitness in Pop
13           then insert ϕ in Pop
14  until fitness(best ϕ) hasn't changed in last n gens

Fig. 7. The scheme of the GA adopted by SIA01. ϕ′ is an individual in the population that has not yet produced any offspring.
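The growing-population scheme of figure 7 can be sketched as a runnable toy. The individuals, operators and fitness below are stand-ins (bit-string maximisation instead of clause generalization), since the point here is only the insertion policy: an offspring enters the population while it is not full, or by replacing the worst individual otherwise. All names are our own.

```python
import random

# Runnable sketch of the steady-state scheme of figure 7, with toy
# bit-string individuals instead of clauses. Illustrative only.

def mutate(ind, rng):
    i = rng.randrange(len(ind))
    return ind[:i] + ("1" if ind[i] == "0" else "0") + ind[i + 1:]

def crossover(a, b, rng):
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def fitness(ind):
    return ind.count("1")

def sia_like_ga(seed, max_size=10, generations=30, rng=None):
    rng = rng or random.Random(0)
    pop = [seed]
    for _ in range(generations):
        offspring = []
        for ind in list(pop):
            offspring.append(mutate(ind, rng))
            offspring.extend(crossover(ind, rng.choice(pop), rng))
        for child in offspring:
            if child in pop:
                continue                       # line 9: only new individuals
            if len(pop) < max_size:
                pop.append(child)              # line 10: population may grow
            else:
                worst = min(pop, key=fitness)  # lines 11-13: replace the worst
                if fitness(child) > fitness(worst):
                    pop[pop.index(worst)] = child
    return max(pop, key=fitness)

best = sia_like_ga("0000")
```

Unlike a generational GA, nothing is discarded while the population is below its bound, which matches the description above of a population that grows until a certain size is reached.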

A mutation operator and two crossover operators are used for creating new individuals. The mutation operator selects a relevant gene and performs one of the following operations:

– if the gene encodes a predicate, then change it into a more general predicate, according to the background knowledge; if it is not possible to generalize any further, then drop the predicate. For example the predicate Pyramid could be changed into Pointed-top without modifying the arguments of the predicate. To this aim, the order of generality among predicates is also stored in B;
– if the gene encodes a numerical constant, then the mutation can create an interval, or if the gene is already an interval the operator can enlarge it;
– if the gene encodes a numeric or symbolic constant, then the operator may create an internal disjunction or generalize an existing disjunction;
– if the gene encodes a symbolic constant, this constant can be turned into a variable; this change is propagated through the whole individual;
– if the gene encodes a variable, the operator may replace it with another variable.

The first crossover, which is used by default, is a restrained one-point crossover. The restriction is that the chosen point in the clause has to be before a predicate. If the seed contains only one predicate then the standard one-point crossover is used.

SIA01 is a refinement of the system SIA [55]. Another recently developed system based on SIA



GA(ECL)
 1  Sel = positive examples
 2  repeat
 3    Select partial Background Knowledge
 4    Pop = ∅
 5    while not terminate
 6      do Select n individuals using Sel
 7         for each selected individual ϕ
 8           do Mutate ϕ
 9              Optimize ϕ
10              Insert ϕ in Pop
11    Store Pop in Final Population
12    Sel = Sel − {positive examples covered by Pop}
13  until max iter is reached
14  Extract a solution from Final Population

Fig. 8. The overall learning algorithm ECL.

is the Extended SIA (ESIA) [31]. ESIA adopts a sequential covering algorithm but learns concepts expressed in propositional logic.

5.6. ECL

ECL (Evolutionary Concept Learner) [12,11,14] adopts the Michigan approach. Newly created individuals represent a generalization of a most specific clause built from a positive seed example. In figure 8 a scheme of ECL is given. In the repeat statement the algorithm iteratively constructs a Final Population as the union of max iter populations (line 11). At the end of the evolution a logic program for the target concept is extracted from Final Population. In order to do so, the most precise clauses in Final Population are repeatedly added to the emerging solution until the accuracy of the solution does not decrease. The precision of a clause C is defined as pC / (pC + nC).

The fitness of an individual ϕ is given by the inverse of its accuracy:

f(ϕ) = 1 / Acc(ϕ) = (|E⁺| + |E⁻|) / (pϕ + (|E⁻| − nϕ))

The aim of ECL is thus to minimize the fitness of individuals. The representation used is very similar to the one adopted by SIA01. A clause Obj(X) ← color(X, blue), shape(X, square), far(X, Y, 2) is encoded by the sequence

Obj, X, color, X, blue, shape, X, square, far, X, Y, 2

Individuals are selected with a variant of the US selection operator (see section 5.2) called Exponential Weighted US (EWUS) [13]. In the EWUS, examples that are difficult to cover have higher probabilities of being chosen. The difficulty of an example ei is determined by the number of individuals that cover ei. Examples are selected with a roulette wheel mechanism, where the dimension of the sector associated to each example is proportional to the difficulty of the example.

ECL uses four mutation operators in order to evolve individuals, two for generalization and two for specialization. A clause can be generalized either by deleting an atom from its body or by turning a constant into a variable; with the dual operations a clause can be specialized. Each operator has a degree of greediness, which can be controlled by the user by setting the value of N in the following steps:

1. test N mutation possibilities on C;
2. apply to C the mutation yielding the best improvement in the fitness.

For instance, if a variable Z of a clause has to be turned into a constant, the system may consider the substitutions {Z/a}, {Z/b}, {Z/c}, where a, b, c are possible values for Z, and apply the one yielding the best fitness improvement.

When an individual is chosen for mutation, a first (randomized) test decides whether the individual will be generalized or specialized. Next, one of the two operators of the chosen class is randomly applied. If the individual is consistent with the training set, then it is likely that the individual will be generalized; otherwise it is more probable that a specialization operator will be applied. After mutation, the individual undergoes an optimization phase. This phase consists in the repeated application of the mutation operators until a maximum number of optimization steps has been reached, or until the fitness of the individual does not increase; in the latter case the last mutation is retracted.

No crossover operator is used: the particular representation used by ECL makes it difficult to design an effective crossover operator. A variant of the uniform crossover operator has been tried, but the results obtained did not justify its use.

The aim of ECL is to find hypotheses of satisfactory quality, both with respect to accuracy and simplicity, in a short amount of time. For this pur-
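ECL's inverse-accuracy fitness and the precision used in solution extraction can be sketched in a few lines; the function names are ours.

```python
# Sketch of ECL's fitness (inverse accuracy) and of the precision used
# when extracting a program from the Final Population. Names are ours.

def ecl_fitness(p, n, total_pos, total_neg):
    # f(phi) = (|E+| + |E-|) / (p_phi + (|E-| - n_phi)); ECL minimises this
    return (total_pos + total_neg) / (p + (total_neg - n))

def precision(p, n):
    # p_C / (p_C + n_C), used to rank clauses during solution extraction
    return p / (p + n)
```

A clause covering all positives and no negatives reaches the minimum fitness value 1, so minimising this fitness drives clauses towards completeness and consistency simultaneously.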

Fig. 9. A forest of AND-OR trees. The numbers next to each node are the identifier numbers of the nodes.

pose two mechanisms are used. The first mechanism allows the user to specify the percentage of background knowledge used by the GA (step 3 of figure 8) at each iteration. This is done by using a simple stochastic sampling mechanism: a user-tunable parameter p, 0 < p ≤ 1, determines the probability that each fact of the background knowledge has of being selected. This then leads to the implicit selection of a subset of the training examples: only examples that can be covered using the chosen part of the background knowledge will be used. Individuals are evaluated using the partial background knowledge. The second mechanism allows the user to control the greediness of the mutation operators, by means of the parameter N, thus controlling the computational cost of the search. Another user-defined bias, also user tunable, limits the maximum length of a clause.

5.7. GLPS

The Genetic Logic Programming System (GLPS) [56] is a GP system that adopts a Pittsburgh approach. The reproduction phase involves selecting a program from the current population and allowing it to survive by copying it into the new population. The selection is based either on fitness or on a tournament. The system uses crossover to create two offspring from the selected parents; GLPS does not use any mutation operators. After the reproduction and crossover phase, the new generation replaces the old one. Next, GLPS evaluates the population, assigning a fitness value to each individual, and iterates this process over many generations, until a termination criterion is satisfied.

The system adopts a restriction on the representable clauses: function symbols cannot appear

in a clause. Logic programs are represented as a forest of AND-OR trees, the leaves of such trees being positive or negative literals containing predicate symbols and terms of the problem domain. For example, figure 9 represents the logic program:

C1 : cup(X) ← stable(X), liftable(X).
C2 : cup(X) ← paper cup(X).
C3 : stable(X) ← bottom(X, Z), flat(Z).
C4 : liftable(X) ← has(X, Y), handle(Y).

The leftmost tree in figure 9 represents clauses C1 and C2: it can be derived from the tree that X is a cup if either X is stable and liftable or if X is a paper cup.

With this representation, it is not difficult to generate an initial population randomly: a forest of AND-OR trees can be randomly generated and then the leaves of these trees can be filled with literals of the problem. Another way is to generate the initial population using some other system, like FOIL.

The fitness function used by GLPS is a weighted sum of the total number of misclassified positive and negative examples. The weight is used for dealing with an uneven distribution of positive and negative examples. An ad-hoc crossover operator is used, which can operate in various modalities:

1. individuals are just copied unchanged to the next generation;
2. individuals exchange a set of clauses;
3. a number of clauses belonging to a particular rule are exchanged between the individuals;
4. a number of literals belonging to a clause are exchanged.

6. Comparison of the Systems

In this section we compare the features of REGAL, G-NET, DOGMA, SIA01, ECL and GLPS.

Table 1
In the table con stands for consistency, com for completeness, sim for simplicity, pref for user's preferences, gen for syntactic generality. MDL is the Minimum Description Length principle. Gain is the information gain.

Algorithm | Encoding                                | Fitness features              | Approach
REGAL     | bit strings (needs an initial template) | con + sim                     | Michigan
DOGMA     | bit strings (needs an initial template) | MDL + Gain                    | Michigan & Pittsburgh
G-NET     | bit strings (needs an initial template) | two functions: con + sim + MDL | Michigan
SIA01     | high level language representation      | con + com + gen + pref        | Michigan
ECL       | high level language representation      | con + com                     | Michigan
GLPS      | AND-OR trees                            | con + com                     | Pittsburgh

We do not consider FOIL and Progol because we are interested in the evolutionary approach to ILP. We compare the systems with respect to search strategy, representation, fitness function, operators and biases. Moreover, in section 6.6 we discuss the effectiveness of the systems. Table 1 summarizes the representations of the systems, the properties used in the fitness functions and the approach adopted. Table 2 summarizes the genetic and selection operators.

6.1. Search Strategy

The described evolutionary systems, in general, do not follow a specific search approach (top-down, bottom-up), except for SIA01, which adopts a bottom-up approach. All the systems exploit the general-to-specific ordering of hypotheses in some of the genetic operators adopted. The search proceeds by successive generalization and specialization of hypotheses. Moreover, co-evolution strategies and speciation are used in REGAL, G-NET and DOGMA, and implicitly in ECL (through the selection).

6.2. Representation

REGAL, G-NET and DOGMA use the supplied template to map clauses into bit strings. This implies some knowledge of what the user expects to discover, which cannot always be provided. The use of an initial template also imposes another limitation: all the rules handled must follow the initial template, which is constant and cannot change during the learning process. With this approach, problems can also arise when dealing with numerical values, as the binary representation of the model can become quite long, and this may slow down the process. Moreover, the bit string representation used by these three systems does not allow them to perform some typical FOL operations, e.g., changing a constant into a variable.

The high level representation adopted by SIA01 and ECL is more flexible: the shape of the learned clauses can vary during the learning process, since the particular shape of each clause is determined by the positive example used as seed in the initialization phase. The two systems are also capable of performing some FOL operations, e.g., applying a substitution to a clause. GLPS does not require an initial template either, so the shape of the initial clauses is not fixed.

6.3. Fitness Function

The systems that adopt the simplest fitness functions are GLPS and ECL. They take into consideration only the completeness and the consistency of individuals. The function adopted by SIA01 is more elaborate: in addition, simplicity is considered, and the user can express preferences for some types of clauses. However, consistency and completeness remain the features with the biggest influence on the fitness. The function used by REGAL considers only simplicity and consistency. Completeness is not considered because the US selection operator already promotes complete individuals; this reduces the complexity of the evaluation. G-NET uses two fitness functions, one at a local level, in the genetic nodes, and another at a global level. G-NET also makes use of the MDL principle in its fitness functions. Unlike REGAL, at a local

Federico Divina / Evolutionary Concept Learning in First Order Logic

17

Table 2 A summary of the characteristics of the various operators adopted by the presented systems. Algorithm

Type of crossover

Mutation

Selection Operator

REGAL

uniform, two-point generalizing, specializing

classic

US

DOGMA

uniform, two-point generalizing, specializing

classic

based on fitness lifted to the family level

G-NET

generalizing, specializing two-point

generalizing specializing,seed

tournament

SIA01

restrained 1-point classic 1-point

4 generalizing modalities

select all individuals that have not produced an offspring

ECL

none

2 generalizing 2 specializing

EWUS

GLPS

reproduction

none

exchange info

level G-NET considers also the global behavior of clauses. This is achieved by evaluating how well a clause combines with others in order to form a global solution. This is a good strategy, since GNET is a distributed system. DOGMA combines the information gain and the MDL principle. Information gain is used for promoting small and almost consistent clauses. This is done because using only the MDL would result in an under rating of hypotheses that are almost consistent but very incomplete. This would lead DOGMA towards a population with a majority of large and very inconsistent clauses. In this way the population would be too general and very inconsistent. 6.4. Operators REGAL and DOGMA adopt the same operators. Two crossovers are used to generalize and specialize individuals. In addition to these, uniform and two-point crossovers are used in order to make bigger steps in the hypothesis search space. G-NET introduces some novel ILP operators. Both the crossover and the mutation operators, can be used in three modalities. For the crossover these modalities are: generalization, specialization and exchanging modality. The latter one is implemented by a classical two-point crossover. The generalization modality tends to be used when both parents are consistent, otherwise the specialization modality is more likely to be applied. For the mutation a similar strategy is applied. With this strat-

tournament or fitness proportionate selection

egy when in a node a clause is often chosen the search turns into a stochastic hill climbing. What is done by these three systems when they generalize or specialize a rule, is basically dropping or adding conditions. This is because the systems adopt an internal disjunction in order to define the values that a variable can assume. SIA01 and ECL have the possibility to perform some operations which are more FOL oriented. The mutation operator used by SIA01 can perform a variety of operations, e.g., changing a numeric constant into an interval or turning a constant into a variable. An interesting feature is that the operators are designed in a way that guarantees that individuals of the population are syntactically different from each other. ECL uses two generalization and two specialization mutation operators. These operators do not act completely at random, but they have a degree of greediness that can be tuned by the user. After an individual has been mutated, an optimization phase is applied to it, as described in section 5.6. Both SIA01 and ECL rely mostly on mutation. This is because the high level representation makes the design of an effective crossover operator difficult. In opposition, GLPS does not make use of any mutation operators, so the reproduction phase is carried out only by the crossover operator, which can exchange rules, clauses or just literals. 6.5. Biases REGAL, DOGMA and G-NET limit the hypothesis space by means of an initial template.


Only clauses that can be derived from the initial template are considered during the search for a satisfying concept. ECL and SIA01 consider clauses limited to the possible generalizations of the initial clause built from a seed example. These systems therefore use a different bias for each individual, depending on the example used to create it; in this way individuals are not constrained to a fixed shape. Moreover, ECL limits the hypothesis space through the use of a partial background knowledge: at each iteration the system focuses only on a part of the hypothesis space. GLPS restricts the search to individuals that can be represented by trees with a maximum depth specified by a user-tunable parameter.

6.6. Effectiveness

Unfortunately, some of the described systems are either not available (DOGMA, GLPS, SIA01) or not installable (REGAL², G-NET). For this reason, we can only provide a brief comparison of the performance of the evolutionary systems, based on results taken from various publications. The results discussed in this section were obtained on the datasets shown in table 3, where the features of the various datasets are also given. The first column specifies the kind of dataset, where Prop. stands for propositional; the second column shows the number of examples in each dataset, and the third column shows the background knowledge size, i.e., the number of facts describing the examples. The crx, breast, splice-junction and vote datasets are taken from [7], while the mutagenesis dataset originates from [10]. The mutagenesis and splice-junction datasets are FOL datasets, while the others are propositional.

Table 4 shows the accuracies obtained by the systems on the datasets on which they were tested, as estimated by 10-fold cross validation. It is interesting to compare the performance of G-NET, REGAL and DOGMA, since these systems are based on the ideas first developed in REGAL. The three systems were tested on the splice-junction dataset. The obtained solutions showed an accuracy of 96.6%, 95.6% and 94.3%, respectively. G-NET not only improves the accuracy, but

² Special thanks to Filippo Neri for his assistance in the attempt to install the system.

Table 3
Features of the datasets.

Dataset | Type | Examples | Background
Crx | Prop. | 690 | 10283
Breast | Prop. | 699 | 6275
Vote | Prop. | 435 | 6568
Splice Junctions | ILP | 3190 | 191400
Mutagenesis | ILP | 188 | 13125

also the number of disjunctions needed for the concept decreased. We were not able to find any results for SIA01; we could only find some results for ESIA. We compare the effectiveness of ESIA with G-NET and ECL on two datasets, crx and breast. On the first dataset ESIA obtained an accuracy of 77.4%, while G-NET reached an accuracy of 84.4% and ECL of 84%. On the breast dataset the three systems obtained the same accuracy, namely 94.7%.

ECL and G-NET were also compared on another two datasets: mutagenesis and vote. For the first dataset we also report results obtained by Progol. G-NET evolved a hypothesis with an accuracy of 91.2% on the mutagenesis dataset, ECL of 90.3%, while Progol found a theory with an accuracy of 88.2%. G-NET needed several hours to evolve a theory for the mutagenesis dataset, while ECL was more efficient, taking an average of ten minutes. On the vote dataset the accuracy obtained by G-NET is 95% and that of ECL is 94%. GLPS is directly comparable only to a version of FOIL when learning concepts from noisy data, where it performed better; however, its initial population was initialized with FOIL.

Table 4
Results obtained by the systems on different datasets. Average accuracy is given.

Algorithm | Splice | Crx | Breast | Vote | Mutagenesis
G-NET | 96.6 | 84.4 | 94.7 | 95 | 91.2
ECL | - | 84 | 94.7 | 94 | 90.3
REGAL | 95.6 | - | - | - | -
DOGMA | 94.3 | - | - | - | -
ESIA | - | 77.4 | 94.7 | - | -
Progol | - | - | - | - | 88.2
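The accuracies in Table 4 are 10-fold cross-validation estimates. The protocol can be sketched as follows; `train` and `accuracy` are hypothetical stand-ins for an arbitrary ILP learner and its scoring function, not any of the surveyed systems:

```python
import random

def cross_validate(examples, train, accuracy, k=10, seed=0):
    """Estimate predictive accuracy by k-fold cross validation.

    `train(training_set)` and `accuracy(theory, test_set)` are
    placeholders for a learner and its evaluation function.
    """
    examples = examples[:]
    random.Random(seed).shuffle(examples)
    folds = [examples[i::k] for i in range(k)]  # k roughly equal folds
    scores = []
    for i in range(k):
        held_out = folds[i]
        training = [e for j, f in enumerate(folds) if j != i for e in f]
        theory = train(training)               # learn on k-1 folds
        scores.append(accuracy(theory, held_out))  # test on the held-out fold
    return sum(scores) / k
```

Each example is used for testing exactly once, so the averaged score is an estimate of accuracy on unseen data rather than on the training set.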

It is difficult, after this kind of comparison, to draw firm conclusions about the effectiveness of the various systems. We can say that G-NET and ECL showed a similar behavior on the datasets on which they were tested, with G-NET obtaining slightly more accurate results, but requiring greater computational power. Due to the limitations of the comparison proposed in this section, the reported results cannot be used to empirically prove that methods based on EC are superior to traditional ILP methods. Nor can the results be used to show that EC methods can always overcome the limitations of traditional methods (e.g., getting stuck in local optima). However, the exploration operators used by EAs, e.g., randomized mutation, give EAs the capability of escaping from local optima and of effectively exploring the hypothesis space. On the other hand, the same exploration operators are also responsible for the rather poor performance of EAs on learning tasks that are easy to tackle by algorithms using specific search strategies; EAs are not as good as greedy search strategies at fine-tuning solutions. From these considerations, we can conclude that EAs represent a valid alternative to standard ILP techniques, even though EAs too have some limitations. These observations suggest that the two approaches (standard ILP techniques and EAs) are applicable to partly complementary classes of learning problems. They also suggest that a system incorporating features from both approaches could profit from their different benefits.
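As noted throughout this comparison, the surveyed systems score hypotheses mainly on consistency and completeness, sometimes traded against simplicity. A purely illustrative combination is sketched below; the weights and the exact formula are assumptions for the sketch, not the actual fitness function of any system discussed here:

```python
def fitness(covered_pos, covered_neg, n_pos, n_neg, length,
            w_cons=0.5, w_comp=0.4, w_simp=0.1):
    """Toy clause fitness; higher is better.

    covered_pos / covered_neg: numbers of positive / negative
    examples the clause covers. The weights are illustrative only.
    """
    completeness = covered_pos / n_pos        # fraction of positives covered
    consistency = 1.0 - covered_neg / n_neg   # fraction of negatives excluded
    simplicity = 1.0 / (1.0 + length)         # shorter clauses preferred
    return w_cons * consistency + w_comp * completeness + w_simp * simplicity
```

Under this kind of scoring, a clause that covers more negatives is strictly penalized, which is the behavior the consistency term is meant to capture.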

7. Discussion

In this section we summarize the most promising solutions adopted by the evolutionary systems considered in this paper, for each aspect given in figure 1. This can provide a convenient reference for future research on and development of EA-based ILP systems.

Representation. A good choice seems to be a high-level representation, like the ones adopted by SIA01 and ECL. This kind of representation offers a direct way to define operators for generalizing and specializing a rule, and allows a more flexible form of the rules. Another advantage of a high-level representation is that it makes it easy to have individuals of variable length.
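The variable-length, high-level representation just described can be pictured as a clause holding a list of body literals, on which the FOL generalization moves mentioned earlier (dropping a condition, turning a constant into a variable) act directly. The following is a minimal sketch under assumed conventions (literals as predicate/argument tuples, uppercase strings as variables), not the actual data structures of SIA01 or ECL:

```python
import random

# A clause is a head literal plus a variable-length list of body literals;
# each literal is (predicate, args). Example clause (names illustrative):
clause = {"head": ("daughter", ("X", "Y")),
          "body": [("parent", ("Y", "X")), ("female", ("ann",))]}

def drop_condition(clause, rng=random):
    """Generalize a clause by removing one body literal."""
    body = list(clause["body"])
    if body:
        body.pop(rng.randrange(len(body)))
    return {"head": clause["head"], "body": body}

def constant_to_variable(clause, const, var):
    """Generalize by replacing every occurrence of a constant with a variable."""
    sub = lambda lit: (lit[0], tuple(var if a == const else a for a in lit[1]))
    return {"head": sub(clause["head"]),
            "body": [sub(l) for l in clause["body"]]}
```

Because the body is just a list, clauses of different lengths coexist naturally in the population, which is the flexibility the bit-string encodings lack.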


Fitness function. The choice of a fitness function is a difficult one. The particular fitness function used depends on various aspects, e.g., on the operators used or on the search strategy adopted. We have seen that almost all the systems try to incorporate at least consistency and completeness, as well as simplicity.

Operators. With a high-level representation it seems possible to obtain good results using only mutation operators. However, relying only on standard mutation operators is not a good choice: some ILP-oriented operators, like those used by SIA01 and ECL, need to be adopted. Moreover, enhancing the GA search by embedding local optimization procedures has been shown to be effective in ILP as well, as it is in general for combinatorial optimization problems [35].

Search. EA systems for ILP use selection as the main search strategy. The US selection for parallel systems like REGAL, and the EWUS selection for systems like ECL, seem to be very effective mechanisms for guiding the GA during its search process, by promoting speciation and diversity in the population. Several other methods for niche and species formation can be found in the literature [25,28,51,30].

Biases. Search biases, like setting a bound on the length of rules or limiting the kind of representable rules, are adopted by almost every ILP system in order to limit the size of the hypothesis space to be searched.

Parallelization. REGAL and G-NET use interesting parallel models. The particular implementation adopted allows the systems to shift the focus of the genetic search performed by each genetic node. In this way the effectiveness and the efficiency of the systems are increased, and diversity is promoted.

Sampling. ECL's sampling method helps in reducing the computational cost and also in improving the accuracy. Other mechanisms for reducing the computational cost have been proposed; in [49,22,53] rules are evaluated on a subset of the training examples determined by a sampling method.
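The sampling idea above, evaluating rules on a random subset of the training examples rather than on all of them, can be sketched as follows. This is a generic illustration, not the exact scheme of ECL or of the cited methods; `covers` is a hypothetical coverage test supplied by the caller:

```python
import random

def sampled_accuracy(covers, rule, examples, sample_size=100, seed=None):
    """Estimate a rule's accuracy on a random sample of the training set.

    `covers(rule, example)` is a placeholder for a coverage test;
    `examples` is a list of (example, label) pairs. Evaluating on a
    sample trades exactness for a large cut in matching cost.
    """
    rng = random.Random(seed)
    n = min(sample_size, len(examples))
    sample = rng.sample(examples, n)
    correct = sum(1 for ex, label in sample if covers(rule, ex) == label)
    return correct / n
```

Since coverage testing in FOL can be expensive, reducing the number of matching operations per fitness evaluation is where the savings come from; the price is a noisier fitness estimate.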


8. Conclusions

The aim of this paper was to provide an overview of state-of-the-art evolutionary techniques for ILP. Even though the focus of this paper was on the evolutionary approach to ILP, we provided a brief description of two non-evolutionary systems, FOIL and Progol. These two systems were included due to their popularity in the ILP community and because they have proved successful in solving many ILP problems.

When presenting the evolutionary systems, we focused our attention on the following aspects: encoding, fitness function, genetic operators and biases on the hypothesis space. When designing an EA for ILP, particular attention has to be paid both to the encoding of Horn clauses and to the choice of the fitness function. A wrong choice of encoding may prevent the EA from searching the hypothesis space effectively, e.g., because some clauses cannot be encoded. The encoding adopted will also influence the choice of the variation operators. For example, the encoding used by REGAL allows the system to adopt standard crossover and mutation operators, while in ECL and SIA01 ad-hoc operators must be designed. This effort, however, allows the latter two systems to perform more ILP-oriented operations, e.g., turning a particular variable into a particular constant.

The fitness function used is of crucial importance, not just for EAs in the ILP context, but for all EAs in general. It is the fitness function that guides the evolutionary search towards certain areas of the hypothesis space; the variation operators only help to reach the goal defined by the fitness function, and with a poorly chosen fitness function the evolutionary algorithm may need much more time to converge.

In order to test the effectiveness of the evolutionary systems presented in this paper, a brief comparison based on results taken from various publications was proposed. The aim of this comparison was only to give a general idea of the performance of the evolutionary systems; its limitations do not allow us to draw any significant conclusions on the differences in performance of the various EAs. From the experimentation presented, it is also not possible to claim that EC methods for ILP always overcome the limitations of traditional techniques for ILP.

This paper can serve as a useful overview of the state of the art of evolutionary methods for ILP, and can provide some useful hints to anyone who intends to build a system for the ILP task.

Acknowledgments

The author would like to thank Elena Marchiori for her invaluable help in the revision process of this paper. Thanks also go to Sophie Kain for correcting some language-related mistakes. The author would also like to thank the anonymous reviewers for their suggestions, which helped him improve the paper.

References

[1] É. Alphonse and C. Rouveirol. Lazy propositionalisation for relational learning. In W. Horn, editor, 14th European Conference on Artificial Intelligence (ECAI'00), Berlin, Germany, pages 256–260. IOS Press, 2000.
[2] C. Anglano, A. Giordana, G. Lo Bello, and L. Saitta. An experimental evaluation of coevolutive concept learning. In Proc. 15th International Conf. on Machine Learning, pages 19–27. Morgan Kaufmann, San Francisco, CA, 1998.
[3] S. Augier, G. Venturini, and Y. Kodratoff. Learning first order logic rules with a genetic algorithm. In U. M. Fayyad and R. Uthurusamy, editors, The First International Conference on Knowledge Discovery and Data Mining, pages 21–26, Montreal, Canada, 1995. AAAI Press.
[4] R. Axelrod. The evolution of strategies in the iterated prisoner's dilemma. In Genetic Algorithms and Simulated Annealing, Research Notes in Artificial Intelligence, pages 32–41. Morgan Kaufmann, 1987.
[5] T. Bäck, D. B. Fogel, and Z. Michalewicz. Evolutionary Computation 1: Basic Algorithms and Operators. Institute of Physics Publishing, 2000.
[6] J. Barwise and J. Etchemendy. The Language of First-Order Logic. Center for the Study of Language and Information – Lecture Notes, 1992.
[7] C. Blake and C. Merz. UCI repository of machine learning databases, 1998.
[8] A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Occam's razor. Information Processing Letters, 24(6):377–380, Apr. 1987.
[9] K. De Jong. An Analysis of the Behaviour of a Class of Genetic Adaptive Systems. PhD thesis, Dept. of Computer and Communication Sciences, University of Michigan, Ann Arbor, MI, 1975.
[10] A. Debnath, R. L. de Compadre, G. Debnath, A. Schusterman, and C. Hansch. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. Journal of Medicinal Chemistry, 34(2):786–797, 1991.
[11] F. Divina. Hybrid Genetic Relational Search for Inductive Learning. PhD thesis, Department of Computer Science, Vrije Universiteit, Amsterdam, the Netherlands, 2004.
[12] F. Divina and E. Marchiori. Evolutionary concept learning. In GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, pages 343–350, New York, 9–13 July 2002. Morgan Kaufmann Publishers.
[13] F. Divina and E. Marchiori. Knowledge-based evolutionary search for inductive concept learning. In Knowledge Incorporation in Evolutionary Computation, chapter III, pages 237–254. Springer-Verlag, 2004.
[14] F. Divina and E. Marchiori. Handling continuous attributes in an evolutionary inductive learner. IEEE Transactions on Evolutionary Computation, 9(1):31–43, February 2005.
[15] D. Douguet, E. Thoreau, and G. Grassy. A genetic algorithm for the automated generation of small organic molecules: Drug design using an evolutionary algorithm. Journal of Computer-Aided Molecular Design, 14(5):449–466, 2000.
[16] S. Dzeroski and N. Lavrac, editors. Relational Data Mining. Springer-Verlag, 2001.
[17] A. E. Eiben and J. E. Smith. Introduction to Evolutionary Computing. Springer-Verlag, 2003.
[18] A. Freitas. Data Mining and Knowledge Discovery with Evolutionary Algorithms. Springer-Verlag, Berlin, 2002.
[19] A. Giordana and F. Neri. Search-intensive concept induction. Evolutionary Computation Journal, 3(4):375–416, 1996.
[20] R. Giráldez and J. S. Aguilar-Ruiz. Feature influence for evolutionary learning. In Genetic and Evolutionary Computation – GECCO-2005, pages 1139–1145. ACM Press, 2005.
[21] R. Giráldez, J. S. Aguilar-Ruiz, and J. C. Riquelme. Knowledge-based fast evaluation for evolutionary learning. IEEE Transactions on Systems, Man & Cybernetics, Part C – Special Issue on Knowledge Extraction and Incorporation in Evolutionary Computation, 35(2):254–261, 2005.
[22] R. Glover and P. Sharpe. Efficient GA based techniques for classification. Applied Intelligence, 11(3):277–289, 1999.
[23] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, 1989.
[24] D. E. Goldberg and J. Richardson. Genetic algorithms with sharing for multimodal function optimization. In Proc. of 2nd Int'l Conf. on Genetic Algorithms, pages 41–49. Morgan Kaufmann Publishers, 1987.
[25] D. E. Goldberg and J. Richardson. Genetic algorithms with sharing for multimodal function optimization. In J. J. Grefenstette, editor, Genetic Algorithms and their Applications (ICGA'87), pages 41–49. Morgan Kaufmann, Cambridge, MA, 1987.
[26] J. Hekanaho. Background knowledge in GA-based concept learning. In International Conference on Machine Learning, pages 234–242, 1996.
[27] J. Hekanaho. DOGMA: a GA based relational learner. In D. Page, editor, Proceedings of the 8th International Conference on Inductive Logic Programming, LNAI 1446, pages 205–214. Springer Verlag, 1998.
[28] J. Horn, D. E. Goldberg, and K. Deb. Implicit niching in a learning classifier system: Nature's way. Evolutionary Computation, 2(1):37–66, 1994.
[29] M. Krogel, S. Rawles, F. Zelezny, P. Flach, N. Lavrac, and S. Wrobel. Comparative evaluation of approaches to propositionalization. In 13th Int. Conference on Inductive Logic Programming, pages 194–217. Springer-Verlag, 2003.
[30] J.-P. Li, M. E. Balazs, G. T. Parks, and P. J. Clarkson. A species conserving genetic algorithm for multimodal function optimization. Evolutionary Computation, 10(3):207–234, 2002.
[31] J. J. Liu and J. T.-Y. Kwok. An extended genetic rule induction algorithm. In Proc. of the 2000 Congress on Evolutionary Computation, pages 458–463, Piscataway, NJ, 2000. IEEE Service Center.
[32] R. Michalski, I. Mozetic, J. Hong, and N. Lavrač. The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In Proc. Fifth National Conference on Artificial Intelligence, pages 1041–1045. American Association for Artificial Intelligence, Philadelphia, PA, 1986.
[33] R. Michalski, J. Carbonell, and T. Mitchell. A theory and methodology of inductive learning. In Machine Learning: An AI Approach, volume I, pages 83–134. Morgan Kaufmann, Los Altos, CA, 1983.
[34] T. M. Mitchell. Generalization as search. Artificial Intelligence, 18(2):203–226, 1982.
[35] P. Moscato. On evolution, search, optimization, genetic algorithms and martial arts: Towards memetic algorithms. Technical Report C3P 826, Pasadena, CA, 1989.
[36] S. Muggleton. Inverse entailment and Progol. New Generation Computing, Special Issue on Inductive Logic Programming, 13(3-4):245–286, 1995.
[37] S. Muggleton. Learning from positive data. In S. Muggleton, editor, Proceedings of the 6th International Workshop on Inductive Logic Programming, pages 225–244. Stockholm University, Royal Institute of Technology, 1996.
[38] S. Muggleton. Inductive logic programming: issues, results and the challenge of learning language in logic. Artificial Intelligence, 114:283–296, 1999.
[39] S. Muggleton and W. Buntine. Machine invention of first order predicates by inverting resolution. In Proceedings of the Fifth International Machine Learning Conference, pages 339–351. Morgan Kaufmann, 1988.
[40] S. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming, 19-20:669–679, 1994.
[41] C. Nédellec, C. Rouveirol, H. Adé, F. Bergadano, and B. Tausend. Declarative bias in ILP. In L. De Raedt, editor, Advances in Inductive Logic Programming, pages 82–103. IOS Press, 1996.
[42] F. Neri and L. Saitta. Analysis of genetic algorithms evolution under pure selection. In Proceedings of the Sixth International Conference on Genetic Algorithms, pages 32–39. Morgan Kaufmann, San Francisco, CA, 1995.
[43] S.-H. Nienhuys-Cheng and R. de Wolf. Foundations of Inductive Logic Programming. Springer-Verlag, 1997.
[44] J. Quinlan. Learning logical definitions from relations. Machine Learning, 5:239–266, 1990.
[45] J. R. Quinlan. Induction of decision trees. In J. W. Shavlik and T. G. Dietterich, editors, Readings in Machine Learning. Morgan Kaufmann, 1986. Originally published in Machine Learning, 1:81–106, 1986.
[46] P. G. Reiser and P. J. Riddle. Evolution of logic programs: Part-of-speech tagging. In P. J. Angeline, Z. Michalewicz, M. Schoenauer, X. Yao, and A. Zalzala, editors, Proceedings of the Congress on Evolutionary Computation, volume 2, pages 1338–1346, Washington, D.C., USA, 6–9 July 1999. IEEE Press.
[47] J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific, River Edge, NJ, 1989.
[48] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, NJ, 1995.
[49] M. Sebag and C. Rouveirol. Tractable induction and classification in first-order logic via stochastic matching. In Proceedings of the 15th International Joint Conference on Artificial Intelligence, pages 888–893. Morgan Kaufmann, 1997.
[50] R. M. Smullyan. First-Order Logic. Dover Publications, Inc., 1995.
[51] W. M. Spears. Simple subpopulation schemes. In Proceedings of the Third Annual Conference on Evolutionary Programming, pages 296–307, San Diego, CA, 1994. World Scientific.
[52] A. Tamaddoni-Nezhad and S. Muggleton. Using genetic algorithms for learning clauses in first-order logic. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO-2001, pages 639–646, San Francisco, CA, 2001. Morgan Kaufmann Publishers.
[53] A. Teller and D. Andre. Automatically choosing the number of fitness cases: The rational allocation of trials. In J. R. Koza, K. Deb, M. Dorigo, D. B. Fogel, M. Garzon, H. Iba, and R. L. Riolo, editors, Genetic Programming 1997: Proceedings of the Second Annual Conference, pages 321–328, Stanford University, CA, USA, 13–16 July 1997. Morgan Kaufmann.
[54] K. Tor and F. E. Ritter. Using a genetic algorithm to optimize the fit of cognitive models. In Sixth International Conference on Cognitive Modeling, pages 308–313. Mahwah, 2004.
[55] G. Venturini. SIA: A supervised inductive algorithm with genetic search for learning attributes based concepts. In P. B. Brazdil, editor, Machine Learning: ECML-93 – Proc. of the European Conference on Machine Learning, pages 280–296, Berlin, Heidelberg, 1993. Springer.
[56] M. L. Wong and K. S. Leung. Inducing logic programs with genetic algorithms: The genetic logic programming system. IEEE Expert, 10(5):68–76, 1995.
[57] X. Yao. Evolutionary computation: A gentle introduction, chapter 2, pages 27–53. Kluwer Academic Publishers, 2002.
