Improving Accuracy of Incorrect Domain Theories

Lars Asker
Department of Computer and Systems Sciences
Stockholm University and the Royal Institute of Technology
Electrum 230, S-164 40 Kista, SWEDEN
[email protected]

Abstract

An approach to improving the accuracy of incorrect domain theories is presented that learns concept descriptions from positive and negative examples of the concept. The method uses the available domain theory, which may be both overly general and overly specific, to group training examples before attempting concept induction. gentre is a system that has been implemented to test the performance of the method. gentre is not limited to variable-free, function-free, or non-recursive domains, as many other approaches are. In the paper we present results from experiments in three different domains and compare the performance of gentre with that of ID3 and IOU. The learned concept descriptions are consistent with the training examples and have an improved classification accuracy relative to the original domain theory.

1 INTRODUCTION

Many knowledge-intensive techniques require that the application domain is well understood and can be expressed in the form of correct rules of a domain theory. For many real-world applications, however, this is not the case. To overcome this problem, several attempts have been made to combine explanation-based learning [12], [9] with empirical [19], [11] or sub-symbolic approaches. Examples of systems based on these hybrid approaches are IOU [13], [14], EITHER [15], IOE [10], KBANN [22], FOCL [16], ML-SMART [5], and A-EBL [7]. Yet other systems are described in [8], [2], [1] and [20]. Many approaches are limited in that they are only capable of specializing an overly general theory; IOU, IOE and A-EBL all fall into this category. Other systems are only able to generalize an overly specific theory [8], [2], [1]. Some other limiting assumptions are, for example, that each failure can be attributed to a single fault [8], [2], or that no intermediate concepts are missing from the domain theory [1].

Another limitation is the expressiveness of the language used to describe the theory. EITHER and KBANN, for example, are only able to learn a propositional domain theory. Richards and Mooney's FORTE system does not allow recursion or negation in its theories [20]. In this paper we present a method to refine Horn clause theories containing variables, functions and recursive rules. The terms clause and rule will be used interchangeably to denote a Horn clause. The method refines overly general and overly specific domain theories to create concept descriptions that are consistent with the training examples, and give improved classification accuracy relative to the original domain theory. The novel contribution of the method is twofold: firstly, the way in which unexplained examples are grouped before any inductive learning takes place; secondly, the way an incomplete domain theory is refined by applying Plotkin's notion of least general generalization (lgg) [18] to these groups.

2 KNOWLEDGE INTENSIVE THEORY REFINEMENT

The intuition behind the method is that information present in a domain theory can be used to analyse and group examples based on shared similarities before inductive learning takes place. Generalization of rules is done by dropping conditions from overly specific rules, or by the addition of new rules. The specialization of overly general rules, on the other hand, is directed by subregularities in positive examples covered by the same set of generalized rules. This is done by dividing the positive examples into groups such that all examples in a group are covered by exactly the same set of generalized rules (a sketch of this grouping is given at the end of this section). The reason for doing this is the assumption that more similar examples are more likely to share some properties that are presently not reflected in the concept description.

One way to extract and make this information explicit is to form groups of examples from which these similarities can be extracted, and then form a description that covers all examples in each such group. At present this is done by forming a least general generalization of all examples in a group, but it could also be done in other ways, using any inductive learning technique. Most systems designed to handle imperfect domain theories create one or more partial explanations for the training examples in order to direct the search for possible corrections to the domain theory. The technique used in gentre to construct generalized rules by removing literals from the body of a rule (a development of a technique described in [3]) can be thought of as the construction of partial explanations, and is similar to the one used in AxA-EBL [6].
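To make the grouping step concrete, the following is a minimal sketch of one way it could be implemented; it is our illustration, not gentre's actual code. It assumes a predicate covers(Rule, Example) that succeeds when Rule proves Example, and it keys each positive example by the exact set of generalized rules that cover it.

:- use_module(library(lists)).

% group_examples(+Rules, +Examples, -Groups):
% Groups is a list of Signature-Members pairs, where Signature is the
% set of rules (in a fixed enumeration order) that covers every example
% in Members.  covers/2 is an assumed predicate, not defined here.
group_examples(Rules, Examples, Groups) :-
    findall(Sig-E,
            ( member(E, Examples),
              findall(R, (member(R, Rules), covers(R, E)), Sig)
            ),
            Keyed),
    keysort(Keyed, Sorted),   % examples with equal signatures become adjacent
    collect(Sorted, Groups).

% collect(+SortedPairs, -Groups): merge adjacent pairs with identical keys.
collect([], []).
collect([Sig-E|Rest], [Sig-[E|Es]|Groups]) :-
    take_same(Sig, Rest, Es, Tail),
    collect(Tail, Groups).

take_same(Sig, [Sig2-E|Rest], [E|Es], Tail) :-
    Sig == Sig2, !,
    take_same(Sig, Rest, Es, Tail).
take_same(_, Rest, [], Rest).

Because both findall/3 calls enumerate Rules in the same order, two examples covered by the same set of rules receive syntactically identical signatures, so keysort/2 brings each group together.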

3 A DESCRIPTION OF GENTRE

Gentre is a system for revision and refinement of domain theories (gentre is described in more detail in [4]). [1] It takes as input a set of positive and negative examples of a target concept and an original (possibly incorrect) domain theory expressed in Horn clause form, and produces as output a new domain theory with an improved classification accuracy relative to the target concept. gentre refines domain theories that are both overly general (so that they cover negative examples) and overly specific (so that they exclude positive examples). gentre's basic algorithm is described in Figure 1. [2] Due to space limitations, some of the subroutines, such as EBG, FINDCOVER and MAKERULE, will not be described in pseudo code in this paper. As a first step, rules representing operationalized, sufficient, but not necessary, descriptions of a target concept are extracted from an original (incorrect) domain theory and correctly classified examples of the target concept using explanation-based learning (EBL). In doing this, all proofs that the current domain theory can support are created for each positive example. The set of learned rules is then evaluated against all the negative examples. Each rule that covers some negative example is classified as "bad", and each rule that covers only positive examples is classified as "good". Each "good" rule is then generalized by removing combinations of literals from the body of the rule. From the set of generalizations, a minimal subset is selected, consisting of the most general generalizations that cover as many as possible of the positive examples without covering any negative training examples.

[1] gentre was written in SICStus Prolog on a Sun SparcStation SLC.
[2] The coverage function cover used in Figures 1 and 3 returns the set of examples that are covered by (i.e. provable from) a certain rule, or set of rules: cover(rule) = {e : rule |= e}.

Algorithm GENTRE(S+, S-, C, O, DT_OLD):
Input:  a set of positive (S+) and negative (S-) examples of a concept (C)
        an operationality criterion (O)
        a domain theory (DT_OLD) defining concept C
Output: a refined domain theory (DT_NEW) that is consistent with the training examples
begin
    DT_NEW <- ∅ ; R_ALL <- ∅ ; R_B <- ∅ ; R_GEN <- ∅
        (some initializations)
    for each positive example e ∈ S+ do
        R_ALL <- R_ALL ∪ EBG(e, C, DT_OLD, O)
        (find all possible proofs of all positive examples)
    repeat
        call EVALUATE with R_ALL, S- as input, giving R_GOOD and R_BAD as output
        (separate the rules that cover only positive examples from the rest)
        R_B <- R_B ∪ R_BAD
        (add to the bad rules those rules that covered some negative examples)
        for each rule r ∈ R_GOOD do
            R_GEN <- R_GEN ∪ GENERALIZE(r, S-)
        (generalize the rules that cover only positive examples)
        DT_NEW <- DT_NEW ∪ FINDCOVER(R_GEN, S+)
        (find an approximate minimal cover for as many positive examples as possible)
        S+ <- S+ - cover(DT_NEW)
        (remove from S+ the positive examples that are covered by DT_NEW)
        R_ALL <- SPECIALIZE(R_B, S+)
        (specialize the bad rules)
    until (R_ALL = ∅) or (S+ = ∅)
    return DT_NEW
        (return the new domain theory)
end

Figure 1: GENTRE's basic algorithm

Algorithm GENERALIZE(r, S-):
begin
    R_ALL <- the set of rules given by removing all combinations of literals
             (together with their dependent literals) from the body of r
    call EVALUATE with R_ALL, S- as input, giving R_GOOD and R_BAD as output
        (separate the rules that cover only positive examples from the rest)
    return R_GOOD
end

Figure 2: GENTRE's generalization algorithm

Algorithm EVALUATE(R, S-):
begin
    R_GOOD <- ∅
    R_BAD <- ∅
    for each rule r ∈ R do begin
        if (∃e)(e ∈ S-) and (e ∈ cover(r)) then
            R_BAD <- R_BAD + r
        else
            R_GOOD <- R_GOOD + r
    end
    return R_BAD, R_GOOD
end

Figure 3: GENTRE's evaluation algorithm

When the subset has been found, the positive examples covered by it are removed from the training set. After that, the rest of the generalizations, including those covering some negative examples, are used to group the remaining positive examples. This is done in such a way that all examples in a group are covered by exactly the same set of generalizations. From each such group, a new rule for the target concept is created by finding a least general generalization (lgg) [18] of all the examples in the group. The new rule is then evaluated and processed in the same way as the rules learned in the first step described above. The process stops when all the positive training examples have been covered by some rules, or when no more improvement can be achieved on the training set.

3.1 GENERALIZATION IN GENTRE

The generalization algorithm is described in Figure 2. A rule is generalized by removing one or several of the literals in the body of the clause. A clause with n literals in its body will generate 2^n - 1 generalizations (see the sketch at the end of this subsection). In a second step, literals in the body of each generalization whose instantiations depend on the removed literals will also be removed. The generalized rules can also be viewed as partial explanations for the concept that they describe, or as approximations as in [6]. This corresponds to a top-down construction of partial explanations rather than a bottom-up one. The unexplained training examples do not trigger the construction of partial explanations. Instead, partial explanations are generated from the explanations of successfully processed examples, and are then matched against the unexplained training examples to see which are applicable. Each of the generalizations is evaluated on the training set and those that cover some negative examples are removed (gentre's evaluation algorithm is presented in Figure 3). If two generalizations cover the same set of positive examples, or if one generalization covers a subset of the positive examples that another generalization covers, the more specific generalization will be removed. Since the generalization procedure is only applied to rules that do not cover any negative examples, and since more general generalizations are preferred to more specific ones, it will always return a set of generalizations (in the worst case containing the rule itself) that are as general as possible without covering any negative training examples. In a final reduction, generalizations covering positive examples that are covered by a combination of more general rules are removed from the set. A possible way to optimize the generalization process is to evaluate each generalization against the negative training examples before further generalization is done. In this way, as soon as a generalization covers some negative examples, no further generalization is needed. In the implemented version of gentre we simply create all possible generalizations before they are tested. Inputs to the generalization process are either "good" rules created by using EBL and the original domain theory on positive examples, or "good" rules created as a result of the specialization process described in the following section.
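As an illustration of the first step, here is a minimal sketch in Prolog of enumerating the 2^n - 1 body subsets; it is an assumed reading rather than gentre's own code, and it omits the second step of pruning literals whose instantiations depended on the removed ones.

% generalizations(+Body, -GenBody): GenBody is the body of one
% generalization of a clause with body Body, i.e. any proper sublist
% obtained by removing a non-empty combination of literals.
generalizations(Body, GenBody) :-
    subseq(Body, GenBody),
    GenBody \== Body.      % removing zero literals is not a generalization

% subseq(+List, -Sub): Sub keeps an arbitrary subset of List, in order.
subseq([], []).
subseq([L|Ls], [L|Sub]) :- subseq(Ls, Sub).
subseq([_|Ls], Sub)     :- subseq(Ls, Sub).

For example, ?- generalizations([jobless(X), male(X)], G). enumerates [jobless(X)], [male(X)] and [], i.e. the 2^2 - 1 = 3 generalizations of the body of jobless_mascu_reject/1 from Figure 5.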

3.2 SPECIALIZATION IN GENTRE

gentre's specialization algorithm is given in Figure 4. A rule is specialized by finding combinations of groups of generalized rules (partial explanations) and groups of positive examples that they cover. What the generalized rules in such a group have in common is that they cover all of the examples in the group, and that this particular combination of rules does not cover any other positive examples in the training set. Because the examples in a group are all covered by all the partial explanations in the group, there is reason to believe that they might also share some other properties that are specific to this group, and that might be sufficient to distinguish the group from other (i.e. negative) examples.

Algorithm SPECIALIZE(R_BAD, S+):
begin
    R <- ∅ ; R1 <- ∅
    for each positive example e ∈ S+ do begin
        for each rule r ∈ R_BAD do
            if (e ∈ cover(r)) then R1 <- R1 ∪ {r}
        (R1 are all the rules that cover example e)
        SE <- all examples in S+ that are covered by all rules in R1
        (SE is the set of examples that are covered by all rules in R1)
        newrule <- CREATERULE(R1, LGG(SE))
        (newrule is created from the LGG of all examples in SE together with all the rules in R1)
        R <- R ∪ {newrule}
        (the new rule is added to the set of rules)
    end
    return R
    (return the set of new rules)
end

Figure 4: GENTRE's specialization algorithm

It is this assumption that is the basis for grouping the examples before any induction is done. Each such group of examples is described by finding a least general generalization (lgg) [18] of all examples in the group. The literals in the body of the lgg that are not present in the body of the partial explanation are then added as preconditions to specialize the partial explanation. The reason that we do not simply use the lgg as the body of the new rule is that there might be operational literals present in the body of the partial explanation that are not present in the lgg. The lgg will be (equal to or) more general than each of the examples in the group, but (equal to or) more specific than the combination of all of the preconditions of the rules in the group. A new rule, which will be a specialization of all of the rules in the group, is then created from the lgg together with the set of rules. If the new rule, which is guaranteed to cover at least some positive examples, does not cover any negative examples, it will be generalized as described in the previous section.
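To make the lgg step concrete, the following is a minimal sketch of Plotkin's lgg for two atoms or terms; it is our illustration under the usual definition, not gentre's implementation. Matching function symbols are kept, mismatching subterms are replaced by variables, and the same pair of subterms is always mapped to the same variable.

:- use_module(library(lists)).

% lgg(+T1, +T2, -G): G is the least general generalization of T1 and T2.
lgg(T1, T2, G) :- lgg(T1, T2, G, [], _).

% lgg/5 threads a substitution of (T1-T2)-Var pairs so that repeated
% mismatching pairs are generalized by the same variable.
lgg(T1, T2, G, S0, S) :-
    compound(T1), compound(T2),
    T1 =.. [F|As1], T2 =.. [F|As2],
    length(As1, N), length(As2, N), !,
    lgg_args(As1, As2, Gs, S0, S),
    G =.. [F|Gs].
lgg(T, T, T, S, S) :- !.              % identical constants
lgg(T1, T2, V, S, S) :-               % this pair was seen before:
    memberchk((T1-T2)-V, S), !.       % reuse its variable
lgg(T1, T2, V, S, [(T1-T2)-V|S]).     % otherwise introduce a fresh one

lgg_args([], [], [], S, S).
lgg_args([A|As], [B|Bs], [G|Gs], S0, S) :-
    lgg(A, B, G, S0, S1),
    lgg_args(As, Bs, Gs, S1, S).

For example, ?- lgg(a(b,5), a(c,7), G). binds G = a(_X,_Y), which is exactly the lgg discussed in the conclusions, while ?- lgg(p(b,b), p(c,c), G). gives p(_X,_X), reusing one variable for the repeated pair.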

bad_credit(X) :- discredit_bad_region(X).
bad_credit(X) :- jobless_unmatch_fem_reject(X).
bad_credit(X) :- jobless_unmarried_fem_reject(X).
bad_credit(X) :- jobless_mascu_reject(X).
bad_credit(X) :- rejected_aged_unstable_work(X).

jobless_mascu_reject(X) :- jobless(X), male(X).

jobless_unmarried_fem_reject(X) :-
    jobless(X), female(X), unmarried(X).

jobless_unmatch_fem_reject(X) :-
    jobless(X), female(X), married(X), unmatch_fem(X).

unmatch_fem(X) :- female(X), purchase_item(X, bike).
unmatch_fem(X) :-
    female(X), deposit(X, M),
    monthly_payment(X, MP), numb_of_months(X, NM),
    Sum is MP * NM, Sum > M.

rejected_aged_unstable_work(X) :-
    age(X, N), 59 < N,
    numb_of_years_in_company(X, Y), Y < 3.

discredit_bad_region(X) :-
    problematic_region(X),
    numb_of_years_in_company(X, Y), Y < 11.

Figure 5: The Domain Theory of the Japanese Credit Screening Database

bad_credit(A) :- jobless(A), unmarried(A).
bad_credit(A) :- female(A), purchase_item(A, bike).
bad_credit(A) :- jobless(A), male(A).
bad_credit(A) :-
    problematic_region(A), purchase_item(A, stereo).
bad_credit(B) :-
    problematic_region(B),
    numb_of_years_in_company(B, A M.

rejected_aged_unstable_work(X) :-
    age(X, N), 59 < N,
    numb_of_years_in_company(X, Y), Y < 3.

discredit_bad_region(X) :-
    problematic_region(X),
    numb_of_years_in_company(X, Y), Y < 11.

Figure 6: Example of a new domain theory for the Japanese Credit Screening Database

Figure 7: Results from the Japanese Credit Screening Domain (accuracy in % versus number of training examples)

4 EXPERIMENTAL EVALUATION

4.1 EXPERIMENT 1: "CREDIT"

The domain theory in this experiment was generated by talking to individuals at a Japanese company that grants credit. It was donated to the UCI repository of machine learning databases by Chiharu Sano. The examples represent a total of 125 positive and negative instances of people who were and were not granted credit. The original domain theory was not sufficient to classify all positive training examples correctly. Furthermore, it incorrectly classified some of the negative training examples as instances of the target concept. The domain theory has been translated from Lisp to standard Prolog notation (Figure 5). In the experiment we used a test set consisting of 10 positive and 10 negative randomly selected examples, and a training set varying from 5 to 80 examples randomly selected from the total set of examples. Figure 7 shows the results when gentre was used to reformulate this theory. Each point is averaged over 20 runs. The result is compared with the accuracy of the original domain theory, and with the result of running ID3 on the examples. The vertical axis represents % accuracy, and the horizontal axis represents the number of training examples. The original domain theory classified 71.9% of the test examples correctly. ID3 started at a low 35% accuracy with 5 training examples, reached about 45% with a training set of 40 examples, and 50% accuracy after 80 training examples.

Figure 8: Results from the Students Domain (accuracy in % versus number of training examples, for GENTRE, ID3, and the original domain theory DT)

gentre started at 62% accuracy after 5 training examples, reached 75.5% accuracy with 20 training examples, 79% with 40 training examples, and 80% accuracy with a training set of 80 examples. Figure 6 shows one example of the resulting theory. In this run gentre had been given 40 training examples. The new theory, which classified all 40 training examples correctly, had 77.65% accuracy on all the unseen examples.


4.2 EXPERIMENT 2: "STUDENT LOAN"


In a second experiment, we used the "Student Loan Relational Domain" from the UCI machine learning databases. This domain was donated by Mike Pazzani, and has previously been used in [17]. In this case, a larger set of training examples was available: 643 positive instances of the relation no_payment_due(Person), and 357 negative instances corresponding to the relation payment_due(Person), totalling 1000. The predicate no_payment_due/1 is true for those people who are not required to repay a student loan. Auxiliary relations can be used to fully discriminate positive from negative instances of no_payment_due/1. The closed world assumption applies to all auxiliary relations. The domain contains a domain theory which classifies all examples correctly. In order to make the domain suitable for experimenting, we corrupted the domain theory by adding an extra literal to four of the rules, and removing a literal from three rules. The corrupted domain theory is shown in Figure 9. This corruption resulted in an accuracy drop from 100% to 69.8%. Starting with an accuracy of 67% after 10 examples, gentre refined the domain theory to a peak performance of 81.5% using 60 examples.

never_left_school(Student) :-
    longest_absense_from_school(Student, Units).
    % 6 > Units.       removed

enrolled_in_more_than_n_units(Student, N) :-
    enrolled(Student, School, Units),
    % school(School),  removed
    Units > N.

no_payment_due(Student) :-
    enlist(Student, fire_department),   % extra
    continuously_enrolled(Student).
no_payment_due(Student) :-
    eligible_for_deferment(Student).

continuously_enrolled(Student) :-
    % never_left_school(Student),  removed
    enrolled_in_more_than_n_units(Student, 5).

eligible_for_deferment(Student) :- military_deferment(Student).
eligible_for_deferment(Student) :- peace_corps_deferment(Student).
eligible_for_deferment(Student) :- financial_deferment(Student).
eligible_for_deferment(Student) :- student_deferment(Student).
eligible_for_deferment(Student) :- disability_deferment(Student).

military_deferment(Student) :-
    enlist(Student, Org), armed_forces(Org).

peace_corps_deferment(Student) :-
    enlist(Student, Org), peace_corps(Org).

financial_deferment(Student) :-
    male(Student),   % extra
    filed_for_bankrupcy(Student).
financial_deferment(Student) :-
    unemployed(Student).

student_deferment(Student) :-
    male(Student),   % extra
    enrolled_in_more_than_n_units(Student, 11).

disability_deferment(Student) :-
    male(Student),   % extra
    disabled(Student).

Figure 9: The Corrupted Domain Theory of the Student Loan Relational Domain

Figure 10: Results from the Cup Domain 1 (accuracy in % versus number of training examples, for IOU, ID3, GENTRE, and the original domain theory DT)

Figure 11: Results from the Cup Domain 2 (accuracy in % versus number of training examples, for GENTRE and the original domain theory DT)

The result of the experiment, which is averaged over 40 runs, is presented in Figure 8. One reason why gentre could not reach a higher accuracy is that it is not able to learn restrictions on numeric values. One of the literals removed from the rules was a restriction on a numeric value, which, when removed, resulted in a loss of accuracy of 12.9%. ID3 was not able to work without modification in this domain, because ID3 can only accept one value for each attribute, and the predicate enrolled(Student,School,Units) can have several values of School for the same student. ID3's classification accuracy in Figure 8 is therefore based on an experiment where the predicate enrolled was only allowed to take one value at a time.

4.3 EXPERIMENT 3: "CUPS"

In a third experiment, we wanted to see how well gentre could perform in two related learning tasks in the same domain. One task was to learn a specific concept using an overly general domain theory. Another task was to learn a general concept using an overly specific domain theory. We chose to use the artificial cup domain described in [13]. Examples are described using eleven attributes, of which seven are binary, three have three values, and one attribute has four values. The entire instance space consists of 13,824 examples, generated by forming the cartesian product of the ranges of all the features. Out of the 13,824 examples, 252 were classified as drinking vessels, of which 63 were also classified as cups.

As in [13], disjoint training and test sets with a controlled percentage of examples from each group were created from the total set of examples. Training and test sets contained 50% cups, 25% drinking vessels that are not cups, and 25% examples that are neither drinking vessels nor cups. In the first part of the experiment we used the domain theory for drinking vessel with examples that were classified as cups or not cups. The specialized description of cup that the program was supposed to learn is the same as in [13]:

cup(X) :- drinking_vessel(X), volume(X, small), shape(X, hemispherical).
cup(X) :- drinking_vessel(X), volume(X, small), shape(X, cylindrical).
cup(X) :- drinking_vessel(X), volume(X, small), shape(X, conical).

gentre managed to learn a more accurate description than the original domain theory quite quickly, after which the performance did not improve very much; with more training examples, gentre's performance was surpassed by both ID3 and IOU (Figure 10). [3] IOU's and ID3's performance in this experiment is taken from [13]. The reason for gentre's inability to reach a higher classification accuracy in this domain is gentre's preference for a domain theory with as few and as general rules as possible (as long as they are consistent with the training examples). Due to this preference for domain theories with fewer and more general rules, gentre will benefit from domains where more negative training examples are available, and will do more poorly in domains such as this one. In the second part of the experiment we used the same description of cups to try to make gentre learn a description for drinking vessel. In other words, we used an overly specific domain theory to try to learn a more general concept. In this domain gentre typically found concept descriptions consisting of a single rule (or at most two rules) that correctly classified all the training examples. On one occasion gentre learned, for example, the following concept description (consisting of one single rule which classified all 32 training examples and all test examples correctly):

[3] The results in Figures 10 and 11 are averaged over 30 runs.

drinking_vessel(X) :-
    has_bottom(X), flat_bottom(X), lightweight(X).

In this experiment we were not able to compare gentre's performance with that of IOU, since IOU cannot generalize an overly specific theory. gentre's performance, however, was impressive, giving 88% accuracy after encountering just 4 training examples, and an accuracy of 97.5% after 32 training examples. The results from this experiment are presented in Figure 11.

5 CONCLUSIONS

gentre performs better than the original domain theory in all the tested domains, using a relatively small number of training examples. For example, it needed only four examples to reach an accuracy of 87% in both cup domains. gentre retains this advantage for small numbers of training examples in all tested domains. gentre does best in domains where it has to learn a more general concept starting with an overly specific domain theory, such as learning a concept description for drinking vessel from a domain theory for cups. This is quite natural, because the generalization step in gentre tries all possibilities and checks against all examples, while the lgg in combination with the grouping mechanism can still be further developed. It is our intention to perform experiments using gentre's grouping mechanism together with other inductive components, but at present this has not been investigated. The presented method for construction of partial explanations supports the use of partial explanations in a wide range of situations.

It does not require non-recursive or variable-free domain theories, or put any other constraints on the types of domain theories, to be applicable, although the learned domain theory cannot include any new predicates. We have also shown that partial explanations created in this way can successfully serve as a focusing instrument in the search for possible corrections of an incorrect domain theory. One might object that gentre cannot really handle recursion, although the original domain theory might contain recursive concept descriptions. This is because the rules that it learns are operationalized rules for the target concept, and unless some recursively defined concepts are operational, gentre would never produce a domain theory that contains recursive descriptions of any concepts. This would be immaterial if in fact all the concepts to be learned could be expressed non-recursively using a bounded number of rule applications. If, however, the concept is fundamentally recursive (e.g. append), then the learned rules will only refer to lists of certain specific lengths (those seen in the training examples). The same argument goes for the claim that gentre is not restricted to function-free domain theories. One possible remedy for this, which we have currently not investigated further, could be to unoperationalize the learned rules by replacing some of the right-hand-side literals of a learned rule by their left-hand sides, taken from the domain theory, as described in [21]. Unlike FOIL and FOCL, gentre is currently unequipped with the ability to learn restrictions on numeric values; consequently, their performance is not compared. A possible approach to introduce this ability is to use predicates such as greater_than_five(X), as was adopted in earlier versions of FOCL. This, however, would be intractable when dealing with large numbers. Another way would be to extend Plotkin's notion of least general generalization to include restrictions on numeric values, such that, for example, the lgg of a(b,5) and a(c,7) would be a(X,Y) ∧ (Y >= 5) ∧ (Y =< 7) instead of only a(X,Y), as it is now (a sketch of this extension is given below). Preliminary results show that such an extension would improve gentre's performance in some domains. The main difference between gentre and other existing systems designed to deal with both overly general and overly specific domain theories, such as ML-SMART, EITHER, FOCL and FORTE, lies in how the new domain theory is created. The way gentre uses the partial explanations of the domain theory to group together unexplained positive examples, before descriptions of these groups are created, seems to allow for a very high accuracy of the new domain theory after relatively few examples. Another difference lies in the use of least general generalization as a technique to create generalized descriptions of such groups.
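The following sketch shows one way the numeric extension just discussed could sit on top of the lgg sketch from Section 3.2; it is an assumption of ours, not an implemented part of gentre. It relies on the lgg/5 predicate from that sketch, which returns the list of (T1-T2)-Var pairs it introduced, and turns each pair of numbers into interval constraints on its variable.

% numeric_constraints(+Subst, -Constraints): for every pair of numbers
% N1-N2 that the lgg generalized to a variable V, emit Lo =< V, V =< Hi.
numeric_constraints([], []).
numeric_constraints([(N1-N2)-V|S], [Lo =< V, V =< Hi|Cs]) :-
    number(N1), number(N2), !,
    Lo is min(N1, N2), Hi is max(N1, N2),
    numeric_constraints(S, Cs).
numeric_constraints([_|S], Cs) :-
    numeric_constraints(S, Cs).

% lgg_numeric(+T1, +T2, -G, -Constraints): lgg plus interval constraints.
lgg_numeric(T1, T2, G, Cs) :-
    lgg(T1, T2, G, [], S),      % lgg/5 from the sketch in Section 3.2
    numeric_constraints(S, Cs).

For example, ?- lgg_numeric(a(b,5), a(c,7), G, Cs). gives G = a(_X,_Y) with Cs = [5 =< _Y, _Y =< 7], i.e. the constrained lgg a(X,Y) ∧ (Y >= 5) ∧ (Y =< 7) described above.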

Acknowledgements

The work presented in this paper has benefitted greatly from discussions with a number of people. I want to thank Carl Gustaf Jansson, Pat Langley, Steve Minton, Ray Mooney and Ross Quinlan for invaluable suggestions and comments. Furthermore I would like to thank Malini Bhandaru, Henrik Bostrom, William Cohen, Jon Gratch, Peter Idestam-Almquist, Mike Pazzani, Christer Samuelsson, Alberto Segre and the anonymous referees for carefully reading and commenting on earlier drafts of this paper. This work has been partly supported by the Swedish National Board for Technical and Industrial Development (NUTEK).

References

[1] Ali, K.M. (1989). "Augmenting Domain Theory for Explanation-Based Generalization", the Sixth International Workshop on Machine Learning, Cornell University, Ithaca, NY, pp. 34-36.
[2] Asker, L. (1991). "Using Partial Explanations: an Approach to Solving the Incomplete Theory Problem in EBL", the 3rd Scandinavian Conference on Artificial Intelligence, Roskilde, Denmark, pp. 187-192.
[3] Asker, L., B. Gamback and C. Samuelsson (1992). "EBL2: An Approach to Automatic Lexical Acquisition", the 14th International Conference on Computational Linguistics, Nantes, France, pp. 1172-1176.
[4] Asker, L. (1994). "Partial Explanations as a Basis for Learning", PhD Thesis, Department of Computer and Systems Sciences, Stockholm University.
[5] Bergadano, F. and A. Giordana (1988). "A Knowledge Intensive Approach to Concept Induction", the Fifth International Conference on Machine Learning, University of Michigan, Ann Arbor, MI, pp. 305-317.
[6] Cohen, W.W. (1990). "Learning Approximate Control Rules of High Utility", the Seventh International Conference on Machine Learning, Austin, TX, pp. 268-276.
[7] Cohen, W.W. (1992). "Abductive Explanation-Based Learning: A Solution to the Multiple Inconsistent Explanation Problem", Machine Learning, 8:167-219.
[8] Danyluk, A.P. (1989). "Finding New Rules for Incomplete Theories: Explicit Biases for Induction with Contextual Information", the Sixth International Workshop on Machine Learning, Cornell University, Ithaca, NY, pp. 34-36.
[9] DeJong, G. and R. Mooney (1986). "Explanation-Based Learning: An Alternative View", Machine Learning, 1:145-176.
[10] Flann, N. and T. Dietterich (1989). "A Study of Explanation-Based Methods for Inductive Learning", Machine Learning, 4:187-226.
[11] Michalski, R.S. (1983). "A Theory and Methodology of Inductive Learning", in R.S. Michalski, J.G. Carbonell, and T.M. Mitchell (eds), Machine Learning: An Artificial Intelligence Approach, Tioga, Palo Alto, CA.
[12] Mitchell, T.M., R.M. Keller and S.T. Kedar-Cabelli (1986). "Explanation-Based Generalization: A Unifying View", Machine Learning, 1:47-80.
[13] Mooney, R. (1993). "Induction Over the Unexplained: Using Overly-General Domain Theories to Aid Concept Learning", Machine Learning, 10:79-110.
[14] Mooney, R. and D. Ourston (1989). "Induction Over the Unexplained: Integrated Learning of Concepts with Both Explainable and Conventional Aspects", the 6th International Workshop on Machine Learning, pp. 5-7.
[15] Ourston, D. and R. Mooney (1990). "Changing the Rules: A Comprehensive Approach to Theory Refinement", the 8th National Conference on Artificial Intelligence, pp. 815-820.
[16] Pazzani, M. and D. Kibler (1992). "The Utility of Knowledge in Inductive Learning", Machine Learning, 9:57-94.
[17] Pazzani, M. and C. Brunk (1991). "Detecting and Correcting Errors in Rule-Based Expert Systems: an Integration of Empirical and Explanation-Based Learning", Knowledge Acquisition, 3:157-173.
[18] Plotkin, G.D. (1971). Automatic Methods of Inductive Inference, PhD Thesis, Edinburgh University.
[19] Quinlan, R. (1986). "Induction of Decision Trees", Machine Learning, 1:81-106.
[20] Richards, B. and R. Mooney (1991). "First Order Theory Revision", the 8th International Conference on Machine Learning, pp. 447-451.
[21] Tangkitvanich, S. and M. Shimura (1990). "Refining a Relational Theory with Multiple Faults in the Concept and Subconcepts", in Proceedings of the Ninth International Conference on Machine Learning, Aberdeen, Scotland, pp. 436-444.
[22] Towell, G.G., J.W. Shavlik and M.O. Noordewier (1990). "Refinement of Approximately Correct Domain Theories by Knowledge-Based Neural Networks", in Proceedings of the Eighth National Conference on Artificial Intelligence, Boston, MA, pp. 861-866.
