IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 22, NO. 11, NOVEMBER 2010, p. 1535

A Configurable Rete-OO Engine for Reasoning with Different Types of Imperfect Information

Davide Sottara, Paola Mello, and Mark Proctor

Abstract: The RETE algorithm is a very efficient option for the development of a rule-based system, but it supports only boolean, first order logic. Many real-world contexts, instead, require some degree of vagueness or uncertainty to be handled in a robust and efficient manner, imposing a trade-off between the number of rules and the cases that can be handled with sufficient accuracy. Thus, in the first part of the paper, an extension of RETE networks is proposed, capable of handling a more general inferential process, which actually includes several types of schemes for reasoning with imperfect information. In particular, the architecture depends on a number of configuration parameters which could be set by the user, individually or as a whole for the entire rule base. The second part, then, shows how an appropriate combination of parameters can be used to emulate some of the most common, specialized engines: 3-valued logic, classical certainty factors, fuzzy, many-valued logic and Bayesian networks.

Index Terms: Inference engines, nonmonotonic reasoning and belief revision, rule-based processing, uncertainty, “fuzzy” and probabilistic reasoning.

D. Sottara and P. Mello are with the DEIS, Università di Bologna, Viale Risorgimento, 2, 40136 Bologna, Italy. E-mail: {davide.sottara2, paola.mello}@unibo.it.
M. Proctor is with JBoss, a division of Red Hat, 42 Sutton Lan South, W4 3JT - Chiswick, London, United Kingdom. E-mail: [email protected].
Manuscript received 15 Mar. 2009; revised 21 Oct. 2009; accepted 1 Jan. 2010; published online 5 Aug. 2010. Recommended for acceptance by N. Bassiliades, G. Governatori, A. Paschke, and J. Dix. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TKDESI-2009-03-0166. Digital Object Identifier no. 10.1109/TKDE.2010.125. 1041-4347/10/$26.00 © 2010 IEEE. Published by the IEEE Computer Society.

1 INTRODUCTION AND RELATED WORKS

A Rule-Based System (RBS) is a Knowledge-Based System encoding information in the form of rules. A rule is a syntactic construct stating that certain consequences C are connected to the acknowledgment of some premises P. Rule-based systems were originally used in the development of Expert Systems (ESs) [1], using languages with an expressiveness comparable to that of propositional logic and sometimes even first order logic. The formalism often failed to capture the expertise required to handle complex domains, such as the medical or financial ones, efficiently, so RBSs have been surpassed by other technologies such as Bayesian Networks [2], or have evolved into forms such as Fuzzy Systems [3]. Instead, RBSs have found wider application in contexts where absence of ambiguity is valued, such as the management of business processes. However, many studies have shown that hybrid systems integrating both Soft and Hard computing can achieve better performance [4], [5]. This paper therefore proposes an architecture suitable for strongly integrated hybrid systems, supporting rules evaluated in a customizable way when the available information is not precise and certain, i.e., it is imperfect [6], [7]. Precise and certain rules usually cannot model all concepts with generality and conciseness at the same time; on the contrary, some degree of imprecision allows real-world criteria to be formalized with a limited number of rules.

Imperfection is a general term used to describe rules softened in different ways: from graduality, where a known property may be present at different levels, to uncertainty, where some information is missing either because of a lack of knowledge or due to an intrinsic randomness in the context being analyzed. Consider, for example, the simple abstract rule “if Danger then Alarm.” It has a literal interpretation (sound the alarm in case of danger), but it can also be given a probabilistic meaning: given the probability of Danger, there will be a certain probability of hearing an Alarm (and upon hearing an Alarm, one could estimate the probability of Danger using Bayes’ rule). The same rule could also have a fuzzy, gradual meaning: the greater the Danger, the louder the Alarm should be, so that nuisances can be distinguished from serious accidents. Moreover, the information concerning the Danger may come from various sensor data, with different degrees of reliability, which should be taken into account. For example, the relative frequency and cost of false positives and false negatives, as well as the possibility of exceptions, may influence the decision of generating an alarm in the presence of a danger (and vice versa).

Since rules define relations between entities, while imperfection influences the specific meaning of these relations, the two aspects can be considered separately. The goal of this paper is to propose a configurable architecture in which a given rule base, a coherent set of rules, can be extended and evaluated using different types of imperfect information, possibly with an individual rule granularity. The need for such systems has recently been advocated by Zadeh [8]. From a theoretical point of view, Damásio et al. [9] have shown that it is possible to use a unified language for the representation of different types of imperfect rules; however, as far as we know, very few similar works have been carried out for rule evaluation (e.g., [10]).

The task of deciding which rules are satisfied by the available knowledge (facts), ordering them by priority and evaluating them is performed by rule engines [11]. Many engines exist, but the rules they process usually depend on the languages supported and their expressive power (e.g., propositional logic, first order logic, probabilistic logic, ...), the available rules of inference (modus ponens, modus tollens, ...), the propagation strategy (forward, backward, or hybrid chaining), the conflict resolution criteria (FIFO, LIFO, priority, ...) and the evaluation results (pure logical consequences or side effects). The most common choice in developing large-scale engines is possibly the RETE algorithm [12]: it can be used to implement production rule systems, a specific type of reactive rule-based system using modus ponens as inference rule and forward chaining propagation. The success of RETE depends on its scalability, even if successive studies have shown that it is possible to further improve its performance [13]. RETE looks for matches between facts and rules by generating all possible combinations of the available data, albeit efficiently: the algorithms LEAPS [14] and TREAT [15] have tried to deal with this limitation by adopting different strategies such as lazy evaluation and conflict set support.

Nowadays, most real-world RBSs are RETE-based, but often implement optimizations (many of which are closed-source) and add higher level functionalities. Among them, the mainstream open source rule engine DROOLS [16] has recently grown in functionality and popularity: it uses an object-oriented version of RETE, supports first order logic, implements a priority conflict resolution strategy, allows side effects as consequences and is bundled with external modules for event and workflow processing. Like many other engines, however, it does not support imperfect logic: while much work has been done to optimize the processing of rules, the semantics of the evaluations using RETE has not been investigated extensively.
In a recent and more detailed survey [17], it is shown that some mainstream engines support fuzzy logic, but support for other types of rules is not so common, even if important results in the fuzzification of RETE networks have led to systems such as FuzzyCLIPS [18] and FuzzyShell [19]. The goal of this work, hence, is to introduce generic imperfect evaluations in a RETE network. To do so, we started from a realistic case, DROOLS, extending its language in accordance with [9] and in anticipation of future compatibility with fuzzy RuleML, using metadata to define the actual semantics of the imperfection associated with the rules. This language is compiled into an adaptive, extended RETE: after briefly recalling the basic RETE-OO architecture in Section 2, our proposal is discussed in Section 3. Finally, Section 4 shows how some noteworthy behaviors can be achieved by choosing the network configuration parameters appropriately.

2 RETE NETWORKS

A rule base is a collection of logically consistent rules stored in a long-term memory. At runtime, the inference engine matches them against the current set of facts, stored in a short-term working memory (hence, facts are also called Working Memory Elements, WMEs). While engines can be interpreters, the RETE algorithm uses a compilative approach to speed up computation in case of large rule bases. The rules generate a network, a directed acyclic graph whose nodes correspond to different steps in their evaluation, which shares the intermediate results whenever possible to avoid unnecessary, repeated computations.

RETE was developed specifically for production rules: such rules are typically written in some “if P then C”-like form, even if in this paper we will also (improperly) use the shorter logic notation “$P \Rightarrow C$”. A production rule engine matches the available facts against the rules’ premises: whenever the preconditions P are satisfied, a rule becomes active. A conflict resolution strategy is applied to order the set of active rules, which are then fired in sequence, executing the actions C. The consequences in the then part of a rule may alter the Working Memory by adding (insert), removing (retract) or modifying (update) facts, possibly triggering the activation of cascaded rules.

In the object-oriented version, facts are objects, instances of classes with named and typed fields. Given a rule base R, instead, a rule $r_i$ is a sequence of m patterns $C^i_{j:1..m}$ for objects to be matched against. Each pattern is composed of a variable sequence of constraints $\sigma^{i,j}_{k:0..n(j)}$: the first constraint of any pattern, $\sigma_0$, is always an extensional [8] class constraint, stating that a candidate object must be an instance of the specified class. The other constraints, instead, are intensional and are applied to the values of the fields. A constraint has the general form:

    Constraint ::= Field Op [Expr]
    Op         ::= Evaluator Attrs?
    Attrs      ::= '@' '[' Attribute+ ']'
    Attribute  ::= Key '=' Value
    Key        ::= 'kind' | 'degree' | 'params' | 'id'
    Expr       ::= Literal | Field | Variable | ...

A constraint uses a unary or a binary evaluator (e.g., $=$, $\neq$, $\leq$, ...) to test the value of an object’s field. The actual implementation can be controlled using attributes, as in [9]. For binary evaluations, the second value is typically a constant or the value of a field, possibly from another object. Each constraint can be identified and referenced using a signature, also denoted by $\sigma^{i,j}_k$, which is a function of its syntactical components (footnote 1):

    $\sigma^{i,j}_k$ = #Field ^ #Op ^ #Expr

Footnote 1. # and ^ may be hashCode and xor, or toString and concat, respectively.

Each constraint is mapped onto a different node in the network. In particular, two types of nodes exist:

• α-Nodes, evaluating constraints on individual facts, similarly to a SELECT operation on a relational database table storing all instances of a certain class.
• β-Nodes, evaluating constraints on sets of facts, similarly to a JOIN operation between tables.

The α-network is composed of α-nodes and α-memories. α-nodes are chained sequentially, one for each constraint, in the order given by the patterns. If two patterns $C_{j'}$ and $C_{j''}$ share the same initial sequence of n constraints $\sigma_{0..n-1}$, the corresponding part of the network is shared and connected to the nodes for $\sigma^{j'}_n$ and $\sigma^{j''}_n$. The α-network is a tree whose leaves are α-memories, storing the references to the objects matching a pattern.

Similarly, the β-network is composed of β-nodes and β-memories. A β-node is created for each consecutive pair of patterns $C_j$ and $C_{j+1}$. A β-node evaluates whether an object of class $C^0_{j+1}$ is compatible with an existing tuple (an ordered list of objects) of types $C^0_{1..j}$, according to the constraints defined in the pattern $C_{j+1}$. While objects are stored in α-memories, the tuples are saved in β-memories, one for each β-node. The β-node has two inputs and one or more outputs, so the network is a poly-tree: again, nodes are shared when rules share sequences of patterns. The right source is connected to the output of an α-memory, and the left one to the output of a β-memory. The node joins objects $o_{j+1}$ from its right input with tuples $t_j$ from its left input and, if the augmented tuple $t_{j+1} = t_j \cup o_{j+1}$ satisfies the constraint, outputs $t_{j+1}$ into the β-memory shared with the β-node for pattern $C_{j+2}$. Notice that the first β-node has no constraint and simply converts objects into tuples with a single member, while tuples coming out of the last β-node meet the preconditions of a rule: they are not stored in a β-memory, but passed to a terminal node which creates an activation record for the rule and schedules its activation according to the conflict resolution strategy adopted by the engine. As an example, the rule in Listing 1, inspired by J. Pearl’s alarm problem [2], yields the network in Fig. 1.

Listing 1. “Example 1”

    when
      q : Quake( magnitude > 5 )
      h : House( address == q.location )
    then
      sendAlarmTo(h.owner)
    end

Fig. 1. RETE for Example 1.

3 EXTENDED RETE-OO

The RETE-OO algorithm introduced in Section 2 is designed and optimized for boolean logic, where the evaluation of rules can be simplified with respect to the most general case: for example, operators other than conjunction become irrelevant and a fact can be considered true as soon as a rule entails it in its consequences. To increase its flexibility, the network must be generalized, even if at the expense of some optimizations. This section will outline a more general inference schema and propose an extended, parametric RETE network capable of supporting it. In particular, issues related to both topology and propagation will be discussed. Then, it will be shown how the proposed architecture can handle imperfection naturally. Finally, the various degrees of freedom will be grouped into a configuration schema, filled with different combinations in Section 4.

3.1 Abstract Reasoning Schema

The main inference rule of a forward chaining algorithm is Modus Ponens ($\Rightarrow$) and generalizations thereof. Consider a rule written in the form $P \to C$, where P is a formula composed of patterns defining the LHS, $\to$ models the connection between the premise and the conclusion (usually, but not necessarily, a logical implication), and C denotes the actions executed by the engine as a consequence of the activation of the rule, possibly including the generation of new facts. Using the symbol $\varepsilon$ to denote the truth degree of the various parts, which can be either true (1) or false (0), the Modus Ponens rule can be written:

$$\frac{\langle P(x), \varepsilon(x) \rangle, \quad \forall X : \langle P(X) \to C(Y(X)), \varepsilon(\to) \rangle}{\langle C(y(x)), \varepsilon_{\Rightarrow}(x, \varepsilon(x), \varepsilon(\to)) \rangle} \tag{1}$$

For the rule to be activated, a tuple x of objects must satisfy the constraints in the premise patterns $P(X)$, i.e., $\varepsilon(x)$ must be true. If the implication is also true, which is normally assumed to be the case, the output arguments y can be computed by applying some function $Y(X)$ to the premise arguments x. If appropriately mapped onto the RETE nodes, the constraints in the premise and the implication make it possible to evaluate the conclusion and its truth in both the boolean and the imperfect case.
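As an illustration, inference rule (1) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function names, the dictionary-based facts and the use of `min` (or a product) to play the role of $\varepsilon_{\Rightarrow}$ are hypothetical choices made for the example, and they are not prescribed by the architecture.

```python
from typing import Callable

Degree = float  # 1.0 means true and 0.0 means false in the boolean case


def modus_ponens(eps_premise: Degree, eps_implication: Degree,
                 combine: Callable[[Degree, Degree], Degree] = min) -> Degree:
    """Degree of the conclusion; `combine` is an assumed, pluggable operator."""
    return combine(eps_premise, eps_implication)


def apply_rule(x, premise, y_of, eps_implication=1.0, combine=min):
    """Return (y(x), degree of C(y(x))) for the argument tuple x."""
    eps_x = premise(x)  # epsilon(x): degree to which x satisfies P
    return y_of(x), modus_ponens(eps_x, eps_implication, combine)


# Hypothetical facts modeled on Example 1.
quake = {"magnitude": 6.1, "location": "Bologna"}
house = {"address": "Bologna", "owner": "alice"}


def premise(x) -> Degree:
    """Boolean premise of Example 1: strong quake at the house address."""
    q, h = x
    return 1.0 if q["magnitude"] > 5 and h["address"] == q["location"] else 0.0


owner, degree = apply_rule((quake, house), premise, lambda x: x[1]["owner"])
print(owner, degree)
```

With boolean degrees the sketch reduces to the classical case: the conclusion holds only when both the premise and the implication are true. Passing a different `combine` function (for instance a product) is one way to move toward an imperfect interpretation.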

3.1.1 Constraint Evaluation

The constraints $\sigma$ are usually equality or ordinal comparisons for different data types. DROOLS also offers more advanced evaluators (e.g., contains and soundsLike) but, most importantly, it allows custom evaluators to be defined and plugged in. A custom evaluator implements a test-score function [8] $\sigma : X \mapsto L = \{0, 1\}$ specified with an appropriate adapter interface, so it can be any module capable of evaluating a property of its arguments, from simple functions to Neural Networks or Bayesian Networks. Evaluators are unary or binary, but may be configured by passing additional parameters. Like standard evaluators, they must return a truth degree $\varepsilon(x)$ as a result of evaluating object x.

3.1.2 Source Merging

The relation between a generic set of objects X and a property constraint $\sigma$ is a predicate $\pi$, i.e., the constraint $\sigma$ applied to X. The evaluation of its truthfulness, applied to the argument instance x, will be denoted (footnote 2) by $\varepsilon^{\pi}(x)$. Direct computation at the node is not the only possible way of evaluating it, since different sources may contribute:

1. $\varepsilon_0$: Prior information, available before the computation and provided as a fact.
2. $\varepsilon_\sigma$: Direct evaluation, resulting from the evaluator embedded in the node.
3. $\varepsilon_i$: Logical entailment by one or more rules $r_{i \in I}$, with I indexing the set of rules providing information on the constraint as one of their consequences.

When multiple sources are available, a merge operator $\cap : L^{2+|I|} \mapsto L$ is needed, so:

$$\varepsilon = \bigcap_{i \in \{0, \sigma\} \cup I} \varepsilon_i \tag{2}$$

No property other than closure is strictly required of $\cap$, even if commutativity and associativity can be useful. In a consistent boolean rule base, the definition of $\cap$ is normally redundant, and thus omitted, because all sources supporting a predicate just return true.

Footnote 2. When clear from the context, the arguments will be omitted.
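A merge of this kind can be sketched as a fold over the available contributions. The snippet below is an assumption-laden illustration: the function names are hypothetical, and the two merge operators shown (`min` and a probabilistic sum) are merely examples of closed binary operations on degrees, not operators mandated by the text.

```python
from functools import reduce
from typing import Callable, Dict

Degree = float


def merge_sources(prior: Degree, direct: Degree, entailed: Dict[str, Degree],
                  op: Callable[[Degree, Degree], Degree]) -> Degree:
    """Fold the prior, the direct evaluation and the rule contributions with `op`."""
    degrees = [prior, direct, *entailed.values()]
    return reduce(op, degrees)


def probabilistic_sum(a: Degree, b: Degree) -> Degree:
    """One possible (assumed) merge operator for graded sources."""
    return a + b - a * b


# Consistent boolean rule base: every source just returns true.
boolean = merge_sources(1.0, 1.0, {"r1": 1.0, "r2": 1.0}, min)

# Graded sources: prior and direct evaluation reinforce each other.
graded = merge_sources(0.5, 0.5, {}, probabilistic_sum)
print(boolean, graded)
```

In the boolean case the result does not depend on the chosen operator, which is why the merge can be omitted there; with graded degrees the choice of operator becomes a genuine configuration parameter.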

3.1.3 Aggregation

Individual constraints are aggregated into more complex formulas using logical connectives, such as conjunction (“and”), disjunction (“or”), or negation (“not”). Connectives are implemented using operators, which can be considered a special type of predicate, $Op(\pi_1(X_1), \ldots, \pi_n(X_n))$, imposing a logical constraint on a set of arguments which are, in turn, predicates themselves. The evaluation is a function of both the evaluations and the arguments of the predicates being aggregated:

$$\varepsilon^{Op}_{\sigma} = Op(X_1, \ldots, X_n; \varepsilon^{1}, \ldots, \varepsilon^{n}) \tag{3}$$

An operator is truth-functional if the evaluation depends only on the truth degrees: $\varepsilon^{Op}_{\sigma} = Op(\varepsilon^{1}, \ldots, \varepsilon^{n})$. Truth-functionality is a desirable property as it increases the computational efficiency, especially in the evaluation of nested operators forming complex expressions. A different type of aggregation may be performed using logic quantifiers, such as $\forall X$ and $\exists X$, which target all objects for which a predicate $\pi(X)$ has been evaluated. Like operators, quantifiers may be truth-functional.
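The difference between the general form (3) and the truth-functional special case can be sketched as follows. The operator signature, the `min`/`max`/complement connectives and the reliability-weighted conjunction are all hypothetical choices introduced for illustration only.

```python
from typing import Callable, Sequence

Degree = float
# General signature of equation (3): an operator sees arguments and degrees.
Operator = Callable[[Sequence[object], Sequence[Degree]], Degree]


def truth_functional(f: Callable[[Sequence[Degree]], Degree]) -> Operator:
    """Lift a function of the degrees only into the general signature."""
    def op(args: Sequence[object], degrees: Sequence[Degree]) -> Degree:
        return f(degrees)  # the arguments are deliberately ignored
    return op


# Assumed min/max/complement connectives, one common many-valued choice.
AND = truth_functional(min)
OR = truth_functional(max)
NOT = truth_functional(lambda d: 1.0 - d[0])


def weighted_and(args: Sequence[dict], degrees: Sequence[Degree]) -> Degree:
    """Not truth-functional: each degree is scaled by a field of its argument."""
    return min(a["reliability"] * d for a, d in zip(args, degrees))


# Nested truth-functional operators only need the degrees of their operands.
nested = AND([], [0.75, OR([], [0.25, 0.5])])
scaled = weighted_and([{"reliability": 0.5}, {"reliability": 1.0}], [1.0, 0.75])
print(nested, scaled)
```

Because the truth-functional operators never inspect their arguments, nested expressions can be evaluated bottom-up from degrees alone, which is the efficiency gain mentioned above.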

3.1.4 Rules and Formulas

Although a detailed description of the rule language is beyond the scope of this work, a simplified abstract grammar is provided to outline a few noteworthy properties:

    Premise      ::= Formula
    Formula      ::= Pattern | Operator | QuantFormula
    Operator     ::= Connective Attrs? '(' Operands ')'
    QuantFormula ::= Quantifier Attrs? : Formula
    Operands     ::= Formula | Formula, Operands

The premise of a rule is a logical formula, where connectives and quantifiers may be used to combine the individual patterns. This grammar allows an arbitrary level of nesting, which is usually not present when the conjunction is the only connective and the only quantifier is the universal one. Hence, the structure of a formula is not linear, but can be parsed into an Abstract Syntax Tree (footnote 3) with patterns for leaves.

    Pattern  ::= ClassConstr ( PatArgs ) Attrs?
    PatArgs  ::= Constr | Constr, PatArgs
    Constr   ::= AlphaConstr | BetaConstr | OpConstr
    OpConstr ::= Connective Attrs? ( PatArgs )

Likewise, the intrapattern α- and β-constraints can be composed arbitrarily using logical connectives, leading to syntactical subtrees where patterns are roots and constraints are leaves: hence, an entire formula can be modeled using a tree. At all levels, optional attributes can be used to configure the type of operators. Using operators, Example 1 can be expanded into a more articulate form, which can be parsed into the AST shown in Fig. 2. Type checks are anded with their pattern constraints.

Footnote 3. The actual grammar is not regular, but LR(k) context-free.

Fig. 2. AST for Example 2.

Listing 2. “Example 2”

    when
      and (
        h : House( a : address )
        or (
          q : Quake( magnitude > 5 && location near h.address )
          b : Burglar( victims contain h )
        )
      )
    then
      sendAlarmTo(h.owner)
    end
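The interpattern structure of Example 2 can be sketched as a small tree with patterns for leaves. This is a simplified, hypothetical model: intrapattern constraints are collapsed into a single degree per pattern, the class and function names are invented for the example, and the `min`/`max` connectives are an assumption rather than the semantics fixed by the rule language.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Formula:
    label: str  # connective name, or class name for a pattern leaf
    children: List["Formula"] = field(default_factory=list)

    def is_pattern(self) -> bool:
        return not self.children


def patterns(node: Formula) -> List[str]:
    """Pattern leaves in left-to-right order."""
    if node.is_pattern():
        return [node.label]
    found: List[str] = []
    for child in node.children:
        found.extend(patterns(child))
    return found


CONNECTIVES = {"and": min, "or": max}  # assumed truth-functional connectives


def evaluate(node: Formula, degrees: Dict[str, float]) -> float:
    """Aggregate the pattern degrees bottom-up along the tree."""
    if node.is_pattern():
        return degrees[node.label]
    return CONNECTIVES[node.label](evaluate(c, degrees) for c in node.children)


example2 = Formula("and", [
    Formula("House"),
    Formula("or", [Formula("Quake"), Formula("Burglar")]),
])

print(patterns(example2))
print(evaluate(example2, {"House": 1.0, "Quake": 0.25, "Burglar": 0.75}))
```

The nesting of `or` inside `and` is exactly what a flat, conjunction-only premise cannot express without splitting the rule.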

3.1.5 Projection

According to inference rule (1), once the premise has been computed for an argument tuple x, it is joined with the universal implication $\to$, which is normally assumed to be valid (i.e., true) for all X matching P. This information is implicitly provided as a fact when the rule is written (footnote 4).

    Connective ::= And | Imply | ...
    Rule       ::= Premise Imply Conclusion
    Conclusion ::= Formula | ...

Footnote 4. In DROOLS, the rule preamble models $\to$.

When premise and implication match, a rule becomes active and can eventually enact its conclusions. The conclusions can be of two types:

• Logical entailments: A rule may generate new objects, or contribute to the truth of a predicate $\pi$.
• Side effects: Using adequate interfaces, the RHS of a rule may affect any other object, internal or external. Along with custom evaluators, this option is essential in developing hybrid systems.

In either case, the truth degrees and the arguments x are available to the conclusion and can be used to compute the output data $y(x)$ and its associated truth degree. Notice that the entailment of formulas, common in logic programming, is not normally available in a RETE-OO-based system, and thus requires a dedicated extension (discussed in Section 3.2.4). This makes it possible to estimate not only the truth of a fact or a constraint, but also of combinations thereof, by addressing operators directly. Given the aggregate truth degree of an operator, it is not usually possible to infer the truth of the individual operands: the only notable exception is the formula $\neg C$, since the operand C can be obtained by negation. However, an operator itself can be considered a complex atom in a formula: this case becomes even more relevant when applied to an implication operator. The truth degree of a rule is usually provided de facto, stating $\varepsilon^{\to}_0 = true$, but, according to (1), $\varepsilon^{\to}$ is the degree of a universally quantified implication operator. So, it is feasible to compute the contribution $\varepsilon^{\to}_\sigma$ from a set of training examples, or even to use other rules $r_i$ to set $\varepsilon^{\to}_i$, as in the example $A \to (P \to C)$. Using this feature, a rule base can become dynamic as rules are activated or deactivated according to the evaluation of other rules: for example, it can be exploited to model exceptions [20].

3.2 Extended RETE Networks

A standard RETE network does not support the reasoning process proposed in Section 3.1. Nested operators and quantifiers are not allowed, as all constraints are supposed to be connected by “and”: in case an “or” is used between patterns, the rule is split into two rules. It is possible, however, to introduce operators explicitly using additional nodes. For network construction purposes, quantifiers behave like unary operators with a variable number of operands; their efficient evaluation, instead, requires a specific implementation which goes beyond the scope of this paper. Even if some of the concepts have already been introduced in a previous work [21], this proposal addresses some of the problems left unresolved by the previous one and effectively supersedes it.

3.2.1 Operator Nodes

It has already been shown that the grammar defining patterns and formulas allows rules to be mapped onto (syntactic) trees, whose nodes model either operators/quantifiers or constraints. As in the original RETE, and unless otherwise specified, the patterns are and-connected and the class constraint is and-connected to the other constraints in the same pattern. In particular, for each rule, there is one such and node for each pattern: the set of these nodes defines a cut of the tree separating the intrapattern aggregations from the interpattern ones, so they will be called “pattern nodes.” The nodes n can be labeled with an index k according to the order in which they are visited during a postorder traversal of the tree. In this type of visit, all the children of a node are recursively visited, from left to right, before the node itself is visited. This ensures that if $n(k')$ depends on $n(k'')$, either because the latter is a descendant of the former or because $n(k'')$ is a pattern node and $n(k')$ holds a join constraint with its pattern, then $k' > k''$. An example of node numbering is given in Fig. 2.
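The postorder labeling can be sketched on a simplified tree. The tuple-based tree encoding, the function names, the 1-based indices and the reduced version of the Example 2 tree (interpattern nodes only, with unique node names) are assumptions made for the example; the actual numbering in Fig. 2 also covers the intrapattern constraint nodes.

```python
from typing import Dict, List, Tuple

# A node is (name, children); names are assumed unique in this sketch.
Node = Tuple[str, List]


def number_postorder(tree: Node, start: int = 1) -> Dict[str, int]:
    """Map each node name to the index k of a left-to-right postorder visit."""
    index: Dict[str, int] = {}

    def visit(node: Node) -> None:
        name, children = node
        for child in children:  # children first, from left to right
            visit(child)
        index[name] = start + len(index)  # then the node itself

    visit(tree)
    return index


def children_first(tree: Node, index: Dict[str, int]) -> bool:
    """Check that every descendant has a smaller index than its ancestor."""
    name, children = tree
    return all(index[c[0]] < index[name] and children_first(c, index)
               for c in children)


tree = ("and", [("House", []), ("or", [("Quake", []), ("Burglar", [])])])
index = number_postorder(tree)
print(index)
```

The check in `children_first` mirrors the descendant part of the ordering property; the second part, concerning join constraints, also holds here because the House pattern is visited before the patterns that join with it.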
3.2.2 Network Construction

Given the ordering, it is possible to deploy the augmented RETE. Each rule can be analyzed individually: given the sequence $C[i] = k$ such that $n(k)$ is the ith pattern node, and defining $\bar{k} = \max\{C\}$ as the index of the last pattern node, the nodes with $k \leq \bar{k}$ will become part of the α-network (unless they are join constraint nodes), while the others will be included in the β-network. Let also $F[i] = k$ such that $n(k)$ is the first child of $n(C[i])$ (if $n(C[i])$ has no children, take $F[i] = C[i]$). The nodes in the range $F[i] \ldots C[i]$ are connected sequentially, starting from $n(F[i])$ (a class-constraint node). The last, $n(C[i])$ (a pattern node), is connected to an α-memory, $\alpha_i$. However, if any of the nodes holds a join constraint, it is skipped. The memory $\alpha_i$ is then connected to the right input of a join node $\bowtie_i$. The output of $\bowtie_i$ is connected to all free nodes with index less than $C[i]$, to verify whether the joined candidate tuples are actually valid or not. Then, all free nodes with index greater than $C[i]$, but less than $F[i+1]$, are connected, since they are typically operator nodes whose operands are all valid and so can be evaluated. The last node is then connected to a β-memory, $\beta_i$, where the tuple computed so far can be stored. The left input of $\bowtie_i$, instead, is connected to the output of the memory $\beta_{i-1}$, or to a dummy input in case of the first node. Notice that the last join node’s right memory is connected to an implication node and its output is a sequence of a join constraint (optionally) followed by a β-memory and a Modus Ponens operator node acting as terminal node. Algorithm 1 formalizes the procedure. Fig. 3 shows the result when applied to the rule in Example 2.

Fig. 3. RETE for Example 2.

Algorithm 1. RETE Network Construction

    Require: Nodes {Node map}
    Require: C {Ids of Pattern Nodes}
    Require: F {Ids of Class Nodes}
    Require: join, alfa, beta {Generic Nodes}
    N ← length(C)
    curNode ← RETE.entryPoint
    for i = 1 to N do
      patternNode ← C[i]
      classNode ← F[i]
      for all j between F[i] and C[i] do
        if !Nodes[j].isJoinConstraint then
          attach(curNode, Nodes[j]) {curNode is shifted}
        end if
      end for
      attach(curNode, alfa(i))
      join(i).setLeft(beta(i − 1))
      attach(curNode, join(i))
      for all j < F[i + 1] do
        if !Nodes[j].isAttached then
          attach(curNode, Nodes[j])
        end if
      end for
      attach(curNode, beta(i))
    end for

3.2.3 γ-Memories

When join constraints are used in patterns, they cannot be evaluated until after a candidate combined tuple has been created: so, the constraints may not be evaluated in the order they are written in the rules. Since and is associative and commutative, the problem does not actually arise when it is the only operator used in logical formulas, as in the standard RETE configurations. The use of arbitrary operators, together with the multiple evaluation sources, requires a further extension.

A predicate $\pi$ defines a constraint $\sigma$ applied to a domain X whose elements are object tuples x (an object is equivalent to a single-object tuple, so the definition can be applied to all types of constraints): the evaluation of the constraint, $\varepsilon^{\pi}(x)$, is stored in a dedicated structure, called Eval and denoted by $\eta(\pi, x)$. The Evals, in turn, are stored in γ-memories, attached to constraint nodes $\sigma$ and indexed using tuples x as key. An Eval stores, manages, and combines the different evaluations of a constraint $\sigma$ for a tuple of objects x. In order to build an Eval, it is necessary to know the set of rules I including the entailment of $\pi$ among their consequences. Its fields are:

• an array of $|I| + 2$ truth degrees $\boldsymbol{\varepsilon} = [\varepsilon_0, \varepsilon_\sigma, \varepsilon_{j:1..|I|}] \in (L \cup \{\emptyset\})^{|I|+2}$,
• a merge operator $\cap : (L \cup \{\emptyset\})^{|I|+2} \mapsto L$,
• an optional operator evaluator $\star$,
• an array of boolean flags $k_{j:0..|I|+1}$,
• a strategy $S_\emptyset$,
• a strategy $S_k$,
• a reference to the tuple x, and
• the signature (id) of the associated node.

The array $\boldsymbol{\varepsilon}[0..|I|+1]$ holds the different partial contributions, or the special symbol $\emptyset$ if the corresponding piece of information has not been obtained, for example, because a rule has not fired. In particular, $\boldsymbol{\varepsilon}[0]$ is reserved for prior information and $\boldsymbol{\varepsilon}[1]$ for direct evaluation. The elements $\boldsymbol{\varepsilon}[i]$ are combined using the operator $\cap$, which requires two additional pieces of information. The first is a strategy $S_\emptyset$ to handle the missing contributions: the strategy is configurable and depends on the definition of $\cap$, but typical examples are:

• Ignore: Missing values are not considered.
• Closed World Assumption: Missing values are set to false and then merged.
• Open World Assumption: Missing values are set to unknown (if unknown $\in L$) and then merged.

The flags, instead, are used to have $\cap$ take the flagged values in greater account when merging them, according to another pluggable strategy $S_k$. A trivial strategy would ignore all nonflagged contributions, unless none were flagged.

A subclass of Composite [22] Evals is defined for operator constraints: they additionally store the references to a number of Evals equal to the operator arity. These values, conditioned by $S_\emptyset$, are aggregated using the operator evaluator $\star$ to yield $\varepsilon^{Op}_\sigma$.

Evals play both the Observer and the Observable roles of the Observer design pattern [22]. They are notified when one of the slots $\boldsymbol{\varepsilon}[i]$ changes its value, and notify when the output of $\cap$ changes. The notified information includes the new value of $\varepsilon$ and the ratio $|\eta| = |\boldsymbol{\varepsilon}| / (|I| + 2)$, i.e., the ratio of available contributions over the total number of possible ones.
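A reduced version of the Eval structure can be sketched as follows. This is a hypothetical illustration, not the structure used by the engine: `None` stands in for the missing-contribution symbol, the class and function names are invented, `max` is an arbitrary default merge operator, and the flags, the $S_k$ strategy and the composite (operator) variant are left out for brevity.

```python
from functools import reduce
from typing import Callable, List, Optional

Slot = Optional[float]  # None plays the role of a missing contribution


def ignore(slots: List[Slot]) -> List[float]:
    """Missing values are not considered."""
    return [s for s in slots if s is not None]


def closed_world(slots: List[Slot]) -> List[float]:
    """Missing values are set to false (0.0) and then merged."""
    return [0.0 if s is None else s for s in slots]


class Eval:
    def __init__(self, n_rules: int,
                 merge: Callable[[float, float], float] = max,
                 missing: Callable[[List[Slot]], List[float]] = ignore):
        # Slot 0: prior, slot 1: direct evaluation, slots 2..: injecting rules.
        self.slots: List[Slot] = [None] * (n_rules + 2)
        self.merge = merge
        self.missing = missing
        self.observers: List[Callable[[Optional[float], float], None]] = []

    def degree(self) -> Optional[float]:
        values = self.missing(self.slots)
        return reduce(self.merge, values) if values else None

    def ratio(self) -> float:
        """Available contributions over the total number of possible ones."""
        return sum(s is not None for s in self.slots) / len(self.slots)

    def set(self, i: int, value: Slot) -> None:
        before = self.degree()
        self.slots[i] = value
        after = self.degree()
        if after != before:  # observers are notified only on a change
            for notify in self.observers:
                notify(after, self.ratio())


log = []
e = Eval(n_rules=1)
e.observers.append(lambda degree, ratio: log.append((degree, ratio)))
e.set(1, 0.25)  # direct evaluation
e.set(2, 0.75)  # injection by the only rule entailing the constraint
print(log)
```

Swapping `ignore` for `closed_world` changes how an unfired rule influences the merged degree, which is the kind of behavior the missing-value strategy is meant to make configurable.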

3.2.4 Propagation The Evals are created at  and -nodes and connected to form the evaluation tree of a rule. Before discussing the propagation of information within the network, we remark that while some production systems propagate predicates

VOL. 22,

NO. 11,

NOVEMBER 2010

(e.g., the Prolog-based one) and RETE-OO propagates objects, in our architecture both types of structures are present. In particular, the following actions can be used in the logical consequences of a rule: insert new objects, retract existing objects, update existing objects, inject truth degrees "i in predicates, setting or updating the value, and . reject setting a predicate’s value to . When an inserted object or a tuple is propagated through the network, it passes through the  and -networks: the evaluators in each constraint node provide a first truth degree, but the contribution of priors and injecting rules may affect the final result, possibly even after the first passage of the object. In fact, the activation order of rules can’t be predicted, especially if they depend on facts coming from external sources. In some cases, rules could be the only source of information for a constraint, so there must exist some form of synchronization between inserts and injections [23]. Tuples can be stopped temporarily at constraint nodes: the latter are extended with a configurable strategy Sf , which sets the conditions according to which one of the actions applies: . . . .

- Pass: The tuple is forwarded with the result of the constraint evaluation.
- Hold: The tuple is held within the node, until the evaluation of the constraint returns a different value.
- Drop: The tuple is discarded.

When a constraint can't be evaluated (e.g., because the field is set to null) and S_f allows it, the α node holds the object until another rule injects information on that constraint for the same object. As objects traverse the network, the Evals are incrementally added in a stack-like structure: α and β nodes push their Eval on the top, while operator nodes create a tree structure by popping a number of Evals equal to their arity before pushing their aggregate Eval. The Eval on top of the stack is assumed to hold the current degree, since it is either the last individual evaluation or the root of the evaluation tree. The underlying push model, together with the strategy S, allows the truth degree of a rule to be computed incrementally and can, at the same time, be exploited by the synchronization mechanism. The constraint nodes, in fact, observe the main Evals and reevaluate the policy S_f when the degree changes.

Notice that an injection may also happen after the tuple has been Passed through a constraint node, possibly even after it has caused a rule to be activated, i.e., while the tuple is waiting to be fired in a terminal node. The injected value is still delivered to the corresponding Eval and, if it causes the local degree to change, is propagated through the Eval tree up to terminal nodes which, like constraint nodes, are Observers configured using S_f. The behavior is outlined in Procedures 2, 3, and 4.
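The stack-like construction of the evaluation tree can be illustrated with a short sketch. The class names `Eval` and `EvalStack`, their methods, and the use of `min` and `max` as stand-ins for the conjunction and disjunction evaluators are assumptions made for this example only; the actual operators are configurable.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Eval:
    label: str
    degree: float
    children: List["Eval"] = field(default_factory=list)


class EvalStack:
    """Constraint nodes push leaf Evals; operator nodes pop `arity` Evals
    and push one aggregate Eval, so the top of the stack is the tree root."""

    def __init__(self) -> None:
        self._stack: List[Eval] = []

    def push_constraint(self, label: str, degree: float) -> None:
        self._stack.append(Eval(label, degree))

    def push_operator(
        self, label: str, arity: int, op: Callable[[List[float]], float]
    ) -> None:
        if arity < 1 or arity > len(self._stack):
            raise ValueError("not enough Evals on the stack for this operator")
        operands = self._stack[-arity:]
        del self._stack[-arity:]
        aggregate = op([e.degree for e in operands])
        self._stack.append(Eval(label, aggregate, operands))

    def top(self) -> Eval:
        return self._stack[-1]


# Hypothetical degrees for the premise "(mag strong and loc near) or burglar".
stack = EvalStack()
stack.push_constraint("mag strong", 0.7)
stack.push_constraint("loc near", 0.5)
stack.push_operator("and", 2, min)
stack.push_constraint("burglar", 0.2)
stack.push_operator("or", 2, max)
print(stack.top().label, stack.top().degree)
```

Because each operator Eval keeps references to its operands, a later change in a leaf can be pushed upward without rebuilding the tree, which is the property the synchronization mechanism relies on.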

Procedure 2. Node.onInsert(Tuple t)
Require: filter {S_f strategy}
    eval ← mem.get(t)
    if eval = null then
        eval ← createEval(t, this)
        eval.set(ε, this.eval(t))
    end if
    t.evalTree.add(eval)
    [ε, ratio] ← t.evalTree.eval() {highest ranking degree}
    if filter.decide(ε, ratio) = 'PASS' then
        this.remove(t)
        propagate(t)
    else if filter.decide(ε, ratio) = 'HOLD' then
        this.store(t)
        eval.attach(this)
    else if filter.decide(ε, ratio) = 'DROP' then
        eval.destroy()
        t.destroy()
    end if

Procedure 3. Gamma.onInject(Node n, Tuple t, Rule r, Degree ε)
    eval ← n.mem.get(t)
    if eval = null then
        eval ← createEval(t, n)
    end if
    eval.set(r, ε)
    eval.notify() {notifies n on cascade}

Procedure 4. Node.onNotify(Tuple t, Degree ε, Degree ratio)
    if this.holds(t) then
        if filter.decide(ε, ratio) = . . . then
            . . . {see Procedure 2}
        end if
    end if

Evals have other advantages beyond synchronization:

1. The truth maintenance of objects and predicates is kept separate, so there is no need for cross-checks as in [21].
2. The full evaluation tree and the overall degree are accessible in the consequences of a rule.
3. Truth maintenance can be supported correctly in presence of retractions and/or rejections. Since ∩ is generally neither associative nor invertible, the aggregate result alone may not be sufficient.
4. May be used for argumentations, as the different contributions are clearly identified.
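The pass/hold/drop logic of Procedures 2 and 4 can be sketched in a few lines. This is a simplified illustration rather than the engine's implementation: the names `Action`, `Node`, and `full_sync_policy` are hypothetical, tuples are reduced to string identifiers, and the policy shown is the full synchronization one (propagate only when every possible contribution is available, hold otherwise).

```python
from enum import Enum
from typing import Callable, List, Optional, Set


class Action(Enum):
    PASS = "PASS"
    HOLD = "HOLD"
    DROP = "DROP"


def full_sync_policy(degree: float, ratio: float) -> Action:
    # Full synchronization: pass once all contributions are in, hold otherwise.
    return Action.PASS if ratio >= 1.0 else Action.HOLD


class Node:
    def __init__(self, policy: Callable[[float, float], Action]) -> None:
        self.policy = policy
        self.held: Set[str] = set()
        self.forwarded: List[str] = []

    def on_insert(self, tuple_id: str, degree: float, ratio: float) -> Action:
        action = self.policy(degree, ratio)
        if action is Action.PASS:
            self.held.discard(tuple_id)
            self.forwarded.append(tuple_id)  # stands in for propagate(t)
        elif action is Action.HOLD:
            self.held.add(tuple_id)
        else:  # DROP
            self.held.discard(tuple_id)
        return action

    def on_notify(self, tuple_id: str, degree: float, ratio: float) -> Optional[Action]:
        # Only tuples currently held in this node are reconsidered.
        if tuple_id in self.held:
            return self.on_insert(tuple_id, degree, ratio)
        return None


node = Node(full_sync_policy)
first = node.on_insert("t1", 0.8, 0.5)   # half of the contributions available
second = node.on_notify("t1", 0.8, 1.0)  # a late injection completes them
print(first, second, node.forwarded)
```

Swapping `full_sync_policy` for another callable changes the node's behavior without touching the propagation code, which mirrors the pluggable role of S_f.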

3.3 Imperfect Information
So far, all constraints have been supposed to be boolean, i.e., either true or false. Many real-world contexts, however, are difficult to model using only strictly boolean rules. The consistency and the robustness of the rules may be improved, while usually lowering their number, by allowing the rules to deal with some degree of imperfection. The term imperfection actually may describe different concepts, namely uncertainty, imprecision, and inconsistency [6], [17].

Uncertainty arises whenever an actor lacks complete information about the outcome of an event, for example, because it will take place under unobservable circumstances, or in the future. This type of imperfection is usually measured using probability, which, in turn, may be subjective (epistemic) or objective (statistical). The latter models the ratio of favorable cases (or “worlds”) over all possible ones and can be estimated using a frequentist


approach by repeated trials; the former is often defined as the amount one would bet in a game in which victory is tied to the outcome of an event. In both cases, uncertainty is due to a lack of information that can be acquired after the event has actually taken place.

Probability is different from vagueness, which appears when information is known but expressed with some degree of imprecision: it may be approximate (e.g., the value of x is more or less . . . ), or aggregate (e.g., all x whose value, rounded up, is . . . ). Vagueness is usually represented using fuzzy logic [3] which, despite much theoretical debate [24], has found many practical applications [25].

A third concept is confidence: it measures the strength of a statement, either as an absolute value or in comparison to others. It distinguishes solid facts from those lacking enough support and can be used to estimate the reliability of a logical conclusion, or to rule out conflicts and inconsistencies.

Much work exists on the use of all these types of imperfection in logic and a brief survey can be found in [26]. Confidence was the first type of imperfection implemented in a rule-based expert system, MYCIN [1]; after that, probabilistic logic has been encoded using Bayesian networks [2], while vagueness has been implemented in fuzzy logic systems [3]. Even if both have a theoretical resemblance to rule-based systems, the existing implementations are actually highly specialized and efficient engines which exploit some reasonable assumptions: for example, the logic behind a Bayesian network is propositional, while fuzzy systems usually have many rules in parallel, but rarely chained in series. So, one engine can hardly be used to emulate the other, while determining which type of uncertainty is more appropriate to model a domain is a very complex and problem-specific task, and a single type may not be enough to capture its complexity.
As an example, [27] shows that the individual use of different uncertain approaches in a medical context has led to benefits as well as drawbacks. However, the architecture, so far proposed for boolean logic, can accommodate different types of imperfect reasoning, possibly mixing them and even allowing interaction with external, more standard uncertainty handling systems, simply by generalizing the concepts of truth degree and aggregation operators. While the rest of this section discusses these topics in general, Section 4 will define the configuration sets to emulate the behavior of different uncertain engines.

3.3.1 Degrees
A “degree” ε models the degree at which a predicate can be considered true or false. Degrees may include truth degrees (fuzzy states of truth), belief degrees (the probability that a property is true in a boolean sense), confidence degrees, or even more complex structures. The set of degrees, denoted by L, is a generalization of the boolean case, where L = {0, 1}. The formal requirements vary, but L is typically a partially ordered set, with inf L = 0 and sup L = 1 for false and true, respectively. Optionally, there may be a third extreme value, ?, to model complete ignorance. Usually, L = [0, 1]: a real value is adequate to model any one type among probability, fuzziness, and confidence, provided that the degree itself can be assessed with


precision since ? is not supported. However, higher order degrees have been proposed and used in many applications: intervals (L = [0, 1] × [0, 1], see [28]), fuzzy numbers (L = 2^[0,1], see [3]), or even imprecise belief structures (L = (2^[0,1])^m, see [29]) on sets of truth degrees. An implementation has to return the best approximation in terms of a simple, real-valued degree (asReal()) and of a crisp boolean (asBoolean()), as declared in a general interface Degree. Other casts can be defined to combine different implementations.
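A possible shape for the Degree interface is sketched below in Python. Only the two casts, asReal() and asBoolean(), come from the text (rendered here as `as_real` and `as_boolean`); the concrete classes, the midpoint approximation for intervals, and the 0.5 threshold for the boolean cast are assumptions chosen for illustration.

```python
from abc import ABC, abstractmethod


class Degree(ABC):
    """General interface: every degree must offer a real and a boolean cast."""

    @abstractmethod
    def as_real(self) -> float:
        ...

    def as_boolean(self) -> bool:
        # Assumed convention: a degree counts as true from 0.5 upward.
        return self.as_real() >= 0.5


class SimpleDegree(Degree):
    """Real-valued degree in [0, 1]."""

    def __init__(self, value: float) -> None:
        if not 0.0 <= value <= 1.0:
            raise ValueError("degree must lie in [0, 1]")
        self.value = value

    def as_real(self) -> float:
        return self.value


class IntervalDegree(Degree):
    """Interval degree [low, high] within [0, 1]."""

    def __init__(self, low: float, high: float) -> None:
        if not 0.0 <= low <= high <= 1.0:
            raise ValueError("expected 0 <= low <= high <= 1")
        self.low, self.high = low, high

    def as_real(self) -> float:
        # Assumed approximation: the midpoint of the interval.
        return (self.low + self.high) / 2.0


wide = IntervalDegree(0.25, 0.75)
weak = SimpleDegree(0.2)
print(wide.as_real(), wide.as_boolean(), weak.as_boolean())
```

Keeping the casts in the base interface lets the filtering and merging code work on any degree implementation, at the price of discarding the extra information carried by higher order degrees.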

3.3.2 Generalized Operators, Negation, and Quantifiers
Using generalized truth degrees, the role of connectives and quantifiers is nontrivial. In boolean networks, and is the only operator required to control the join of pattern-compliant objects into tuples, while the quantifiers exists, forall, and not exists condition the propagation of a tuple on the state of the working memory, depending on whether at least one, all, or no objects matching a certain formula are present. Assuming imperfection, the presence (resp. absence) of an object can't be associated to a true (resp. false) statement. Instead, connectives and quantifiers allow combining the additional information carried by the degrees in different ways. The former typically include conjunction ∧, disjunction ∨, and exclusive or, but also implication →, equivalence, and negation ¬. The latter are the usual ∃, ∀, and ∄. Notice that implication and equivalence are essential in defining rules (see (1)) and that there is a difference between existential (∄) and logical (¬) negations.

The only mandatory properties of operators implementing connectives are coherence and closure. Being a generalization of boolean logic, where operators are defined mutually, the operators used in a rule should belong to a common family. Moreover, the operators should model functions L × ⋯ × L ↦ L, returning a degree compatible with the ones of the operands.

Using imperfect degrees and operators, Example 2 can be further extended into Example 3. The evaluators return Degrees and their definitions can be customized using attributes in square brackets: for example, “strong[5]” evaluates the concept of strength for the magnitude of a quake using a reference set point. Notice that a prior implication degree ε_{→0} is used in the rule itself. The implementation of the degrees, instead, is chosen externally, in the rule base configuration.

Listing 3. “Example 3”
    rule "Alarm" degree = ". . ."
    when
        h : House( ) and @[kind=". . ."] (
            q : Quake( mag strong @[args="5"] && loc near h.address )
            or
            b : Burglar( victims contain h )
        )
    then
        sendAlarmTo(h.owner)
    end

3.3.3 Filtering
In the original RETE network, true tuples are propagated and false ones are discarded: this basic strategy can still be guaranteed using the asBoolean() cast, but would exploit only a fraction of the available information. Applying a pluggable strategy S_f to the root Eval allows more complex behaviors to be obtained, taking into account factors such as the actual degree, the confidence measure (if present), and the number of contributing sources. A few examples of policies could be:

- Full synchronization: propagate when the ratio of available contributions equals 1, hold otherwise.
- Closed World Assumption: propagate on 5 1, discard on 0, hold on ?
- Open World Assumption: propagate on 1 or 0, discard on ?

3.4 Complexity
To estimate the complexity of the algorithm, we consider:

- The number of rules R,
- The maximum length of a sequence of α-nodes A,
- The maximum length of a sequence of β-nodes B,
- The maximum number of patterns in a rule C,
- The number of objects in the working memory W.

The RETE algorithm has a worst-case complexity W^C that grows exponentially with the number of patterns in a rule, but the actual cost is tractable and several optimization techniques improve its performance [13]. In this paper, optimizations are not considered, but we analyze the cost of the additions: the custom evaluators in the constraint nodes, the new operator nodes, and the degree updates due to injections. The complexity of an embedded evaluator can't be controlled, so it will be assumed to be constant. The presence of operators lengthens the α and β chains by a constant-bound factor. In fact, if a is the minimum arity of the operators and L = max{A, B}, a tree with up to $\sum_{j=1}^{\log_a L} a^j = O(L)$ nodes can be built from L nodes, yielding a sequence of length proportional to a L. Hence, the cost of propagation is comparable to the standard case, even if C has to be increased by one due to the implication node.

\[ \sum_{j=1}^{C+1} \left\{ (W\,\omega(A))^{j-1}\,(W\,\omega(A)) + \omega(B) \right\}. \tag{4} \]
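The O(L) bound on the size of the operator tree used in this section can be checked with a geometric sum. The derivation below is a sketch that assumes, for simplicity, that L is an exact power of the minimum arity a and that a is at least 2:

```latex
\[
\sum_{j=1}^{\log_a L} a^{j}
  = \frac{a\left(a^{\log_a L} - 1\right)}{a - 1}
  = \frac{a}{a - 1}\,(L - 1)
  \le 2\,(L - 1)
  = O(L), \qquad a \ge 2 .
\]
```

The factor a/(a − 1) is largest for binary operators, where it equals 2, so adding operator nodes at most doubles the number of nodes along a chain.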

A relevant cost increase, both spatial and temporal, is due to the injections. The merge at each constraint node may have to process up to O(R) degrees, affecting the propagation cost ω(A) (resp. ω(B)) that becomes O(A R). In case of late injections, an Eval tree could be updated up to O(R) times, each time with a cost O(R log(x)), making the cost quadratic in the number of rules. Likewise, the storage of rule-entailed degrees requires a spatial cost O(R L W). Thus, the critical point is the maximum number of injecting rules for any constraint: even if all rules could theoretically inject all others, in most practical cases, the effective number of injecting rules R′ will be much smaller than R.

4 RETE CONFIGURATIONS

This section shows how to implement different reasoning schemas, combining the parameters in Table 1. The different configurations, described in detail in Sections 4.1 to 4.5, are summarized in Tables 2, 3, 4, 5, 6, and 8.

5. ε  , |ε  |  1.


TABLE 1 Rete Options

4.1 Boolean Logic

4.1.1 Two-Valued Boolean Logic
RETE was designed for boolean logic: properties can only be true or false, and the latter cause tuples to be discarded. A fact not present in the WM is supposed to be false. The boolean conjunction ∧ is the only operator used: alternatives (∨), when present, are split into different rules. The rules are implicitly true implications and fire in presence of true premises. Even if more than one rule can support a given consequence, no inconsistencies arise since only true properties are asserted. The parameters in Table 2 give such behavior.

TABLE 2 Configuration for Boolean Logic

4.1.2 Three-Valued Logic
Sometimes it is not possible to decide whether a property is true or false, but a third value, unknown, can be used to model this condition of ignorance. This choice is usually accompanied by the adoption of the Open World Assumption, for which unstated facts are unknown. Thus, false tuples and properties can be propagated or entailed: the negation operator ¬ becomes meaningful, even if at the risk of inconsistencies. However, to avoid propagating meaningless information, tuples for which class or join constraints are false are still dropped. When constraints evaluate to unknown, the tuples are held until all possible contributing rules have fired, and dropped if eventually no information has been gained. Notice that the third value is not used to model partial truth. The degrees themselves can be conveniently modeled using two boolean indicators, (N, Π) for necessity and plausibility [30], having (1, 1), (0, 1), (0, 0) encode true, unknown, and false, respectively.

TABLE 3 Configuration for 3-Valued Boolean
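The encoding of the three values as pairs of boolean indicators lends itself to a compact sketch. The pair encoding is taken from the text; the connective definitions below (componentwise min and max, and negation obtained by swapping and complementing the indicators) are the usual Kleene-style choices and are an assumption of this example, not necessarily the operators configured in Table 3.

```python
from typing import Tuple

Degree3 = Tuple[int, int]  # (necessity, plausibility)

TRUE: Degree3 = (1, 1)
UNKNOWN: Degree3 = (0, 1)
FALSE: Degree3 = (0, 0)


def neg(d: Degree3) -> Degree3:
    # not-p is necessary when p is not plausible, plausible when p is not necessary
    necessity, plausibility = d
    return (1 - plausibility, 1 - necessity)


def conj(a: Degree3, b: Degree3) -> Degree3:
    return (min(a[0], b[0]), min(a[1], b[1]))


def disj(a: Degree3, b: Degree3) -> Degree3:
    return (max(a[0], b[0]), max(a[1], b[1]))


print(neg(UNKNOWN), conj(TRUE, UNKNOWN), conj(FALSE, UNKNOWN), disj(TRUE, UNKNOWN))
```

Under these definitions unknown is preserved by negation, while a single false operand is enough to make a conjunction false, which is consistent with dropping tuples whose class or join constraints are false.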

4.2 Reasoning with MYCIN Certainty Factors
Certainty factors were introduced in MYCIN [1], one of the first expert systems ever developed, to deal with the statistical uncertainty typical of the medical context in which it was developed. They replaced proper joint and conditional probability distributions, difficult to elicit with absolute numerical precision from human experts: instead, they quantified the “belief” (MB) or “disbelief” (MD) in the truth (or not) of a proposition. Pairs of such values, each normalized in [0, 1], give a certainty factor CF = MB − MD ∈ [−1, 1]. The patterns are connected implicitly by ∧ and CFs annotate individual constraints (given a priori, by evaluation or multiple entailment) and rules alike: Modus Ponens applies the product between the CF of the premise and the rule's. Partial results, which can only decrease during the evaluation of a single rule, are propagated only if they exceed a threshold, and are merged using an operator which takes into account both the sign and the relative strength of the contributions.

Example 3 (v1). The class constraints have CF = 1: assume that the three constraints have CF 0.7, 0.5, and −0.8, while the rule has CF 0.6. ∨ was not used in MYCIN, but its natural definition is max, so the CF of the premise is max{min{0.7, 0.5}, −0.8} = 0.5. The conclusion, thus, has CF 0.5 × 0.6 = 0.3. If another rule had entailed the same conclusion with CF = 0.4, the result would have been 0.7. Had another conclusion given −0.9, instead, the final CF would have been −6/7, which would stop the propagation.

TABLE 4 Configuration for Certainty Factors

TABLE 5 Configuration for Fuzzy Logic
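The arithmetic of Example 3 (v1) can be replayed with a small sketch. The function name `combine_cf` is hypothetical, and its three branches follow the textbook MYCIN combination rule, which is an assumption here: the merge operator actually configured in Table 4 may differ, in particular for contributions of the same sign. The mixed-sign branch is the one that yields the −6/7 quoted in the example.

```python
def combine_cf(x: float, y: float) -> float:
    """Textbook MYCIN-style merge of two certainty factors in [-1, 1] (assumed)."""
    if x >= 0 and y >= 0:
        return x + y - x * y
    if x < 0 and y < 0:
        return x + y + x * y
    # Opposite signs; undefined when one value is +1 and the other is -1.
    return (x + y) / (1 - min(abs(x), abs(y)))


# Premise: conjunction as min, disjunction as max, as in the example.
premise_cf = max(min(0.7, 0.5), -0.8)

# Modus Ponens: product of the premise CF and the rule CF.
rule_cf = 0.6
conclusion_cf = premise_cf * rule_cf

# Merge with a contrary contribution of -0.9.
merged_cf = combine_cf(conclusion_cf, -0.9)

print(premise_cf, conclusion_cf, merged_cf)
```

The merged value is strongly negative, so with any positive propagation threshold the tuple would not be forwarded.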

4.3 Fuzzy Inference
Fuzzy logic reasons with vague concepts, such as “tall,” “fast,” and “low.” Despite some philosophical controversy, it is widely used in many applications such as automated control, image processing, and pattern recognition [26]. In fuzzy systems, fuzzy sets are used in place of numeric values and domains are partitioned using a fixed, usually small, collection of reference sets, corresponding to “linguistic” concepts. For example, the age of a man (domain [0..100]) can be partitioned in {young, mature, old}. Each set can be considered a possibility distribution over the domain, stating how plausible it is for each value to be the real one, once it is known that it belongs to a given set. A fuzzy rule is typically used to approximate a
