). Here t is a rif:iri constant and p is a term. The constant t indicates the location of another document to be imported and p is called the profile of import. Section III of this document defines the semantics for the directive Import() only. The twoargument directive, Import(
), is intended for importing non-RIF-BLD documents, such as rules from other RIF dialects, RDF data, or OWL ontologies. The profile, p, indicates what kind of entity is being imported and under what semantics (for instance, the various RDF entailment regimes have different profiles). The semantics of Import(
) (for various p) are expected to be given by other specifications on a case-by-case basis. For instance, [10] defines the semantics for the profiles that are recommended for importing RDF and OWL. A document formula contains at most one Base directive. If present, this directive comes first. It may
be followed by any number of Prefix directives, then by any number of Import directives. The next example illustrates a function that is defined using a recursive equality in the rule head. It shows that, as a Horn logic with equality, RIF-BLD subsumes the expressive power of first-order functional programming, where only oriented equations are used. While typical of functional-logic programming, this facility is currently out of scope for most logic rule languages. This, along with the previously discussed complexity and implementation issues with unrestricted equality, explains why the RIF Working Group has been cautious about including equality in rule heads in RIF-BLD. Example 2.1 (The factorial function): The rule set in this example has two formulas. One is an atomic equation and the other a rule with an equality in the conclusion. At first glance, the rule seems simple: it uses two built-in functions (designated with the External construct), func:numeric-multiply and func:numeric-subtract, and one built-in predicate, pred:numeric-greater-than. The rule does not even use recursive predicates and seems to be rather innocent. However, this first impression is deceptive, since the factorial function has a recursive occurrence within the equation in the rule head. Document ( Prefix(pred ) Prefix(func ) Prefix(xs ) Group ( _factorial("0"^^xs:integer) = "1"^^xs:integer Forall ?N ( _factorial(?N) = External(func:numeric-multiply( ?N _factorial( External(func:numeric-subtract( ?N "1"^^xs:integer))))) :External(pred:numeric-greater-than( ?N "0"^^xs:integer)) ) ... other rules here can use _factorial ... ) )
Semantically, this RIF-BLD document represents a world in which the following equalities are true: {_factorial("i"ˆˆxs:integer) = "j"ˆˆxs:integer | i ≥ 0, j = i!}.
The ostentatious underscore in front of the factorial function symbol is a shorthand for the constant "factorial"ˆˆrif:local. The use of the rif:local symbol space here indicates that the name of that function is local, i.e., it is not accessible from other documents. It can, however, be accessed in definitions of other functions or predicates within the same document. The semantics of local constants is given in Section III. D. RIF-BLD Annotations in the Presentation Syntax RIF-BLD allows every term and formula (including terms and formulas that occur inside other terms and formulas) to be optionally preceded by one annotation of the form (*
8
id ϕ *), where id is a rif:iri constant and ϕ is a frame formula or a conjunction of frame formulas. Both items inside the annotation are optional. The id part represents the identifier of the term or formula to which the annotation is attached and ϕ is the metadata part of the annotation. RIF-BLD does not impose any restrictions on ϕ apart from what is stated above. This means that ϕ may include variables, function symbols, constants from the symbol space rif:local, and so on. The purpose of annotations is to endow RIF with a general metadata facility, but semantically annotations are just comments; they are intended to be specified by document authors. The restrictions on annotations are intended to make their syntax simple and yet sufficient for supporting very general forms of metadata. Document formulas with and without annotations will be referred to as RIF-BLD documents. A convention is used to avoid syntactic ambiguity in the above definition. For instance, in (* id ϕ *) t[w -> v] the metadata annotation could be attributed to the term t or to the entire frame t[w -> v]. The convention is that the above annotation is considered to be syntactically attached to the entire frame. Yet, some slots of ϕ can be used to provide metadata targeted to the object part t of the frame. Here is an example: (* _md[meta_for_frame->"about frame"^^xs:string) meta_for_object->"about t"^^xs:string meta_for_properties->"about w"^^xs:string)] *) t[w -> v]
Generally, the convention associates each annotation to the largest term or formula it precedes. Although RIF does not prescribe any particular vocabulary for metadata, one can be expected to use Dublin Core,4 RDFS, and OWL properties for metadata, along the lines of Section 7.1 of the OWL recommendation [1]—specifically owl:versionInfo, rdfs:label, rdfs:comment, rdfs:seeAlso, rdfs:isDefinedBy, dc:creator, dc:description, dc:date, and foaf:maker. E. Well-formed Formulas Not all formulas or documents are well-formed in RIF-BLD. The well-formedness restriction is similar to standard firstorder logic: it is required that no constant appear in more than one context. We will not rehash here the tedious details of the definitions from [2], Section 2.5. Informally, unique context means that no constant symbol can occur within the same document as an individual, a function symbol, or a predicate in different places. Likewise, a symbol cannot occur as an external predicate or function in one place and as a normal predicate or function symbol in another. However, the same symbol can occur as a function symbol of different arities or as a predicate symbol of different arities. F. EBNF Grammar for the Presentation Syntax of RIF-BLD Until now, we have been using mathematical English to specify the syntax of RIF-BLD. Since tool developers might prefer a more succinct overview of the syntax using familiar grammar notation, the RIF-BLD specification also supplies an EBNF definition. However, one should keep in mind the 4 http://dublincore.org/
following limitations of EBNF with respect to the presentation syntax: • The syntax of first-order logic is not context-free, so EBNF cannot capture the syntax of RIF-BLD precisely. For instance, it cannot capture the requirement that all variables under a quantifier must be distinct. As a result, the EBNF grammar defines a proper superset of RIFBLD: not all formulas that are derivable using the EBNF productions are well-formed formulas in RIF-BLD. For implementing a strict syntax validator, the corresponding mathematical English definitions must be used. • The EBNF grammar does not address all details of how constants and variables are represented, and it is not sufficiently precise about precedence rules, delimiters, and escape symbols. White space is informally used as a delimiter, and is implied in productions that use Kleene star. For instance, TERM* is to be understood as TERM TERM ... TERM, where each space abstracts from one or more blanks, tabs, newlines, etc. The EBNF grammar for the RIF-BLD presentation syntax is as follows: Rule Language: Document ::= IRIMETA? 'Document' '(' Base? Prefix* Import* Group? ')' Base ::= 'Base' '(' ANGLEBRACKIRI ')' Prefix ::= 'Prefix' '(' Name ANGLEBRACKIRI ')' Import ::= IRIMETA? 'Import' '(' ANGLEBRACKIRI PROFILE? ')' Group ::= IRIMETA? 'Group' '(' (RULE | Group)* ')' RULE ::= (IRIMETA? 'Forall' Var+ '(' CLAUSE ')') | CLAUSE CLAUSE ::= Implies | ATOMIC Implies ::= IRIMETA? (ATOMIC | 'And' '(' ATOMIC* ')') ':-' FORMULA PROFILE ::= ANGLEBRACKIRI Condition Language: FORMULA ::= IRIMETA? 'And' '(' FORMULA* ')' | IRIMETA? 'Or' '(' FORMULA* ')' | IRIMETA? 'Exists' Var+ '(' FORMULA ')' | ATOMIC | IRIMETA? 'External' '(' Atom ')' ATOMIC ::= IRIMETA? (Atom | Equal | Member | Subclass | Frame) Atom ::= UNITERM UNITERM ::= Const '(' (TERM* | (Name '->' TERM)*) ')' Equal ::= TERM '=' TERM Member ::= TERM '#' TERM Subclass ::= TERM '##' TERM Frame ::= TERM '[' (TERM '->' TERM)* ']' TERM ::= IRIMETA? (Const | Var | Expr | 'External' '(' Expr ')') Expr ::= UNITERM Const ::= '"' UNICODESTRING '"ˆˆ' SYMSPACE | CONSTSHORT Name ::= UNICODESTRING Var ::= '?' UNICODESTRING SYMSPACE ::= ANGLEBRACKIRI | CURIE Annotations: IRIMETA
::= '(*' IRICONST? (Frame | 'And' '(' Frame* ')')? '*)'
The following subsections explain and illustrate the different parts of the syntax starting with the foundational language of positive conditions.
9
1) EBNF for the Condition Language: The Condition Language represents formulas that can be used as queries or in the premises of RIF-BLD rules. The production rule for the non-terminal FORMULA represents RIF condition formulas (defined earlier). The connectives And and Or define conjunctions and disjunctions of conditions, respectively. Exists introduces existentially quantified variables. Here Var+ stands for the list of variables that are free in FORMULA. RIF-BLD conditions permit only existential variables. A RIF-BLD FORMULA can also be an ATOMIC term, i.e., an Atom, External Atom, Equal, Member, Subclass, or Frame. A TERM can be a constant, variable, Expr, or External Expr. The presentation syntax does not commit to any particular vocabulary and permits arbitrary Unicode strings in constant symbols, argument names, and variables. Constant symbols have the following form: "UNICODESTRING"ˆˆSYMSPACE, where SYMSPACE is an ANGLEBRACKIRI or is a CURIE that represents the identifier of the symbol space of the constant. UNICODESTRING is a Unicode string from the lexical space of that symbol space. ANGLEBRACKIRI is an IRI enclosed in angle brackets, as in N3. For instance, . The CURIE syntax has been introduced earlier. Constant symbols can have several short forms, which are represented by the non-terminal CONSTSHORT and are defined in [15]. Apart from the CURIE notation for IRI constants and the underscore notation for local symbols (cf. _factorial in Example 2.1), other shortcuts exist for strings (e.g., "abc" for "abc"ˆˆxs:string) and integers (e.g., 123 for "abc"ˆˆxs:integer). Names are Unicode character sequences. Variables are composed of UNICODESTRING symbols prefixed with a ?-sign. Equality, membership, and subclass terms are selfexplanatory. An Atom and Expr (expression) can either be positional or have named arguments. A frame term is a term composed of an object Id and a collection of attribute-value pairs. The term External(Atom) represents invocations of externally defined predicates, such as built-ins. Likewise, External(Expr) represents invocations of externally defined functions. Example 2.2 (RIF-BLD conditions): This example shows conditions that are composed of atoms, expressions, frames, and existentials. In frame formulas, variables are shown in the positions of object Ids, object properties, and property values. When convenient, we take advantage of the CURIE notation and other shortcuts. Prefix(bks ) Prefix(auth ) Prefix(cpt ) Formulas that use positional terms: cpt:book(auth:rifwg bks:Hamlet) Exists ?X (cpt:book(?X bks:Hamlet)) Formulas that use terms with named arguments: cpt:book(cpt:author->auth:rifwg cpt:title->bks:Hamlet) Exists ?X (cpt:book(cpt:author->?X cpt:title->bks:Hamlet)) Formulas that use frames: bks:wd1[cpt:author->auth:rifwg cpt:title->bks:Hamlet] Exists ?X (bks:wd2[cpt:author->?X cpt:title->bks:Hamlet])
Exists ?X (And (bks:wd2#cpt:book bks:wd2[cpt:author->?X cpt:title->bks:Hamlet])) Exists ?I ?X (?I[cpt:author->?X cpt:title->bks:Hamlet]) Exists ?I ?X (And (?I#cpt:book ?I[cpt:author->?X cpt:title->bks:Hamlet])) Exists ?S (bks:wd2[cpt:author->auth:rifwg ?S->bks:Hamlet]) Exists ?X ?S (bks:wd2[cpt:author->?X ?S->bks:Hamlet]) Exists ?I ?X ?S (And (?I#cpt:book ?I[author->?X ?S->bks:Hamlet]))
2) EBNF for the Rule Language: The EBNF for RIFBLD rules and documents is given in Section II-F. A RIFBLD Document consists of an optional Base directive, followed by any number of Prefixes and then any number of Imports. These then may be followed by an optional Group. Base and Prefix are employed by the shortcut mechanisms for IRIs. IRI has the form of an internationalized resource identifier as defined by [14]. An Import directive indicates the location of a document to be imported and an optional profile. A RIF-BLD Group is a collection of any number of RULE elements along with any number of nested Groups. Rules are generated using CLAUSE elements. The RULE production has two alternatives: • In the first, a CLAUSE is in the scope of the Forall quantifier. In that case, all variables mentioned in CLAUSE are required to also appear among the variables in the Var+ sequence. • In the second alternative, CLAUSE appears on its own. In that case, CLAUSE cannot have variables. Var, ATOMIC, and FORMULA were defined as part of the syntax for positive conditions in Section II-F1. In the CLAUSE production, ATOMIC is what is usually called a fact. An Implies rule can have an ATOMIC element or a conjunction of ATOMIC elements as its conclusion; it has a FORMULA as its premise. Note that, by Definition 2.4, externally defined atoms (i.e., formulas of the form External(Atom)) are not allowed in the conclusion part of a rule (ATOMIC does not expand to External). Example 2.3 (RIF-BLD rules): This example shows a business rule from [21]. If an item is perishable and it is delivered to John more than 10 days after the scheduled delivery date then the item should be rejected by him. Again, we employ various shortcuts, including the compact URI notation. Two versions of the main part of the document are given. In the first, all variables are quantified universally outside of the rule. In the second, some variables are quantified existentially in the rule premise. Semantically, both versions are equivalent according to Section III. Prefix(ppl ) Prefix(cpt ) Prefix(func ) Prefix(pred ) a. Universal form: Forall ?item ?deliverydate ?scheduledate ?diffduration ?diffdays ( cpt:reject(ppl:John ?item):And(cpt:perishable(?item) cpt:delivered(?item ?deliverydate ppl:John) cpt:scheduled(?item ?scheduledate) ?diffduration = External(func:subtract-dateTimes(
10
?deliverydate ?scheduledate)) ?diffdays = External(func:days-from-duration( ?diffduration)) External(pred:numeric-greater-than(?diffdays 10))) ) b. Universal-existential form: Forall ?item ( cpt:reject(ppl:John ?item ):Exists ?deliverydate ?scheduledate ?diffduration ?diffdays ( And(cpt:perishable(?item) cpt:delivered(?item ?deliverydate ppl:John) cpt:scheduled(?item ?scheduledate) ?diffduration = External(func:subtract-dateTimes( ?deliverydate ?scheduledate)) ?diffdays = External(func:days-from-duration( ?diffduration)) External(pred:numeric-greater-than(?diffdays 10))) )
3) EBNF for Annotations: The EBNF grammar for RIFBLD annotations is shown in Section II-F. As explained in Section II-D, each RIF-BLD formula and term can be prefixed with one optional annotation, IRIMETA, for identification and metadata. IRIMETA is represented as (*...*)-bracketed blocks that contain an optional rif:iri constant identifier, IRICONST, followed by an optional Frame or conjunction of Frames as metadata. Example 2.4 (A RIF-BLD document with annotated group): This example shows a complete document containing a group formula that consists of two RIF-BLD rules. The first is Rule (a) from Example 2.3. The group is annotated with an IRI identifier and metadata that uses Dublin Core vocabulary. Document( Prefix(ppl ) Prefix(cpt ) Prefix(dc ) ... func and pred as in Example 2.3 ... Prefix(xs ) (* "http://sample.org"ˆˆrif:iri _pd[dc:publisher -> "http://www.w3.org/"ˆˆrif:iri dc:date -> "2008-04-04"ˆˆxs:date] *) Group ( Forall ?item ?deliverydate ?scheduledate ?diffduration ?diffdays ( cpt:reject(ppl:John ?item):... body of Rule (a) in Example 2.3 ... Forall ?item ( cpt:reject(ppl:Fred ?item):- cpt:unsolicited(?item) ) ) )
III. S EMANTICS Like the presentation syntax, the semantics of RIF-BLD is given in two ways: as a specialization from the semantics of RIF-FLD [4] and as a direct specification from scratch. Here, again, we choose the second method. Although much longer, it requires no familiarity with RIF-FLD. However, the actual presentation of the semantics in this section and in the official specification [2] is very much in the style of RIF-FLD, which is considerably more general than what is actually required for such a relatively simple rule language as the Basic Logic Dialect. The reason for this generality is the need to ensure that the semantics do not diverge and that future RIF logic dialects will be proper extensions of the basic dialect.
In defining the semantics, we employ only the full syntax of RIF-BLD. The various shortcuts permitted by the syntax (e.g., the Prefix and Base directives and the CURIEs) are assumed to be already expanded into their full form by a simple preprocessor. To save space, in describing the semantics we omit lists and datatypes, and we simplify the semantics of external functions and predicates. A. Semantic Structures The semantics of RIF-BLD is an adaptation of the standard semantics for Horn clauses. Although this semantics is specified using general models while the semantics for Horn clauses is usually given using Herbrand models, the two styles of semantics are known to be equivalent [22]. We will use TV to denote {t,f}—the set of truth values used in the semantics. TV is used in RIF because it is intended to address (through the RIF-FLD framework) a range of logic languages, including those that are based on multi-valued logics. Since RIF-BLD is based on the classical two-valued logic, its TV set is particularly simple. The key concept in a model-theoretic semantics for a logic language is the notion of semantic structures [23], which is defined next. Definition 3.1 (Semantic structure): A semantic structure, I, is a tuple of the form . Here D is a nonempty set of elements called the domain of I, and Dind , Dfunc are nonempty subsets of D. Dind is used to interpret the elements of Const that play the role of individuals and Dfunc is used to interpret the constants that play the role of function symbols. As before, Const denotes the set of all constant symbols and Var the set of all variable symbols. DTS denotes a set of identifiers for primitive datatypes, which are mostly borrowed from XML Schema [24]. These include datatypes such as xs:string and xs:integer. A full description of supported datatypes is found in [15]. To preserve the main focus of this article, we will omit the semantics of datatypes in this exposition. The remaining components of I are total mappings defined as follows: 1) IC maps Const to D. This mapping interprets constant symbols. In addition: • If a constant, c ∈ Const, is an individual then it is required that IC (c) ∈ Dind . • If c ∈ Const, is a function symbol (positional or with named arguments) then it is required that IC (c) ∈ Dfunc . 2) IV maps Var to Dind . This mapping interprets variable symbols. 3) IF maps D to total functions D*ind → D (here D*ind is a set of all finite sequences over the domain Dind ). This mapping interprets positional terms. In addition: • If d ∈ Dfunc then IF (d) must be a function D*ind → Dind . • This implies that when a function symbol is applied to arguments that are individual objects then the result is also an individual object.
11
4) INF maps D to the set of total functions SetOfFiniteSets(ArgNames × Dind ) → D. This mapping interprets function symbols with named arguments. It is also required that: • If d ∈ Dfunc then INF (d) must be a function SetOfFiniteSets(ArgNames × Dind ) → Dind . The mapping INF is analogous to the interpretation of positional terms via the mapping IF with two differences: • Each pair ∈ ArgNames × Dind represents an attribute-value pair instead of just a value in the case of a positional term. • The arguments of a term with named arguments constitute a finite set of attribute-value pairs rather than a finite ordered sequence of simple elements. So, the order of the arguments in named-argument terms is immaterial. 5) Iframe maps Dind to total functions of the form SetOfFiniteBags(Dind × Dind ) → D. This mapping interprets frame terms. An argument d ∈ Dind of Iframe represents an object, and {, ..., } is a finite bag of attribute-value pairs for d. We will see shortly how Iframe is used to determine the truth valuation of frame terms. Bags (multi-sets) are used here because the order of the attribute-value pairs in a frame is immaterial and pairs may repeat, e.g., o[a->b a->b]. Such repetitions arise naturally when variables are instantiated with constants. For instance, o[?A->?B ?C->?D] becomes o[a->b a->b] if variables ?A and ?C are instantiated with the symbol a and ?B, ?D with b. (We shall see later that o[a->b a->b] is actually equivalent to o[a->b].) 6) Isub gives meaning to the subclass relationship. It is a total mapping of the form Dind × Dind → D. An additional restriction in Section III-C ensures that the operator ## is transitive, i.e., that c1 ## c2 and c2 ## c3 imply c1 ## c3. 7) Iisa gives meaning to class membership. It is a total mapping of the form Dind × Dind → D. An additional restriction in Section III-C ensures that the relationships # and ## have the usual property that all members of a subclass are also members of the superclass, i.e., that o # cl and cl ## scl imply o # scl. 8) I= is a mapping of the form Dind × Dind → D. It gives meaning to the equality operator. 9) Itruth is a mapping of the form D → TV. It is used to define truth valuation for formulas. 10) Iexternal is a mapping that is used to give meaning to External terms. It maps symbols in Const designated as external to fixed functions of appropriate arity. Typically, external terms are invocations of builtin functions or calls to external sources, and their fixed interpretations are determined by the specification of those built-ins and external sources. We also define the following generic mapping from terms to D, which we denote by I.
• • • •
•
• • • •
I(k) = IC (k), if k is a symbol in Const I(?v) = IV (?v), if ?v is a variable in Var I(f(t1 ... tn )) = IF (I(f))(I(t1 ),...,I(tn )) I(f(s1 ->v1 ... sn ->vn )) = INF (I(f))({,...,}) Here we use {...} to denote a set of attribute-value pairs. I(o[a1 ->v1 ... ak ->vk ]) = Iframe (I(o))({, ..., }) Here {...} denotes a bag of attribute-value pairs. Jumping ahead, we note that duplicate elements in such a bag do not affect the value of Iframe (I(o))—see Section III-C. For instance, I(o[a->b a->b]) = I(o[a->b]). I(c1##c2) = Isub (I(c1), I(c2)) I(o#c) = Iisa (I(o), I(c)) I(x=y) = I= (I(x), I(y)) I(External(p(s1 ... sn ))) = Iexternal (p)(I(s1 ), ..., I(sn )).
In addition, RIF-BLD imposes certain restrictions on datatypes so that they would be interpreted as intended (for instance, that the constants in the symbol space xs:integer are interpreted by integers). Details can be found in [2]. B. RIF-BLD Annotations in the Semantics RIF-BLD annotations are stripped before the mappings that constitute RIF-BLD semantic structures are applied. Likewise, they are stripped before applying the truth valuation, TValI , defined in the next section. Thus, annotations have no effect on the formal semantics of this Horn logic variant. Note that, although identifiers and metadata associated with RIF-BLD formulas are ignored by the semantics, they can be extracted by XML tools. The frame terms used to represent RIF-BLD metadata can then be combined with RIF-BLD rules and fed into rule reasoners. In this way, applications can be built to reason about metadata. However, RIF itself defines no particular semantics for metadata. C. Interpretation of Non-document Formulas This section establishes how semantic structures determine the truth value of RIF-BLD formulas other than document formulas. Truth valuation of document formulas is defined in the next section. Here we define a mapping, TValI , from the set of all non-document formulas to TV. Observe that in case of atomic formulas, TValI (φ) is defined essentially as Itruth (I(φ)). Recall that I(φ) is just an element of the domain D and Itruth maps D to truth values in TV. This might seem strange to the reader who is used to textbook-style definitions, since normally the mapping I is defined only for terms that occur as arguments to predicates, not for atomic formulas. Similarly, truth valuations are usually defined via mappings from instantiated formulas to TV, not from the interpretation domain D to TV. This HiLog-style definition [6] is inherited from RIF-FLD [4] and is equivalent to a standard one for first-order languages such as RIF-BLD. In RIF-FLD, this style of definition is a provision for enabling future RIF dialects that support higher-order features, such as those of HiLog [6], FLORA-2 [25], [26], and others.
12
Definition 3.2 (Truth valuation): Truth valuation for wellformed formulas in RIF-BLD is determined using the following function, denoted TValI : 1) Positional atomic formulas: TValI (r(t1 ... tn )) = Itruth (I(r(t1 ... tn ))) 2) Atomic formulas with named arguments: TValI (p(s1 ->v1 ... sk ->vk )) = Itruth (I(p(s1 ->v1 ... sk ->vk ))). 3) Equality: TValI (x = y) = Itruth (I(x = y)). • To ensure that equality has precisely the expected properties, it is required that: Itruth (I(x = y)) = t if I(x) = I(y) and that Itruth (I(x = y)) = f otherwise. • This is tantamount to saying that TValI (x = y) = t if and only if I(x) = I(y). 4) Subclass: TValI (sc ## cl) = Itruth (I(sc ## cl)). To ensure that ## is transitive, i.e., c1 ## c2 and c2 ## c3 imply c1 ## c3, the following is required: • For all c1, c2, c3 ∈ D, if TValI (c1 ## c2) = TValI (c2 ## c3) = t then TValI (c1 ## c3) = t. 5) Membership: TValI (o # cl) = Itruth (I(o # cl)). To ensure that all members of a subclass are also members of its superclasses, i.e., o # cl and cl ## scl imply o # scl, the following restriction is imposed: • For all o, cl, scl ∈ D, if TValI (o # cl) = TValI (cl ## scl) = t then TValI (o # scl) = t. 6) Frame: TValI (o[a1 ->v1 ... ak ->vk ]) = Itruth (I(o[a1 ->v1 ... ak ->vk ])). Since the bag of attribute-value pairs represents a conjunction of all the pairs, the following restriction is used, if k > 0: • TValI (o[a1 ->v1 ... ak ->vk ]) = t if and only if TValI (o[a1 ->v1 ]) = ... = TValI (o[ak ->vk ]) = t. 7) Externally defined atomic formula: TValI (External(t)) = Itruth (Iexternal (t)). 8) Conjunction: TValI (And(c1 ... cn )) = t if and only if TValI (c1 ) = ... = TValI (cn ) = t. Otherwise, TValI (And(c1 ... cn )) = f. The empty conjunction is treated as a tautology: TValI (And()) = t. 9) Disjunction: TValI (Or(c1 ... cn )) = f if and only if TValI (c1 ) = ... = TValI (cn ) = f. Otherwise, TValI (Or(c1 ... cn )) = t. The empty disjunction is treated as a contradiction: TValI (Or()) = f. 10) Quantification: • TValI (Exists ?v1 ... ?vn (ϕ)) = t if and only if for some I*, described below, TValI∗ (ϕ) = t. • TValI (Forall ?v1 ... ?vn (ϕ)) = t if and only if for every I*, described below, TValI∗ (ϕ) = t. Here I* is a semantic structure of the form , which is exactly like I, except that the mapping I*V , is used instead of IV . I*V is defined to
coincide with IV on all variables except, possibly, on ?v1 ,...,?vn . 11) Rule implication: • TValI (conclusion :- condition) = t, if either TValI (conclusion)=t or TValI (condition)=f. • TValI (conclusion :- condition) = f otherwise. 12) Groups of rules: If Γ is a group formula of the form Group(ϕ1 ... ϕn ) then • TValI (Γ) = t if and only if TValI (ϕ1 ) = t, ..., TValI (ϕn ) = t. • TValI (Γ) = f otherwise. In other words, rule groups are treated as conjunctions. D. Interpretation of Documents RIF documents are sets of rules and are similar to Group formulas in that respect. However, they can also import other documents. This feature requires special care, since the imported documents can have rif:local constants and there might be name clashes. This would be a problem, if we recall that rif:local constants are completely encapsulated within the documents they appear in, so the same local constant may mean different things in different documents. This property of local constants brings the notion of semantic multi-structures into the picture. Semantic multi-structures are essentially similar to regular semantic structures but, in addition, they allow to interpret rif:local symbols that belong to different documents differently. Definition 3.3 (Semantic multi-structure): A semantic multi-structure ˆI is a set {J ,I;I ϕ1 , ..., I ϕn , ...} of semantic structures adorned with distinct RIF-BLD formulas ϕ1 , ..., ϕn (adorned structures can be thought of as formula-structure pairs). These structures must be identical in all respects except that the mappings JC , IC , IC ϕ1 , ..., IC ϕn , ..., which interpret the constants in Const by domain elements in D, may differ on the constants that belong to the rif:local symbol space. As will become clear shortly, the above I serves to interpret the main document formulas. The adorned structures of the form I ϕi are used to interpret imported documents, and J is used in defining entailment for non-document formulas. We can now define the semantics of RIF documents. Definition 3.4 (Truth valuation for document formulas): Let ∆ be a document formula and let ∆1 , ..., ∆k be all the RIF-BLD document formulas that are imported directly or indirectly5 into ∆. Let Γ, Γ1 , ..., Γk denote the respective group formulas associated with these documents. Let ˆI = {J , I; I ∆1 , ..., I ∆k , ...} be a semantic multi-structure that contains semantic structures adorned with at least the documents ∆1 , ..., ∆k . Then we define: • TValˆI (∆) = t if and only if TValI (Γ) = TValI ∆1 (Γ1 ) = ... = TValI ∆1 (Γk ) = t. Note that this definition considers only those document formulas that are reachable via the one-argument import 5 That
is, through another imported document.
13
directives. Two argument import directives are not covered here. Their semantics is defined by the document RIF RDF and OWL Compatibility [10]. Also note that some of the Γi above may be missing since all parts in a document formula are optional. In this case, we assume that Γi is a tautology, such as a = a, and every TVal function maps such a Γi to the truth value t. For non-document formulas, we extend TVal from regular semantic structures to multi-structures as follows: if ˆI = {J , I; ...} is a multi-structure and φ a non-document formula then TValˆI (ϕ) = TValJ (ϕ). The above definitions make the intent behind rif:local constants clear: occurrences of such constants in different documents can be interpreted differently even if they have the same name. Therefore, each document can choose the names for its rif:local constants freely and without any risk of clashes with the names of local constants used in the imported documents. Hence rif:local constants are just a syntactic convenience introduced to facilitate rule interchange. This language feature, does not increase the expressive power of RIF-BLD, as such constants can be simulated in any multidocument rule language by simply renaming apart certain designated constants in different documents. E. Logical Entailment We can now define what it means for a set of RIF-BLD rules (embedded in a group or a document formula) to entail another RIF-BLD formula. In RIF-BLD we are mostly interested in entailment of RIF condition formulas, which can be viewed as queries to RIF-BLD documents. Entailment of condition formulas provides formal underpinning to RIF-BLD queries. Definition 3.5 (Models): A multi-structure ˆI is a model of a formula, ϕ, written as ˆI |= ϕ, iff TValˆI (ϕ) = t. Here ϕ can be a document or a non-document formula. Definition 3.6 (Logical entailment): Let ϕ and ψ be (document or non-document) formulas. We say that ϕ entails ψ, written as ϕ |= ψ, if and only if for every multi-structure ˆI for which both TValˆI (ϕ) and TValˆI (ψ) are defined, ˆI |= ϕ implies ˆI |= ψ. One consequence of the multi-document semantics of RIF-BLD is that local constants specified in one document cannot be queried from another document. For instance, if one document, ∆0 , has the fact "http://eg.com/p"ˆˆrif:iri("abc"ˆˆrif:local)
while another document formula, ∆, imports ∆0 and has the rule "http://eg.com/q"ˆˆrif:iri(?X) :"http://eg.com/p"ˆˆrif:iri(?X)
then ∆ |= "http://eg.com/q"ˆˆrif:iri("abc"ˆˆrif:local) does not hold. This is because the symbol "abc"ˆˆrif:local in ∆0 and in ∆ is treated as occurrences of different constants by semantic multi-structures. The behavior of local symbols should be contrasted with that of rif:iri symbols, which are global. For instance, in the above scenario, let ∆0 also have the fact
"http://eg.com/p"ˆˆrif:iri("http://cde"ˆˆrif:iri)
Then ∆ |= does hold.
"http://eg.com/q"ˆˆrif:iri("http://cde"ˆˆrif:iri)
IV. XML S ERIALIZATION The goal of XML serialization of any RIF dialect, including BLD, is to provide a standard way of encoding the presentation syntax in XML so that the rules could be transmitted over the wire and exchanged among processors. The serialization of RIF-BLD includes the following components [2]: • An XML schema document for the XML syntax. • A mapping χbld from the RIF-BLD presentation syntax to XML. The purpose of the XML schema document is obvious: as with any XML schema document, it defines which XML documents purporting to contain RIF rules are valid—a useful syntactic check. However, this is not enough. First, the semantics of RIF-BLD was specified using the presentation syntax because doing so directly with XML is infeasible. XML is too verbose and even the simplest of definitions of the kind we saw in Section III would span pages. The mapping χbld from the presentation syntax to XML is an indirect way to project the semantics on RIF-BLD’s XML documents. Second, recall that the syntax of RIF-BLD is not context-free and thus cannot be fully captured by EBNF. This limitation also applies to XML Schema. Again, the mapping χbld solves the problem, as the image of that mapping (which is a subset of all documents valid with respect to the XML schema) is precisely the set of all syntactically correct XML serializations of RIF-BLD. To reflect this situation, we introduce two notions of syntactic correctness. The weaker notion checks correctness only with respect to XML Schema, while the stricter notion represents true syntactic correctness. A valid RIF-BLD document in XML syntax is an XML document that is valid with respect to the XML schema of RIF-BLD as defined in [2]. An admissible RIF-BLD document in XML syntax is a valid RIF-BLD document in XML syntax that is the image of a wellformed RIF-BLD document in the presentation syntax under the presentation-to-XML syntax mapping χbld . Due to the lack of space, we cannot define here the XML schema of RIF-BLD or the mapping χbld . Instead, we will explain the general principles of the XML serialization. The XML serialization for RIF-BLD is based on an alternating (also known as striped) syntax [27]. A striped serialization views XML documents as objects and divides all XML tags into class descriptors, called type tags, and property descriptors, called role tags [11]. We follow the convention of using capitalized names for type tags and lowercase names for role tags. The all-uppercase classes in the presentation syntax, such as FORMULA, become XML Schema groups. They are not visible in instance markup. The other classes as well as non-terminals and symbols (e.g., Exists or =) become XML elements with optional attributes, as shown below. Details can be found in the XML schema appendix of the RIF-BLD specification [2].
14
In the XML serialization of RIF-BLD some of the tag names directly reflect keywords of the presentation syntax. For instance, And(...) is serialized as .... Children of such type tags are wrapped into role tags. For instance, all children of the And type tag are formula role tags: ... ....... Other tag names reflect the binary infix operators of the presentation syntax, such as # or =. For instance, ...#... is serialized as .... As before, child elements of the Member type tag are role tags—instance and class role tags in this case: ... .... The tags used in the serialization of the condition language are listed below. (conjunction) (disjunction) (quantified formula for 'Exists', containing declare and formula roles) declare (declare role, containing a Var) formula (formula role, containing a FORMULA) Atom (atom formula, positional or with named arguments) External (external call, containing a content role) content (content role, containing an Atom, for predicates, or Expr, for functions) Member (prefix version of member formula '#') Subclass (prefix version of subclass formula '##') Frame (Frame formula) object (Member/Frame role, containing a TERM or an object description) op (Atom/Expr role for predicates/functions as operations) args (Atom/Expr positional arguments role, with fixed 'ordered' attribute, containing n TERMs) instance (Member instance role) class (Member class role) sub (Subclass sub-class role) super (Subclass super-class role) slot (prefix version of Name/TERM '->' TERM pair as an Atom/Expr or Frame slot role, with fixed 'ordered' attribute) Equal (prefix version of term equation '=') Expr (expression formula, positional or with named arguments) left (Equal left-hand side role) right (Equal right-hand side role) Const (individual, function, or predicate symbol, with optional 'type' attribute) Name (name of named argument) Var (serialized version of logic '?' variable)
- And - Or - Exists -
- id - meta
(identifier role, containing IRICONST) (meta role, containing metadata as a Frame or Frame conjunction)
The id and meta elements, which are expansions of the IRIMETA element, can occur optionally as the first two children of any type (but not role) element. The XML syntax for symbol spaces uses the type attribute associated with the XML element Const. For instance, a literal in the xs:dateTime datatype is represented as 2007-11-23T03:55:44-02:30. RIF-BLD also uses the ordered attribute to indicate that the children of args and slot elements are ordered. The tags for the rule sublanguage also come in two flavors: those that correspond to the presentation syntax keywords (e.g., Document) and those corresponding to the infix operators (e.g., :- becomes Implies). - Document
(document, containing optional directive and payload roles)
-
directive payload Import location profile Group sentence Forall
- Implies - if - then
(directive role, containing Import) (payload role, containing Group) (importation, containing location and optional profile) (location role, containing IRICONST) (profile role, containing PROFILE) (nested collection of sentences) (sentence role, containing RULE or Group) (quantified formula for 'Forall', containing declare and formula roles) (prefix version of logic ':-' implication, containing if and then roles) (antecedent role, containing FORMULA) (consequent role, containing ATOMIC or conjunction of ATOMICs)
The following example illustrates XML serialization. Example 4.1 (Serialization of the introductory example): The following XML document is a serialization of Example 1.1 from the introductory section. ]> Borrower Item Lender &cpt;lend Lender Item Borrower &cpt;borrow Borrower Item Lender &cpt;lend Pete &bks;Hamlet Beth
15
V. C ONFORMANCE The RIF-BLD specification does not require or expect conformant systems to implement the RIF-BLD presentation syntax. Instead, conformance is described in terms of semanticspreserving transformations between the native syntax of a compliant syntax and the XML syntax of RIF-BLD. Let T be a set of datatypes that includes the mandatory RIF datatypes defined in [15], and suppose E is a set of external terms that includes the mandatory built-ins listed in [15]. We say that a formula ϕ is a BLDT ,E formula iff • it is a well-formed RIF-BLD formula, • all the datatypes used in ϕ are in T , and • all the externally defined terms used in ϕ are in E. A RIF processor is a software program that can produce admissible RIF documents (a RIF producer) or consume them (a RIF consumer). Intuitively, a conformant RIF-BLD processor is one that can interpret φ correctly. The formal definition is as follows. Definition 5.1 (Conformant processor): A RIF processor is a conformant BLDT ,E consumer if and only if it implements a semantics-preserving mapping, µ, from the set of all BLDT ,E formulas to the language L of the processor (µ does not need to be an “onto” mapping). Specifically, this means that for any pair ϕ, ψ of BLDT ,E formulas for which ϕ |=BLD ψ is defined, ϕ |=BLD ψ if and only if µ(ϕ) |=L µ(ψ). Here |=BLD denotes the logical entailment in RIF-BLD and |=L is the logical entailment in the language L of the RIF processor. A RIF processor is a conformant BLDT ,E producer if and only if it implements a semantics-preserving mapping, ν, from the language L of the processor to the set of all BLDT ,E formulas (ν does not need to be an “onto” mapping). Specifically, this means that for any pair ϕ, ψ of formulas in L for which ϕ |=L ψ is defined, ϕ |=L ψ if and only if ν(ϕ) |=BLD ν(ψ). Thus, compliance with RIF-BLD means implementation of a conformant RIF consumer, producer, or both. In addition to the above, RIF-BLD imposes the following requirements: • Conformant BLD producers and consumers are required to support only the entailments of the form ϕ |=BLD ψ, where ψ is a closed RIF condition formula, i.e., a RIF condition in which every variable, ?V, is in the scope of a quantifier of the form Exists ?V. Conformant BLD producers and consumers are also expected to preserve annotations wherever possible. • Two other conformance requirements are still under discussion in the Working Group and might be removed in the final W3C recommendation. To understand them, recall that in Definition 5.1 the set of supported datatypes and built-ins must include the set of mandatory types and built-ins of RIF-BLD. The requirements under discussion state that conformant processors must also have a “strict” mode in which documents that refer to additional, nonmandatory datatypes and built-ins are rejected. The first restriction above simply says that rule engines are required to support querying but not more general types of
logical entailment. The strictness requirement is intended to allow RIF-BLD consumers to reject documents that contain datatypes and built-ins that are not defined by RIF and thus might not be understood by conformant processors. RIF-BLD supports a wide variety of syntactic forms for terms and formulas, which creates infrastructure for exchanging syntactically diverse rule languages. The conformance clauses enable systems that do not support some of the syntax directly to still support it through syntactic transformations. For instance, disjunctions in rule premises can be eliminated through a standard transformation, such as replacing p :Or(q r) with a pair of rules p :- q, p :- r. Likewise, terms with named arguments can be reduced to positional terms by ordering the arguments lexicographically by their names and incorporating the ordered argument names into the predicate name. For instance, p(bb->1 aa->2) could be represented as p_aa_bb(2 1). VI. C ONCLUSIONS RIF-BLD was the first specification that the W3C RIF Working Group approved for the “Candidate Recommendation” status. Prior to reaching that point, the specification had evolved with often dramatic twists and turns, e.g., inclusion and later exclusion of a sorted logic. RIF-BLD served as a basis for generalizations (RIF-FLD [4]) and specializations (RIF-Core). It has also been the focus of the RDF and OWL Compatibility document [10]. Other documents that have reached the Candidate Recommendation stage include RIF-PRD, the Production Rules Dialect [3], and RIF Datatypes and Built-ins [15]. A logically complete implementation of RIF-BLD consumer and producer processors requires the handling of unrestricted equality in rule conclusions. Theorem provers implementing the RIF-BLD superset of first-order logic with equality can be used as initial mapping targets (e.g., VampirePrime [28]). For better performance, they can then be specialized toward RIFBLD. Conversely, implementations of subsets of RIF-BLD that restrict equality in various ways are promising, as several systems already support much of the needed functionality (e.g., FLORA-2 [26] and Ontobroker [29]). Missing are mostly the actual mappings from those systems to and from RIF-BLD’s XML. Extensions of RIF-BLD and RIF-Core of the highest practical value are expected to be logic programming dialects with negation as failure. Such dialects, based on the well-founded [30] and answer-set [31] semantics for negation, are expected to be submitted to W3C by RIF user groups before the end of 2009. These new dialects are being defined as specializations of the general framework for logic dialects, RIF-FLD, which contains most of the requisite machinery and thus greatly simplifies the task. Since negation-as-failure constructs are widespread in industry and many rule engines provide them, this is expected to contribute to wider adoption of RIF. More importantly, it is hoped that the foundation provided by BLD as the first RIF dialect and FLD as an extensible RIF framework will spur the creation of a general infrastructure for exchanging rule languages through dialects specified in a
16
consistent manner with a presentation syntax, a formal semantics, an XML serialization, and conformance clauses. The RIF Working Group currently lists VampirePrime [28] and SILK6 as two of the six on-going BLD implementations, along with three PRD, five DTB, and one Core-only implementation.7 VII. ACKNOWLEDGEMENTS This article is based on the document [2] by the Rules Interchange Format (RIF) Working Group. The authors extend special thanks to Jos de Bruijn, Gary Hallmark, Sandro Hawke, David Hirtle, Stella Mitchell, Leora Morgenstern, Igor Mozetic, Axel Polleres, Dave Reynolds, Christian de SainteMarie, and Chris Welty for contributing ideas, feedback, and insightful discussions. Our thanks go also to the IEEE TKDE reviewers, whose suggestions led to improvements of both this article and the RIF-BLD document. R EFERENCES [1] M. Dean and G. Schreiber, “OWL Web Ontology Language Reference,” February 2004, http://www.w3.org/TR/owl-ref/. [2] H. Boley and M. Kifer, “RIF Basic Logic Dialect,” March 2009, W3C Working Draft. http://www.w3.org/2005/rules/wiki/BLD. [3] C. de Sainte Marie, A. Paschke, and G. Hallmark, “RIF Production Rule Dialect,” March 2009, W3C Working Draft. http://www.w3.org/ 2005/rules/wiki/PRD. [4] H. Boley and M. Kifer, “RIF Framework for Logic Dialects,” March 2009, W3C Working Draft. http://www.w3.org/2005/rules/wiki/FLD. [5] M. Kifer, G. Lausen, and J. Wu, “Logical Foundations of ObjectOriented and Frame-Based Languages,” Journal of ACM, vol. 42, pp. 741–843, July 1995. [6] W. Chen, M. Kifer, and D. Warren, “HiLog: A Foundation for HigherOrder Logic Programming,” Journal of Logic Programming, vol. 15, no. 3, pp. 187–230, February 1993. [7] I. Horrocks, P. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean, “SWRL: A Semantic Web Rule Language Combining OWL and RuleML,” W3C Member Submission, May 2004, http://www.w3. org/Submission/2004/SUBM-SWRL-20040521/. [8] T. Berners-Lee, D. Connolly, L. Kagal, Y. Scharf, and J. Hendler, “N3Logic: A Logical Framework For the World Wide Web,” Theory and Practice of Logic Programming (TPLP), vol. 8, no. 3, May 2008. [9] A. Analyti, G. Antoniou, and C. V. Damásio, “A Principled Framework for Modular Web Rule Bases and Its Semantics,” in KR. AAAI Press, 2008, pp. 390–400. [10] J. de Bruijn, “RIF RDF and OWL Compatibility,” March 2009, latest version available at http://www.w3.org/2005/rules/wiki/SWC. [11] H. Boley, “Object-Oriented RuleML: User-Level Roles, URI-Grounded Clauses, and Order-Sorted Terms,” in RuleML, ser. Lecture Notes in Computer Science. Springer, October 2003, no. 2876, pp. 1–16, http: //iit-iti.nrc-cnrc.gc.ca/publications/nrc-46502_e.html. [12] G. Wagner, A. Giurca, and S. Lukichev, “A General Markup Framework for Integrity and Derivation Rules,” in Principles and Practices of Semantic Web Reasoning, ser. Dagstuhl Seminar Proceedings, vol. 05371. Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss Dagstuhl, Germany, 2005. [13] C. Chang and R. Lee, Symbolic Logic and Mechanical Theorem Proving. Academic Press, 1973. [14] M. Duerst and M. Suignard, “Internationalized Resource Identifiers (IRIs),” January 2005, http://www.ietf.org/rfc/rfc3987.txt. [15] A. Polleres, H. Boley, and M. Kifer, “RIF Datatypes and Built-ins 1.0,” March 2009, W3C Working Draft. http://www.w3.org/2005/rules/wiki/ DTB. [16] “XQuery 1.0 and XPath 2.0 Functions and Operators,” W3C Recommendation, January 2007, http://www.w3.org/TR/xpath-functions/. [17] M. Birbeck and S. McCarron, “CURIE Syntax 1.0: A Syntax for Expressing Compact URIs,” April 2008, W3C Working Draft. Available at http://www.w3.org/TR/curie/. 6 http://silk.semwebcentral.org 7 http://www.w3.org/2005/rules/wiki/Implementations
[18] M. Kifer, “Rule Interchange Format: The Framework,” in Web Reasoning and Rule Systems, Second International Conference (RR 2008), Karlsruhe, Germany, October 31-November 1, 2008. Proceedings, ser. Lecture Notes in Computer Science, D. Calvanese and G. Lausen, Eds., vol. 5341. Springer, 2008, pp. 1–11. [19] G. Klyne and J. Carroll, “Resource Description Framework (RDF): Concepts and Abstract Syntax,” February 2004, latest version available at http://www.w3.org/TR/rdf-concepts/. [20] D. Kapur and P. Narendran, “NP-Completeness of the Set Unification and Matching Problems,” in Proceedings of the 8th International Conference on Automated Deduction. London, UK: Springer-Verlag, 1986, pp. 489–495. [21] A. Paschke, D. Hirtle, A. Ginsberg, P.-L. Patranjan, and F. McCabe, “RIF Use Cases and Requirements,” March 2009, W3C Working Draft. http://www.w3.org/2005/rules/wiki/UCR. [22] J. Lloyd, Foundations of Logic Programming (Second Edition). Springer-Verlag, 1987. [23] H. Enderton, A Mathematical Introduction to Logic. Academic Press, 2001. [24] P. Biron and A. Malhotra, “XML Schema Part 2: Datatypes Second Edition,” W3C, W2C Recommendation, October 2004, http://www.w3. org/TR/xmlschema-2/. [25] G. Yang, M. Kifer, and C. Zhao, “FLORA-2: A Rule-Based Knowledge Representation and Inference Infrastructure for the Semantic Web,” in International Conference on Ontologies, Databases and Applications of Semantics (ODBASE-2003), ser. Lecture Notes in Computer Science, vol. 2888. Springer, November 2003, pp. 671–688. [26] M. Kifer, “FLORA-2: An Object-Oriented Knowledge Base Language,” The FLORA-2 Web Site, http://flora.sourceforge.net. [27] H. Thompson, “Normal Form Conventions for XML Representations of Structured Data,” October 2001, http://www.ltg.ed.ac.uk/~ht/ normalForms.html. [28] A. Riazanov, “VampirePrime Reasoner,” Some software for public use, http://www.freewebs.com/riazanov/software.htm. [29] Ontoprise, GmbH, “Ontobroker,” http://www.ontoprise.com/. [30] A. Van Gelder, K. Ross, and J. Schlipf, “The Well-Founded Semantics for General Logic Programs,” Journal of ACM, vol. 38, no. 3, pp. 620–650, 1991. [Online]. Available: http://citeseer.ist.psu.edu/ gelder91wellfounded.html [31] M. Gelfond and N. Leone, “Logic Programming and Knowledge Representation — the A-Prolog Perspective,” Artificial Intelligence, vol. 138, no. 1–2, pp. 3–38, 2002.
Harold Boley is Adjunct Professor at the Faculty of Computer Science, University of New Brunswick, and Leader of the Semantic Web Laboratory at the Institute for Information Technology, National Research Council Canada. His specification of Web rules through RuleML has found broad uptake, been combined with OWL to SWRL, and become the main input to RIF. His work on Rule Responder has enabled deployed distributed applications for the Social Semantic Web.
Michael Kifer is a Professor with the Computer Science Department, Stony Brook University. His work on knowledge representation, particularly on F-logic and HiLog, is among the most widely cited in Semantic Web research. He was twice recipient of the prestigious ACM-SIGMOD “Test of Time” award for his work on object-oriented database languages. Recently he also received SUNY Chancellor’s Award for Excellence in Scholarship.