A Reasoning Algorithm for pD*

Huiying Li¹, Yanbing Wang¹, Yuzhong Qu¹, and Jeff Z. Pan²

¹ Department of Computer Science and Engineering, Southeast University, Nanjing 210096, P.R. China
{huiyingli, ybwang, yzqu}@seu.edu.cn
² Department of Computing Science, The University of Aberdeen, UK

Abstract. pD* semantics extends the ‘if-semantics’ of RDFS to a subset of the OWL vocabulary. It leads to simple entailment rules and relatively low computational complexity for reasoning. In this paper, we propose a forward-chaining reasoning algorithm to support RDFS entailments under the pD* semantics. This algorithm extends the Sesame algorithm to cope with the pD* entailments. In particular, we present an optimization of the dependency table between entailment rules that eliminates many redundant inference steps. Finally, we give some test results to illustrate the correctness and performance of the algorithm.

1 Introduction

RDF (together with RDF Schema, or RDFS) and OWL are important Semantic Web (SW) standards from the W3C. Herman ter Horst [6] proposes a semantic extension of RDFS, called pD* semantics, that supports datatypes and some OWL constructors and axioms. Interestingly, the pD* semantics is in line with the ‘if-semantics’ of RDFS and weaker than the ‘iff-semantics’ of OWL. Under the ‘if-semantics’, pD* entailment is NP-complete, the same complexity as RDFS entailment. In this paper, we propose a forward-chaining reasoning algorithm for pD* entailment. The algorithm extends the Sesame algorithm and optimizes the dependency table between entailment rules to eliminate many redundant inference steps. We evaluate the algorithm on the W3C-recommended OWL test cases and on several data sets; the results show that its data loading time is shorter than that of an exhaustive forward-chaining algorithm. In addition, we provide an efficient approximate algorithm for users who want some correct results quickly but do not require them to be complete.

The rest of the paper is organized as follows. Section 2 briefly introduces related work. Section 3 gives an overview of the pD* semantics. Section 4 introduces our forward-chaining reasoning algorithm for pD*. Section 5 discusses our test results, and Section 6 concludes the paper.

2 Related Work

The advent of RDFS represented an early attempt at an SW ontology language based on RDF. As the constructors and axioms provided by RDFS are primitive, the W3C recommends the SW ontology language OWL. There are two kinds of semantics related to RDFS and OWL, namely RDF MT and RDFS(FA) [7]. Accordingly, reasoners that support reasoning with RDFS and OWL fall into two categories. Reasoners in the first category, such as Sesame [1] and Jena, are based on the RDFS entailment rules and the extended entailment rules defined in RDF MT. Reasoners in the second category, such as FaCT++ [10], RACER [4] and Pellet [9], support the bottom two layers of the RDFS(FA) semantics [7]; they usually implement tableau-based decision procedures for Description Logics (DLs). The work discussed in this paper belongs to the first category. We extend the Sesame algorithm to cope with both the RDFS entailment rules and the pD* entailment rules, and we optimize the dependencies between rules to eliminate many redundant inference steps.

R. Mizoguchi, Z. Shi, and F. Giunchiglia (Eds.): ASWC 2006, LNCS 4185, pp. 293–299, 2006. © Springer-Verlag Berlin Heidelberg 2006

3 pD* Semantics

It is well known that extending RDF with the OWL constructors and axioms (i.e. OWL Full) leads to undecidability. It has also been shown in [8] that extending RDF MT to First Order Logic (FOL) results in a collapse of the model theory. So far there are two solutions for providing decidable extensions of RDF: we can adopt either the FA semantics [7], which has been shown to be compatible with OWL DL, or the pD* semantics [6], which extends RDF MT to cover some OWL constructors and axioms. Interestingly, the pD* semantics is in line with the ‘if-semantics’ of RDFS and weaker than the ‘iff-semantics’ used in the RDF-compatible semantics for OWL DL and OWL Full. One of the motivations for the iff-semantics in the RDF-compatible semantics for OWL is to solve the ‘too few entailments’ problem [7]. Note that the iff-semantics is not relevant to the direct semantics of OWL DL.

Among the 15 OWL URIs considered, pD* interprets owl:FunctionalProperty, owl:InverseFunctionalProperty, owl:SymmetricProperty and owl:TransitiveProperty as the ‘if’ conditions of the standard mathematical definitions. owl:inverseOf is interpreted as follows: if two properties are owl:inverseOf-related, their extensions are each other’s inverse as binary relations. The pD* semantics requires that two classes are equivalent if and only if each is a subclass of the other, and owl:equivalentProperty is treated in the same way as owl:equivalentClass. owl:sameAs is interpreted as an equivalence relation. The pD* semantics includes the iff condition for owl:hasValue, but for owl:someValuesFrom and owl:allValuesFrom it includes only half of OWL’s iff conditions. If two classes are owl:disjointWith-related, pD* requires their extensions to be disjoint, and it also requires the extensions of owl:sameAs and owl:differentFrom to be disjoint.
Given the pD* semantics discussed above, the corresponding pD* entailment rules are also given in [6]. They consist of 23 rules specifying which conclusions can be deduced from given premises, and they are proved to be sound and complete with respect to the pD* semantics.
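To make the shape of these rules concrete, the following Python sketch encodes the pD* rules for symmetric, transitive, inverse and functional properties as functions over a set of triples and applies them naively until nothing new is derived. It is an illustration under our own assumptions, not a transcription of [6]: triples are plain string tuples, the function names are hypothetical, the side conditions in [6] that keep literals out of certain positions are ignored, and reflexive owl:sameAs triples are not generated.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Set[Triple]]

TYPE = "rdf:type"


def props_of_kind(g: Set[Triple], kind: str) -> Set[str]:
    """Properties p declared as <p rdf:type kind> in g."""
    return {s for (s, p, o) in g if p == TYPE and o == kind}


def symmetric(g: Set[Triple]) -> Set[Triple]:
    # <p rdf:type owl:SymmetricProperty>, <v p w>  =>  <w p v>
    props = props_of_kind(g, "owl:SymmetricProperty")
    return {(w, p, v) for (v, p, w) in g if p in props}


def transitive(g: Set[Triple]) -> Set[Triple]:
    # <p rdf:type owl:TransitiveProperty>, <u p v>, <v p w>  =>  <u p w>
    props = props_of_kind(g, "owl:TransitiveProperty")
    return {(u, p, w)
            for (u, p, v) in g if p in props
            for (v2, p2, w) in g if p2 == p and v2 == v}


def inverse_of(g: Set[Triple]) -> Set[Triple]:
    # <p owl:inverseOf q>, <v p w>  =>  <w q v>; and <v q w>  =>  <w p v>
    out: Set[Triple] = set()
    for (p, rel, q) in g:
        if rel == "owl:inverseOf":
            out |= {(w, q, v) for (v, p2, w) in g if p2 == p}
            out |= {(w, p, v) for (v, q2, w) in g if q2 == q}
    return out


def functional(g: Set[Triple]) -> Set[Triple]:
    # <p rdf:type owl:FunctionalProperty>, <u p v>, <u p w>  =>  <v owl:sameAs w>
    props = props_of_kind(g, "owl:FunctionalProperty")
    return {(v, "owl:sameAs", w)
            for (u, p, v) in g if p in props
            for (u2, p2, w) in g if p2 == p and u2 == u and v != w}


RULES: List[Rule] = [symmetric, transitive, inverse_of, functional]


def naive_closure(g: Set[Triple], rules: List[Rule] = RULES) -> Set[Triple]:
    """Apply every rule to the whole graph until a fixpoint is reached."""
    g = set(g)
    while True:
        new = set().union(*(rule(g) for rule in rules)) - g
        if not new:
            return g
        g |= new
```

For example, with ex:ancestorOf declared transitive and the triples <ex:a ex:ancestorOf ex:b> and <ex:b ex:ancestorOf ex:c>, `naive_closure` adds <ex:a ex:ancestorOf ex:c>. Note that every round re-applies every rule to the whole graph; this is exactly the redundancy that a dependency-driven algorithm tries to avoid.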


With respect to the subset of the OWL vocabulary considered, the pD* semantics is intended to represent a reasonable interpretation that is useful for drawing conclusions about instances in the presence of an ontology, and that leads to simple entailment rules and a relatively low computational complexity.

4 A Forward-Chaining Reasoning Algorithm

4.1 Sesame Algorithm

Sesame is a Java framework for storing and querying RDF and RDFS information. It uses a forward-chaining algorithm to compute and store the closure during any transaction that adds data to the repository. The algorithm runs iteratively over the RDFS entailment rules, but uses the dependencies between entailment rules to eliminate redundant inference steps. Here, a rule r1: a1 → b1 triggers another rule r2: a2 → b2 if some conclusion s ∈ b1 matches a premise p ∈ a2. Such a trigger is referred to as a dependency between the two rules. The dependency relations between the entailment rules used in the Sesame algorithm are shown in the left part of Table 1. The Sesame algorithm is guaranteed to terminate: each new iteration is applied only to the statements newly derived in the previous iteration, and since the total set of statements in the closure is finite, the algorithm stops when no new statements can be derived.

4.2 Optimized Dependencies Between RDFS Entailment Rules

Building on the Sesame algorithm, we optimize the dependencies between RDFS entailment rules under an assumption about the input, stated below. With this optimization, many redundant inference steps can be eliminated when computing the pD* closure. A knowledge base is usually divided into two levels: the schema level, which contains statements about concepts and the relationships between them, and the instance level, which contains statements about individuals. Besides these two levels, we define a metaschema level, which contains the declarations about the built-in vocabulary. More precisely, the metaschema level contains the statements that satisfy the following pattern: the subject is a built-in vocabulary term, and the predicate is rdf:type, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, rdfs:range, owl:sameAs, owl:equivalentClass or owl:equivalentProperty. For example, statements specifying the domain or the range of rdf:type belong to the metaschema level. Obviously, the axiomatic triples and the statements deduced from them are included in the metaschema level. Ordinary RDF or OWL files usually contain no metaschema-level statements: users rarely state, for instance, what the domain of rdf:type is, and they accept the axiomatic triples by default. Based on this hypothesis, the optimized dependency table is shown in the right part of Table 1.

Take the dependency between rule1 and rule3 as an example. The conclusion of rule1 is <aaa rdf:type rdf:Property>, and the premises of rule3 are <xxx aaa uuu> and <aaa rdfs:range zzz>. The conclusion of rule1 matches the premise <xxx aaa uuu> of rule3 with rdf:type in the predicate position, so the other premise can only be satisfied by the triple <rdf:type rdfs:range rdfs:Class>, which is among the RDFS axiomatic triples and which we assume the user accepts by default. When rule1 triggers rule3, the only statement that can be inferred is therefore <rdf:Property rdf:type rdfs:Class>, which has already been deduced from the axiomatic triples. The dependency between rule1 and rule3 is thus useless and can be removed. Compared with the dependency table used in the Sesame algorithm, this optimization removes 40% of the dependency relations.

4.3 The Reasoning Algorithm

Applying this optimization to all the entailment rules (the RDFS entailment rules and the pD* entailment rules) yields a complete dependency table between all entailment rules; we do not list it here for reasons of space. Using this complete dependency table, we propose a forward-chaining reasoning algorithm. It consists of a simple loop that obtains the pD* closure of an RDF graph G: the entailment rules are applied to G iteratively until no new statements can be derived. The detailed algorithm is as follows:

1. Initialize: record all rules as triggered.
2. Read in the RDF graph G and all the axiomatic triples.
3. Begin the iteration.
4. For each rule, determine whether it was triggered in the last iteration. If so, and its premises are matched by triples newly derived in the last iteration, apply the rule to G and record the rules triggered by it.
5. Terminate the iteration when no new triple is added to G.
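The five steps can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: rules are written as triple patterns whose variables start with `?`; the dependency table is computed from a conservative reading of the ‘matches’ relation of Section 4.1 and does not include the metaschema-level pruning of Section 4.2; only four RDFS rules (rdfs2, rdfs3, rdfs9 and rdfs11, numbered as in the W3C RDF Semantics recommendation) and a single axiomatic triple are included; and we assume a rule records its dependents only when it actually derived new triples. All function and class names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` instantiates to `triple`, or return None."""
    result = dict(binding)
    for pat, term in zip(pattern, triple):
        if is_var(pat):
            if result.setdefault(pat, term) != term:
                return None
        elif pat != term:
            return None
    return result


class Rule:
    def __init__(self, name: str, premises: List[Triple], conclusions: List[Triple]):
        self.name, self.premises, self.conclusions = name, premises, conclusions


def may_match(a: Triple, b: Triple) -> bool:
    """Conservative test: no position holds two different constants."""
    return all(is_var(x) or is_var(y) or x == y for x, y in zip(a, b))


def dependency_table(rules: List[Rule]) -> Dict[str, Set[str]]:
    """r1 triggers r2 if some conclusion of r1 may match some premise of r2."""
    return {r1.name: {r2.name for r2 in rules
                      if any(may_match(c, p) for c in r1.conclusions for p in r2.premises)}
            for r1 in rules}


def apply_rule(rule: Rule, graph: Set[Triple], delta: Set[Triple]) -> Set[Triple]:
    """Match one premise against the new triples, the remaining premises against G."""
    derived: Set[Triple] = set()
    for i, first in enumerate(rule.premises):
        for t in delta:
            start = match(first, t, {})
            if start is None:
                continue
            bindings = [start]
            for j, prem in enumerate(rule.premises):
                if j != i:
                    bindings = [b2 for b in bindings for t2 in graph
                                for b2 in [match(prem, t2, b)] if b2 is not None]
            for b in bindings:
                for concl in rule.conclusions:
                    derived.add(tuple(b[x] if is_var(x) else x for x in concl))
    return derived


def forward_chain(graph: Set[Triple], axioms: Set[Triple], rules: List[Rule]) -> Set[Triple]:
    deps = dependency_table(rules)
    by_name = {r.name: r for r in rules}
    triggered = set(by_name)          # step 1: every rule starts out triggered
    g = set(graph) | set(axioms)      # step 2: read G and the axiomatic triples
    delta = set(g)                    # first round: every triple counts as new
    while delta:                      # steps 3 and 5: stop when nothing new was added
        new: Set[Triple] = set()
        next_triggered: Set[str] = set()
        for name in sorted(triggered):                      # step 4
            fresh = apply_rule(by_name[name], g, delta) - g - new
            if fresh:
                new |= fresh
                next_triggered |= deps[name]                # rules this one may trigger
        g |= new
        delta, triggered = new, next_triggered
    return g


# Hypothetical rule subset and a single RDFS axiomatic triple.
RDFS_RULES = [
    Rule("rdfs2", [("?a", "rdfs:domain", "?x"), ("?u", "?a", "?y")], [("?u", "rdf:type", "?x")]),
    Rule("rdfs3", [("?a", "rdfs:range", "?x"), ("?u", "?a", "?v")], [("?v", "rdf:type", "?x")]),
    Rule("rdfs9", [("?u", "rdfs:subClassOf", "?x"), ("?v", "rdf:type", "?u")],
         [("?v", "rdf:type", "?x")]),
    Rule("rdfs11", [("?u", "rdfs:subClassOf", "?v"), ("?v", "rdfs:subClassOf", "?x")],
         [("?u", "rdfs:subClassOf", "?x")]),
]
AXIOMS = {("rdf:type", "rdfs:range", "rdfs:Class")}
```

Because `may_match` ignores shared variables inside a pattern, the table may list more dependencies than strictly necessary; this only costs extra rule applications and never loses a conclusion. In this subset, for instance, rdfs2 triggers rdfs2, rdfs3 and rdfs9 but not rdfs11, since its conclusion has rdf:type rather than rdfs:subClassOf in the predicate position.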


Applying a rule to G works as follows. We first search the triples newly derived in the last iteration for a triple that matches one of the premises of the rule; if one is found, we try to match the other premises against G. If all premises are matched, the conclusion of the rule is deduced and added to G. For example, when rule rdfs2 is triggered, we first search all the triples derived in the last iteration for a triple that matches at least one of its premises. If this succeeds, we search G for triples that match the other premise. Each pair of triples that matches the two premises yields a conclusion triple; all such pairs are found and the corresponding conclusions are derived. Finally, the rules triggered by rdfs2 are recorded for the next iteration. Using this algorithm, the pD* closure Gp of G can be computed in polynomial time. Whether G pD*-entails an RDF graph H can then be decided by checking whether Gp contains an instance of H as a subset or contains a P-clash. A P-clash is either a combination of two triples of the form <v owl:differentFrom w>, <v owl:sameAs w>, or a combination of three triples of the form <v owl:disjointWith w>, <u rdf:type v>, <u rdf:type w>. Like the Sesame algorithm, this algorithm is guaranteed to terminate.
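The final decision step can be sketched as below. The sketch assumes the closure Gp has already been computed; blank nodes are written with a `_:` prefix and every other string is treated as a constant; the function names are our own. Finding an instance of H is done by a naive backtracking search, which may take time exponential in the size of H, consistent with pD* entailment being NP-complete in general.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def has_p_clash(g: Set[Triple]) -> bool:
    """Detect the two kinds of P-clash in an already closed graph."""
    # <v owl:differentFrom w> together with <v owl:sameAs w>
    for (v, p, w) in g:
        if p == "owl:differentFrom" and (v, "owl:sameAs", w) in g:
            return True
    # <v owl:disjointWith w> together with <u rdf:type v> and <u rdf:type w>
    types: Dict[str, Set[str]] = {}
    for (u, p, c) in g:
        if p == "rdf:type":
            types.setdefault(u, set()).add(c)
    for (v, p, w) in g:
        if p == "owl:disjointWith" and any(v in cs and w in cs for cs in types.values()):
            return True
    return False


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def find_instance(h: Iterable[Triple], g: Set[Triple]) -> Optional[Dict[str, str]]:
    """Map the blank nodes of h to terms of g so that every mapped triple of h is in g."""
    patterns: List[Triple] = list(h)

    def extend(i: int, mapping: Dict[str, str]) -> Optional[Dict[str, str]]:
        if i == len(patterns):
            return mapping
        for triple in g:
            m = dict(mapping)
            # blank nodes bind consistently; other terms must match exactly
            if all((m.setdefault(a, b) == b) if is_blank(a) else a == b
                   for a, b in zip(patterns[i], triple)):
                found = extend(i + 1, m)
                if found is not None:
                    return found
        return None

    return extend(0, {})


def pd_star_entails(closure: Set[Triple], h: Iterable[Triple]) -> bool:
    """Check on a precomputed closure: a P-clash, or an instance of h."""
    return has_p_clash(closure) or find_instance(h, closure) is not None
```

A graph containing a P-clash entails every H, which is why the clash test comes first.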

5 Test Results

The OWL Web Ontology Language Test Cases [2] is a W3C Recommendation. Because there are no dedicated pD* test cases, we test our algorithm on the positive entailment test cases of OWL, selecting from [2] all the test cases that correspond to the vocabulary supported by pD*. The results are shown in Table 2. For each OWL test case, the symbol ‘—’ indicates that it is not a positive entailment test case; otherwise there are two symbols. The symbol “S” (or “U”) in the left position indicates that the underlying semantic condition of the test case is totally supported (or unsupported) by pD*, while the symbol “P” (“U” or “F”) in the right position indicates that our algorithm passes (does not support, or fails) the corresponding test case. In total, there are 37 positive entailment tests about the subset of the OWL vocabulary included in pD*. The underlying semantic conditions of 18 of these test cases are totally supported by pD*, and our algorithm passes 16 of them. Take the first test case of owl:allValuesFrom as an example. Our algorithm first applies the entailment rule rdfs9, which derives <u rdf:type w> from <v rdfs:subClassOf w> and <u rdf:type v>, to obtain a type triple <u rdf:type _:a> for the restriction _:a; this triple holds. It then applies the entailment rule rdfp16, which derives <x rdf:type c> from <u rdf:type _:a>, <_:a onProperty p>, <_:a allValuesFrom c> and <u p x>; this conclusion holds as well. Since the remaining triples of the conclusion already hold in the premises, all the conclusions of this test case hold, so the test case is passed. The other passed test cases are similar. Of the 18 pD* test cases, the two that involve datatypes fail, because our algorithm does not yet support datatype reasoning. These results show that, with respect to pD*, most of the test cases are passed by our algorithm.
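The two inference steps of this example can be replayed in a few lines of Python. The premises below are hypothetical: they are shaped like the allValuesFrom case described above (a class ex:r that is a subclass of the restriction _:a), but they are not copied from the W3C test file. The rule functions follow the rule forms quoted in the text and, as a simplification, ignore the restriction in [6] that keeps literals out of subject positions.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def rdfs9(g: Set[Triple]) -> Set[Triple]:
    # <v rdfs:subClassOf w>, <u rdf:type v>  =>  <u rdf:type w>
    return {(u, "rdf:type", w)
            for (v, p, w) in g if p == "rdfs:subClassOf"
            for (u, q, v2) in g if q == "rdf:type" and v2 == v}


def rdfp16(g: Set[Triple]) -> Set[Triple]:
    # <v owl:allValuesFrom w>, <v owl:onProperty p>, <u rdf:type v>, <u p x>  =>  <x rdf:type w>
    out: Set[Triple] = set()
    for (v, a, w) in g:
        if a != "owl:allValuesFrom":
            continue
        props = {p for (v2, b, p) in g if v2 == v and b == "owl:onProperty"}
        members = {u for (u, c, v3) in g if c == "rdf:type" and v3 == v}
        out |= {(x, "rdf:type", w) for (u, p, x) in g if u in members and p in props}
    return out


# Hypothetical premises shaped like the allValuesFrom example
premises: Set[Triple] = {
    ("ex:r", "rdfs:subClassOf", "_:a"),
    ("_:a", "owl:onProperty", "ex:p"),
    ("_:a", "owl:allValuesFrom", "ex:c"),
    ("ex:i", "rdf:type", "ex:r"),
    ("ex:i", "ex:p", "ex:o"),
}

step1 = rdfs9(premises) - premises                   # membership in the restriction
step2 = rdfp16(premises | step1) - premises - step1  # the allValuesFrom conclusion
print(sorted(step1))  # [('ex:i', 'rdf:type', '_:a')]
print(sorted(step2))  # [('ex:o', 'rdf:type', 'ex:c')]
```

The order matters: rdfp16 cannot fire on the original premises alone, because its premise <u rdf:type _:a> only becomes available after rdfs9 has been applied.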

Table 2. Test results

Positive entailment test    001  002  003  004  005  006  007
FunctionalProperty          S/P  S/F  U/U  U/U  U/U  —    —
InverseFunctionalProperty   S/P  S/F  U/U  U/U  —    —    —
Restriction                 —    —    —    —    —    U/U  —
SymmetricProperty           S/P  U/U  U/U  —    —    —    —
TransitiveProperty          S/P  U/U  —    —    —    —    —
allValuesFrom               S/P  —    —    —    —    —    —
differentFrom               U/U  U/U  —    —    —    —    —
disjointWith                S/P  S/P  —    —    —    —    —
equivalentClass             S/P  S/P  S/P  U/U  —    U/U  U/U
equivalentProperty          S/P  S/P  S/P  U/U  U/U  S/P  —
inverseOf                   S/P  —    —    —    —    —    —
sameAs                      S/P  —    —    —    —    —    —
someValuesFrom              U/U  U/U  U/U  —    —    —    —

To illustrate the performance of our algorithm, we use five different data sets to test the loading time of three algorithms. The first is an exhaustive forward-chaining algorithm that does not use the dependencies between rules; the second is our algorithm discussed above; the third is a simpler algorithm in which some entailment rules are omitted to improve performance. The reasoning results of this simpler algorithm are guaranteed to be sound but may be incomplete. The test results show that the data loading time of our algorithm, which uses the optimized dependencies between rules, is shorter than that of the exhaustive forward-chaining algorithm, and that the simpler algorithm performs best of the three. For users who want some correct results quickly and do not want to wait for complete reasoning results, the simpler algorithm is therefore more suitable.
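Why omitting rules keeps the results sound can be seen in a small experiment: every rule is sound and monotone, so a fixpoint over a subset of the rules can only produce a subset of the full closure. Which rules the simpler algorithm actually omits is not stated here; dropping the rdfs:subClassOf transitivity rule (rdfs11) below is purely our illustrative choice, and all names are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Set[Triple]]


def subclass_type(g: Set[Triple]) -> Set[Triple]:
    # rdfs9: <v rdfs:subClassOf w>, <u rdf:type v>  =>  <u rdf:type w>
    return {(u, "rdf:type", w)
            for (v, p, w) in g if p == "rdfs:subClassOf"
            for (u, q, v2) in g if q == "rdf:type" and v2 == v}


def subclass_transitive(g: Set[Triple]) -> Set[Triple]:
    # rdfs11: <u rdfs:subClassOf v>, <v rdfs:subClassOf w>  =>  <u rdfs:subClassOf w>
    return {(u, "rdfs:subClassOf", w)
            for (u, p, v) in g if p == "rdfs:subClassOf"
            for (v2, q, w) in g if q == "rdfs:subClassOf" and v2 == v}


def closure(g: Set[Triple], rules: List[Rule]) -> Set[Triple]:
    """Naive fixpoint: apply all rules until nothing new is derived."""
    g = set(g)
    while True:
        new = set().union(*(rule(g) for rule in rules)) - g
        if not new:
            return g
        g |= new


data: Set[Triple] = {
    ("ex:A", "rdfs:subClassOf", "ex:B"),
    ("ex:B", "rdfs:subClassOf", "ex:C"),
    ("ex:x", "rdf:type", "ex:A"),
}
full = closure(data, [subclass_type, subclass_transitive])
approx = closure(data, [subclass_type])  # illustrative "simpler" variant: rdfs11 dropped

assert approx <= full         # sound: nothing outside the full closure
print(sorted(full - approx))  # [('ex:A', 'rdfs:subClassOf', 'ex:C')]
```

In this toy case the instance-level answer <ex:x rdf:type ex:C> is still found by the reduced rule set; only the schema-level triple <ex:A rdfs:subClassOf ex:C> is missing.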

6 Conclusion and Future Work

In this paper, we have presented a forward-chaining reasoning algorithm that supports pD* reasoning. Based on the premise that metaschema-level statements are usually absent from users’ RDF or OWL files, we optimize the dependencies between entailment rules to improve the algorithm’s performance. The test results show that its data loading time is shorter than that of an exhaustive forward-chaining algorithm. In addition, we provide an efficient approximate algorithm for users who want some correct results quickly but do not require them to be complete. The work reported in this paper can be seen as a first step towards a complete system for storing and querying Semantic Web data under the pD* semantics. Much work remains in this direction, such as dealing with the consequences of delete operations and improving performance for scalability. We will address these problems in future work.

Acknowledgments

This work is supported in part by the 973 Program of China under Grant 2003CB317004, the JSNSF under Grant BK2003001, and the EU FP6 Network of Excellence Knowledge Web (IST-2004-507842). We would like to thank our team members for their work on related experiments.

References

1. Broekstra, J., Kampman, A., van Harmelen, F.: Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. In: Proc. of the 1st International Semantic Web Conference (2002), pp. 54–68.
2. Carroll, J.J., De Roo, J. (Eds.): OWL Web Ontology Language Test Cases. W3C Recommendation, 10 February 2004. http://www.w3.org/TR/2004/REC-owl-test-20040210/.
3. Guo, Y., Pan, Zh., Heflin, J.: An Evaluation of Knowledge Base Systems for Large OWL Datasets. In: Proc. of the 3rd International Semantic Web Conference (2004), pp. 274–288.
4. Haarslev, V., Möller, R.: RACER System Description. In: Proc. of the International Joint Conference on Automated Reasoning (IJCAR 2001), LNAI 2083, pp. 701–705.
5. Hayes, P. (Ed.): RDF Semantics. W3C Recommendation, 10 February 2004. Latest version available at http://www.w3.org/TR/rdf-mt/.
6. ter Horst, H.J.: Completeness, Decidability and Complexity of Entailment for RDF Schema and a Semantic Extension Involving the OWL Vocabulary. Journal of Web Semantics 3 (2005), pp. 79–115.
7. Pan, J.Z., Horrocks, I.: RDFS(FA) and RDF MT: Two Semantics for RDFS. In: Proc. of the 2nd International Semantic Web Conference (ISWC 2003), pp. 30–46.
8. Patel-Schneider, P.F.: Building the Semantic Web Tower from RDF Straw. In: Proc. of the 19th International Joint Conference on Artificial Intelligence (IJCAI 2005).
9. Sirin, E., Parsia, B., Grau, B.C., Kalyanpur, A., Katz, Y.: Pellet: A Practical OWL-DL Reasoner. Submitted for publication to Journal of Web Semantics.
10. Tsarkov, D., Horrocks, I.: Efficient Reasoning with Range and Domain Constraints. In: Proc. of the Description Logic Workshop (2004), pp. 41–50.