Towards Semantic Tuplespace Computing: The Semantic Web Spaces System
Lyndon Nixon
[email protected]
Elena Paslaru Bontas Simperl
[email protected]
Olena Antonechko
[email protected]
Robert Tolksdorf
[email protected]
Arbeitsgruppe Netzbasierte Informationssysteme (AG NBI)
Freie Universität Berlin, Institut für Informatik
Takustr. 9, 14195 Berlin, Germany
http://www.ag-nbi.de
ABSTRACT
In this paper we introduce Semantic Web Spaces, a middleware for coordinating knowledge processes on the Semantic Web.¹ Co-ordination is an important aspect of any interaction between computer agents, and especially so on the Semantic Web, where the communication carries knowledge rather than data and correct inferences can only be made when the right knowledge is available at the right time. We have therefore identified tuplespace computing as a relevant paradigm for agent communication on the Semantic Web and have realized a prototype system based on a Linda-inspired coordination model and on core semantic technologies such as RDF, ontologies and reasoning.
Categories and Subject Descriptors
C.2.4 [Computer-communication networks]: Distributed systems - tuplespace computing; D.3.2 [Programming languages]: Language classifications - Linda
Keywords
Semantic Web, middleware, Linda, tuplespaces
1. INTRODUCTION
The tuplespace computing paradigm supports co-ordination in a shared virtual dataspace, the tuplespace, using a simple set of synchronization primitives. This decouples the communicating agents in both space and time. Tuplespace communication has also been applied to the Web, despite the challenges in implementing a global tuplespace.

According to the emerging Semantic Web vision, the Web will no longer be only a global network for the persistent publication of data in the form of low-level media objects (images, video, audio) and (semi-)structured documents (HTML, XML, RSS). It is also envisaged as a sink for high volumes of machine-readable data expressed using semantic models based on formal description logics. This data may annotate existing Web content (metadata), define the types of objects in a given domain and the properties linking these objects (ontologies), and make statements of claimed truth based on these ontologies (knowledge bases) [1].

We have considered the need for co-ordination between agents operating on the Semantic Web and proposed a conceptual model for their co-ordination in a previous paper [18]. By applying a tuplespace approach to inter-agent communication and to the agents' concurrent interaction with distributed knowledge repositories, we foresee the benefits of a simple yet powerful co-ordination model in which parallel and distributed processes can be uncoupled in space and time. The proposed system is called “Semantic Web Spaces”. It is envisaged as a middleware platform for real-world Semantic Web applications, in which it takes over from the communicating agents the administration of distributed data, the co-ordination of multiple processes and the mediation between ontological representations.

In this paper we return to the proposed Semantic Web Spaces. The core notions of semantic tuplespace computing as a new paradigm of knowledge-based co-ordination and communication are introduced in Section 2. Section 3 describes the architecture and subsequent technical realization of the conceptual model. On the basis of our prototype we offer first evaluation results in Section 4. We compare our approach to related work in the semantic tuplespace computing field (Section 5) and draw some conclusions and pointers for future work (Section 6).

¹ This work is partially supported by the EU Network of Excellence KnowledgeWeb (FP6-507482).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SAC’07, March 11-15, 2007, Seoul, Korea. Copyright 2007 ACM 1-59593-480-4/07/0003 ...$5.00.
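As a minimal illustration of the decoupling in space and time described at the start of this section, the following Java sketch shows a consumer that blocks on in until some producer, which it never addresses directly, performs an out. The class LindaSketch, its method names and the convention that a null template field acts as a wildcard are assumptions made for this illustration only; this is not the API of LighTS or of the Semantic Web Spaces prototype.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical minimal Linda-style space, for illustration only. */
class LindaSketch {
    private final List<Object[]> tuples = new ArrayList<>();

    /** out: publish a tuple and wake up any blocked readers. */
    public synchronized void out(Object... tuple) {
        tuples.add(tuple.clone());
        notifyAll();
    }

    /** rd: block until a tuple matches the template, then return a copy of it. */
    public synchronized Object[] rd(Object... template) {
        return await(template, false);
    }

    /** in: block until a tuple matches the template, then remove and return it. */
    public synchronized Object[] in(Object... template) {
        return await(template, true);
    }

    // Called only from synchronized methods, so wait() may release the monitor.
    private Object[] await(Object[] template, boolean remove) {
        while (true) {
            for (Iterator<Object[]> it = tuples.iterator(); it.hasNext();) {
                Object[] tuple = it.next();
                if (matches(template, tuple)) {
                    if (remove) {
                        it.remove();
                    }
                    return tuple.clone();
                }
            }
            try {
                wait(); // sleep until the next out()
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a tuple", e);
            }
        }
    }

    /** A null template field is a wildcard; all other fields must be equal. */
    static boolean matches(Object[] template, Object[] tuple) {
        if (template.length != tuple.length) {
            return false;
        }
        for (int i = 0; i < template.length; i++) {
            if (template[i] != null && !template[i].equals(tuple[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        LindaSketch space = new LindaSketch();
        // The consumer may reach in() before or after the out(); either order works.
        Thread consumer = new Thread(() ->
                System.out.println(Arrays.toString(space.in("temperature", null))));
        consumer.start();
        space.out("temperature", 21);
        consumer.join(); // prints [temperature, 21]
    }
}
```

The producer and the consumer share only the space and the shape of the tuple, which is the property that the semantic extensions in the following sections build upon.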
2. SEMANTIC WEB SPACES
Since its original conception in the context of parallel programming, Linda has undergone many extensions in order to apply it to a wide range of scenarios which benefit from its simple yet powerful co-ordination model [8]. In particular, recent work has drawn up requirements for its application in open distributed systems [10]. Building upon these results, the semantic extension of the original concept of tuplespaces can be described according to the following dimensions (cf. [15]):

• New types of tuples: the representation of semantic data within tuplespaces requires new types of tuples which are tailored to standard Semantic Web languages (RDF(S) [9] and OWL [13]). RDF triples can be represented in a four-field tuple of the form (subject, predicate, object, id). As foreseen by the RDF abstract model, the first three fields contain URIs (or, in the case of the object, literals). These URIs identify RDF resources presumably available on the Web. Fields are typed by RDFS/OWL classes (URIs) and XML-based datatypes (literals). Additionally, we choose to uniquely identify each RDF statement by means of an ID field. In this way statements sharing the same subject, predicate and object can be addressed separately, which is consistent with the Linda model. The allocation of the IDs is coordinated by the tuplespace with the help of the tuplespace ontology (see below).

• New co-ordination primitives: the transition from data-centered tuplespaces to the new semantics-aware Semantic Web Spaces requires a revision of the meaning of the Linda coordination model. In Semantic Web Spaces we fundamentally distinguish between a data view and an information view upon the stored RDF tuples. In the data view all tuples are seen as plain data, without semantics, as in traditional Linda systems. In the information view we see the set of RDF tuples in the tuplespace as an RDF graph. This imposes consistency and satisfiability constraints with regard to the RDF semantics and to associated ontologies defined in RDFS or OWL. Hence, the traditional coordination model is revised in order to support this distinction. In the data view we make use of a Linda-compliant variation of the traditional out, in and read operations. They preserve the original semantics while operating on the structure of RDF triples. Handling Semantic Web knowledge in the information view, however, requires co-ordination primitives which take into account the truth value of the underlying tuples and the ontologies the tuples might refer to. For this purpose we introduce the operations claim, endorse and retract. For example, an agent can add a particular truth to the space, such as claim <PaulSmith, authorOf, CM111>², meaning that Dr Paul Smith (represented in RDF by the resource named PaulSmith) is the author of the conference paper with the id CM111. If the resource named PaulSmith was identified in the space as being of type Dog, then the claim would return false where an outr would return true: while the tuple is syntactically correct RDF, the claim additionally takes the ontology into account, and in our example the property authorOf can only have a Person as subject. If another agent asks endorse <PaulSmith, creator, ?> and the space has no tuples of this form, we could expect the operation to return empty, as is the case with a data view operation like rdr. However, given the ontological information that the authorOf property is a subproperty of creator, the statement could be returned even though this tuple does not explicitly exist (we call this an inferred tuple). In addition, we have defined multiple-tuple operations which output or read a set of RDF triples as a single request-response. The excerpt operation in particular uses a “context” in the information view to contain a set of copies of the matched tuples in a private partition of the space (the reference is only passed to the agent excerpting the triples). This allows contexts to explicitly contain inferred tuples which are only implicit in the information view of the space, and allows agents to construct RDF models and continue interacting with those models (e.g. through destructive reads) without affecting the rest of the space. Table 1 gives an overview of the co-ordination model within Semantic Web Spaces.

² Ordinarily RDF resources such as these would exist in some namespace, so that they can be disambiguated from other resources with the same name. Here we exclude namespaces for readability.

• New matchings: the standard Linda matching approach is extended in order to efficiently manage the newly defined tuple types. This applies to both the data and the information view. The former includes procedures to deal with the types associated with the subject, predicate and object fields of each RDF tuple.³ The latter additionally takes into account domain-specific types defined in external ontologies. Semantic matching techniques may then make use of this knowledge with the help of reasoning services in order to refine the retrieval capabilities of the tuplespace.
For example, subclass and subproperty information in an RDF Schema can be used to determine inferred tuples, as in the example given in the previous paragraph.

Table 1: Co-ordination model in Semantic Web Spaces

Operation | Signature | Description
Data view
outr | (Statement) → boolean | Insert a new RDF statement to the data view
outgr | (Model) → boolean | Insert a set of RDF statements extracted from a Jena model to the data view
rdr | (Triple or Node) → Statement | Read an RDF statement from the data view of the space using a three-fielded template of the form (s,p,o) or a Node containing a tuple id
rdgr | (Triple) → Model | As rdr but returns all matches as a Jena model
inr | (Triple or Node) → Statement | Destructively read an RDF statement from the data view of the space, likewise with a template or tuple id
ingr | (Triple) → Model | As inr but destructively reads all matched RDF statements from the data view
Information view
claim | (Statement) → boolean | Assert an RDF statement in the information view of the space if consistent with the RDF Schema
endorse | (Triple) → Subspace | Read an RDF statement from the information view of the space
excerpt | (Triple) → URI | Read all matching RDF statements by copying them into a context and returning a URI identifying it
retract | (Triple) → Subspace | Deny the truth value of an RDF statement in the information view of the space (retained in the data view)

³ These types are pre-defined in the RDF schema (RDFS) [2].

Orthogonal to these dimensions, Semantic Web Spaces contain a tuplespace ontology, which is used as a formal description of the tuplespace, its components and properties. Using ontologies in this context allows a more flexible management of the tuplespace content and of the interaction between the tuplespace and information providers and consumers (extendability, automatic inferencing etc.).

Figure 1 shows a high-level architecture of Semantic Web Spaces. As in all Linda-based systems, the central components are the Linda coordination model and the tuplespace as a shared data space for tuples. In Semantic Web Spaces we extend the core architecture with a reasoning component for interpreting ontologies according to their formal semantics (drawing inferences, checking satisfiability etc.). Accordingly, the tuplespace is extended to support building a semantic view upon the tuples (i.e. construction of an RDF graph model from RDF data stored in the tuplespace) and the association of RDF statements with the ontologies they refer to. The component handling the coordination of processes is extended with modules that provide the administrative services we consider requisite in a Semantic Web middleware [17]. We exemplify this by considering security and trust components as extensions of the classical architecture. A set of metadata describes the tuplespace itself, according to the tuplespace ontology. Finally, as the system is foreseen as a middleware platform, it should be independent of the underlying implementations of the different computer systems that it must interact with. This necessitates interfaces to isolate the system kernel from the heterogeneity of both the clients which communicate with the system and the backend storage solutions which realize the physical storage of the information represented in the logical memory of the tuplespace. For more details regarding the conceptualisation of Semantic Web Spaces the reader can refer to [16] and [18].
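The difference between the two views can be made concrete with the PaulSmith example from this section. The following Java sketch is a deliberately small model under stated assumptions: the class ViewsSketch, its maps standing in for an ontology, the resource Rex and the use of null as a template wildcard are all hypothetical, and return types are simplified with respect to Table 1. It also treats a mismatch between a subject's type and a property's domain as grounds for rejecting a claim, as in the example above, whereas a standard RDFS reasoner would need additional information (such as class disjointness) to detect such a conflict.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the data and information views; not the prototype's code. */
class ViewsSketch {
    // Data view: statements stored as {subject, predicate, object, id}.
    final List<String[]> dataView = new ArrayList<>();
    // Information view: ids of the statements currently claimed to be true.
    final Set<String> claimed = new HashSet<>();
    // Toy ontology: resource -> class, property -> domain class, property -> superproperty.
    final Map<String, String> typeOf = new HashMap<>();
    final Map<String, String> domainOf = new HashMap<>();
    final Map<String, String> superPropertyOf = new HashMap<>();
    private int nextId = 0;

    private String store(String s, String p, String o) {
        String id = "t" + nextId++;
        dataView.add(new String[] {s, p, o, id});
        return id;
    }

    /** outr: purely syntactic insert into the data view. */
    boolean outr(String s, String p, String o) {
        store(s, p, o);
        return true;
    }

    /** claim: insert only if the subject's known type agrees with the property's domain. */
    boolean claim(String s, String p, String o) {
        String domain = domainOf.get(p);
        String type = typeOf.get(s);
        if (domain != null && type != null && !domain.equals(type)) {
            return false;
        }
        claimed.add(store(s, p, o));
        return true;
    }

    /** rdr: first structural match in the data view, or null; no inference. */
    String[] rdr(String s, String p, String o) {
        for (String[] t : dataView) {
            if (fits(s, t[0]) && fits(p, t[1]) && fits(o, t[2])) {
                return t.clone();
            }
        }
        return null;
    }

    /** endorse: match claimed statements, also via subproperties of the asked property. */
    String[] endorse(String s, String p, String o) {
        for (String[] t : dataView) {
            if (claimed.contains(t[3]) && fits(s, t[0]) && entails(t[1], p) && fits(o, t[2])) {
                return new String[] {t[0], p == null ? t[1] : p, t[2]}; // possibly an inferred tuple
            }
        }
        return null;
    }

    /** retract: withdraw the truth value; the statement stays in the data view. */
    boolean retract(String s, String p, String o) {
        boolean changed = false;
        for (String[] t : dataView) {
            if (fits(s, t[0]) && fits(p, t[1]) && fits(o, t[2])) {
                changed |= claimed.remove(t[3]);
            }
        }
        return changed;
    }

    private static boolean fits(String template, String value) {
        return template == null || template.equals(value);
    }

    // True if 'stored' equals 'asked' or is a (transitive) subproperty of it.
    private boolean entails(String stored, String asked) {
        if (asked == null) {
            return true;
        }
        Set<String> seen = new HashSet<>();
        for (String q = stored; q != null && seen.add(q); q = superPropertyOf.get(q)) {
            if (q.equals(asked)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ViewsSketch space = new ViewsSketch();
        space.domainOf.put("authorOf", "Person");
        space.superPropertyOf.put("authorOf", "creator");
        space.typeOf.put("PaulSmith", "Person");
        space.typeOf.put("Rex", "Dog");
        System.out.println(space.claim("Rex", "authorOf", "CM111"));          // false
        System.out.println(space.outr("Rex", "authorOf", "CM111"));           // true
        System.out.println(space.claim("PaulSmith", "authorOf", "CM111"));    // true
        System.out.println(space.rdr("PaulSmith", "creator", null) == null);  // true
        System.out.println(String.join(" ", space.endorse("PaulSmith", "creator", null)));
        // prints: PaulSmith creator CM111
    }
}
```

Keeping claimed ids separate from the stored statements is one simple way to let retract withdraw a statement from the information view while it remains readable through rdr.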
3. TECHNICAL REALIZATION
A concept for the technical realization of Semantic Web Spaces was drawn up subsequent to the specification of the conceptual model (Figure 2). Vertically it can be divided into three major components, which are concerned with i) the publication of Semantic Web information, ii) security and trust aspects of all activities and iii) retrieval with the help of tuple matching heuristics, respectively. Horizontally the concept is also divided into three layers: the first two layers correspond to the information and data views we mentioned in the previous section, while the third layer handles the persistent storage of the tuplespace information. Accordingly, from bottom to top, the tuplespace system manages raw data, syntactic virtual data (Linda tuples) and semantic virtual data (RDF tuples). In other words, as we view the concept from bottom to top the data representation becomes more abstract (high level).

Figure 1: High-level Architecture (components shown: Semantic Web Clients using HTTP, SOAP and SMTP; Client Interface; Ontological reasoner, matchmaker; Co-ordination model; RDF model of data, ontologies; Tuple space; Admin. Services, e.g. security & trust; Tuple space metadata; Repository Interface; Knowledge Data Stores: Sesame, Instance Store, in-memory, e.g. Protégé)

Figure 2: Concept for realization (columns: Publishing, Security & Trust, Retrieval; layers: INFORMATION with RDF(S), OWL and Rules, DATA with RDF, Linda and LighTS, STORAGE; labelled elements include Spaces, Views, Contexts, Subspaces, Metamodel, RDFS I/O, Triple I/O, Linda I/O, Storage I/O, Consistency, Validation, Authentication, Trust, Ontology matching, Triple matching, Linda matching, DBMS mapping, DBMS query and the databases)

Semantic Web Spaces have been partially implemented.⁴ We were aware that a number of Linda-based tuplespace implementations are available. Some are quite heavyweight, with built-in support for e.g. transactions, security and publish-subscribe, such as JavaSpaces or TSpaces. Others support the core Linda model (in, out and rd operations) and little more. We prefer a lightweight approach to allow for more flexibility in our selection of solutions for different aspects of the implementation. We also aim at a minimal system footprint and simplicity/flexibility of code in order to maximize efficiency and the possibility of later code optimization. We have taken the LighTS framework [14] and extended it for handling tuples which contain semantic information. More specifically, with reference to the concept for the technical realization, we have realized the Triple and RDFS I/O on top of the LighTS I/O, ontology and triple matching on top of Linda matching, and views (subspaces, contexts and meta-model). Security and trust, as well as backend persistent storage, are subject of future work. A Java class diagram of the implementation is shown in Figure 3.

⁴ The prototype is available for download at http://sourceforge.net/projects/semwebspaces
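The layering described above, with triple I/O and triple matching built on top of generic Linda I/O and matching, can be sketched as follows. This is an illustration under stated assumptions rather than the prototype's code: LayerSketch, Statement, the id scheme "t0", "t1", ... and the resource names are hypothetical, the LighTS API is not used, and outr returns the stored statement (instead of the boolean of Table 1) so that the allocated id is visible.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of triple I/O and matching layered on generic Linda tuples. */
class LayerSketch {
    /** Upper layer: an RDF statement together with the id allocated by the space. */
    static final class Statement {
        final String subject, predicate, object, id;

        Statement(String subject, String predicate, String object, String id) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.id = id;
        }

        @Override
        public String toString() {
            return "(" + subject + ", " + predicate + ", " + object + ", " + id + ")";
        }
    }

    // Lower layer: an untyped Linda tuple store.
    private final List<Object[]> tuples = new ArrayList<>();
    private int nextId = 0;

    /** Linda matching: same arity, and every non-null template field equals the tuple field. */
    static boolean lindaMatch(Object[] template, Object[] tuple) {
        if (template.length != tuple.length) {
            return false;
        }
        for (int i = 0; i < template.length; i++) {
            if (template[i] != null && !template[i].equals(tuple[i])) {
                return false;
            }
        }
        return true;
    }

    /** Triple I/O: encode a statement as a four-field tuple and store it in the Linda layer. */
    Statement outr(String s, String p, String o) {
        Statement st = new Statement(s, p, o, "t" + nextId++);
        tuples.add(new Object[] {st.subject, st.predicate, st.object, st.id});
        return st;
    }

    /** Triple matching: an (s, p, o) template becomes a Linda template with an open id field. */
    List<Statement> rdgr(String s, String p, String o) {
        return collect(new Object[] {s, p, o, null}, false);
    }

    /** As rdgr, but the matched tuples are removed from the store. */
    List<Statement> ingr(String s, String p, String o) {
        return collect(new Object[] {s, p, o, null}, true);
    }

    /** Read by tuple id: the id is the only bound field of the Linda template. */
    Statement rdrById(String id) {
        List<Statement> hits = collect(new Object[] {null, null, null, id}, false);
        return hits.isEmpty() ? null : hits.get(0);
    }

    private List<Statement> collect(Object[] template, boolean remove) {
        List<Statement> result = new ArrayList<>();
        for (Iterator<Object[]> it = tuples.iterator(); it.hasNext();) {
            Object[] t = it.next();
            if (lindaMatch(template, t)) {
                result.add(new Statement((String) t[0], (String) t[1], (String) t[2], (String) t[3]));
                if (remove) {
                    it.remove();
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LayerSketch space = new LayerSketch();
        space.outr("PaulSmith", "authorOf", "CM111");
        space.outr("PaulSmith", "authorOf", "CM111"); // same triple, separately addressable
        space.outr("PaulSmith", "memberOf", "AGNBI");
        System.out.println(space.rdgr("PaulSmith", "authorOf", null).size()); // 2
        System.out.println(space.rdrById("t1")); // (PaulSmith, authorOf, CM111, t1)
        System.out.println(space.ingr(null, "authorOf", null).size());        // 2
        System.out.println(space.rdgr(null, null, null).size());              // 1
    }
}
```

Because the RDF layer only translates statements and templates into tuples, the Linda layer underneath could in principle be exchanged without changing the triple operations.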
Figure 3: Implementation Class Diagram

Figure 4: Evaluation of outr (sequential outr operations; time for 1 outr against number of statements)
The core class is the RDFTupleSpace. It contains the client access methods, which parallel the coordination language operations shown in Table 1.

The in-memory models of the data view of the space, the information view of the space and the meta-model of the space are encapsulated in the classes DataView, InfoView and TSOntology respectively. These contain methods called from the RDFTupleSpace to add or delete tuples and hence to maintain state. To reflect their conceptual differences, the DataView is a Jena RDF model in which all triples are reified (and hence referencable and duplicable) and there is no inference. The InfoView is a Jena RDF model with inference, so that it reflects the data view extended by triples inferrable from its content according to the available RDF Schema(s). The TSOntology contains a Jena RDF model with the metamodel of the space. To save on accesses, the models are not updated after every operation on the space; instead, temporary models log changes and updates are made after a certain number of operations. This discrepancy is acceptable as the blocking operation of Linda automatically waits until the sought-for tuple is updated into the model.

RDFTupleSpaceQuery encapsulates the formulation of SPARQL queries which are applied to the appropriate Jena model. FunctionsForJena completes the implementation with additional convenience methods for working with Jena.

4. EVALUATION

The described implementation has been tested to acquire some initial efficiency measurements. These tests are useful for determining the expected performance of a Semantic Web Space as well as for identifying areas for optimisation in the conceptual model and the implementation.

The test system was a Microsoft Windows XP PC with an Intel(R) Pentium(R) 4 with a CPU speed of 2.66 GHz and 512 MB RAM. The Java version was JDK 5.0 and the VM used an initial heap size of 256 MB and a maximum heap size of 512 MB. The RDF used was taken from the MusicBrainz RDF dump.⁵

Firstly, we find that the data view operations (outr, rdr, inr) (Figures 4 and 5) function roughly linearly up to 200000 tuples in the space but exponentially for larger spaces. Output of tuples into the space takes place within this linear limit at approximately 1/3000 of a second. Retrieval is slower: an rdr on a tuplespace with 150000 tuples took on average 2.85 sec, an inr took twice as long. It can be seen that by a size of 200000 tuples retrieval is already taking exponential time and by 300000 tuples operations already last 80-100 seconds. This linear limit seems to be a product of the Jena model which is being stored in memory. If it could be increased by an alternative tuplespace storage approach then one could expect reasonable results for data view interaction, even with extra-large spaces.

Figure 5: Evaluation of rdr and inr (time for 1 rdR or inR operation against number of statements in tuple space)

Secondly, we find that the information view operations claim, retract and endorse (Figures 6 and 7) scale linearly. While the claim and retract operations, which change the space, are much slower than their data view equivalents, the endorse operation is much quicker than rdr. For example, an outr on a space of 100000 tuples took 1/3000 of a second compared to 0.25 seconds for a claim. A retract on a space of 10000 tuples took 2.31 seconds compared to 0.31 seconds for inr, while an endorse on the same space took 0.025 seconds compared to 0.15 seconds for rdr. We have not tested on very large spaces (200000 and more tuples) but we would expect to encounter the same linear limit as seen with the data view operations. Hence we discover that despite performance issues in building large semantic tuplespaces using Jena, once such spaces exist, non-destructive retrieval from them should be very efficient (less than 1 sec for up to 600000 tuples). Destructive retrieval however, as it necessitates rebuilding the Jena inference model, is slow (approximately 2 and a half minutes for the same 600000 tuple-sized space).

Figure 6: Evaluation of claim (sequential claims; claims/sec against number of statements)

Figure 7: Evaluation of endorse and retract (time for 1 endorse and retract operation against number of statements in tuplespace)

Finally, we conceptualised subspaces as a means to virtually partition a space into smaller spaces which could be interacted with more efficiently. Given the scalability issues we identified with using in-memory Jena models, this appears a useful approach. However, the tests (see Figure 8) showed that the introduction of subspaces slowed the operations (taking approximately 5 times longer) and as the space was further partitioned, there was no improvement. This was identified as a result of our approach to model the structure of the tuplespace in the tuplespace ontology, necessitating a semantic query on the ontology to identify the tuples belonging to a specific subspace and building a model of that subspace prior to being able to query over it. As a result, the more subspaces were added to the tuplespace, the more complex the tuplespace ontology became.

Figure 8: Evaluation of subspaces (time for 1 inR and rdR with subspaces against number of subspaces)

These tests indicate the challenges that still exist in implementing a Semantic Web Space, particularly in the light of scalability requirements. We are disadvantaged by the problems of storing and reasoning over large semantic models such as those supported by Jena, and future work would include both persistent storage of data and caching techniques to access commonly queried data sets more efficiently. In particular, the implementation approach to subspaces needs to be revised. We expect that refinements in these areas could greatly support the further scalability of Semantic Web Spaces, including its realization as a distributed system.

⁵ http://musicbrainz.org/MM/

5. RELATED WORK
This paper has described the design and implementation of Semantic Web Spaces, a middleware for coordinating knowledge processes on the Semantic Web. We consider this work to be the first comprehensive and formal specification of a Semantic Web-enabled coordination model. While the idea of combining Linda and Semantic Web information has been previously proposed [7, 6], subsequent proposals for a semantics-enabled coordination model [3, 4] have not addressed issues covered in this paper such as the particular representation of RDF syntax, different levels of matching, or tuplespace partitioning. It is also unclear to what extent they continue to respect the basic principles of Linda, while our approach is clearly “backwards compatible”.

The TSC project applies coordination principles to realize a communication middleware for Semantic Web Services [5]. However, the approach is built upon an existing coordination system, which led to many design decisions being simply carried over rather than re-assessed, as we have done, in a Semantic Web context. For example, the API is unnecessarily large and access is to Java objects in the space encapsulating RDF graphs, preventing the finer-grained access at the triple level that Semantic Web Spaces offers.

cSpaces foresees the usage of the publish-subscribe communication paradigm in order to solve the data and process integration issues on the Web [12]. While Semantic Web Spaces has approached semantic tuplespace computing from the perspective of attempting a lightweight implementation, cSpaces is deliberately heavyweight as its focus is much broader and more challenging. There is not yet an implementation of cSpaces, while our experiences have already shown which problems can be expected even in a much more lightweight implementation approach.

One prototype system, sTuples, has been developed, in which the JavaSpaces platform was extended to support OWL data in tuple fields [11]. However, this approach has not further considered the implications of coordinating Semantic Web information, as we have done. Rather, OWL graphs are exchanged within tuples, and extracted and processed in other systems, while in Semantic Web Spaces we seek to integrate a Semantic Web framework within the system.
6. CONCLUSIONS AND FUTURE WORK
In this paper we presented the reference architecture and technical realization of Semantic Web Spaces, a middleware for coordinating knowledge processes on the Semantic Web based on Linda and tuplespace systems. An evaluation of our prototype shows that challenges remain in implementing the system in a highly scalable manner. This is a general problem in Semantic Web storage and reasoning, and we also plan to examine how approaches to scalability in Linda systems may be applicable. We have demonstrated that an approach based on Linda and tuplespaces can be realized to co-ordinate between agents communicating on the Semantic Web (i.e. sharing semantic data). Based on our experiences to date, we plan to further refine our approach and use the prototype to model “real world” semantic communication scenarios.
7. REFERENCES
[1] T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, 284(5):34–43, May 2001.
[2] D. Brickley and R. V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. Available at http://www.w3.org/TR/rdf-schema/, 2004.
[3] C. Bussler. A minimal triple space computing architecture. In Proc. of the WIW 2005 Workshop on WSMO Implementations, 2005.
[4] C. Bussler, E. Kilgarriff, R. Krummenacher, F. Martin-Recuerda, I. Toma, and B. Sapkota. D21. v0.1 WSMX Triple-Space Computing, June 2005.
[5] C. Bussler, E. Kilgarriff, R. Krummenacher, F. Martin-Recuerda, I. Toma, and B. Sapkota. D21. v0.1 WSMX Triple-Space Computing, June 2005.
[6] D. Fensel. Triple-based Computing - WSMO Working Draft. http://www.wsmo.org/2004/tp-computing/, June 2004.
[7] D. Fensel. Triple-Space Computing: Semantic Web Services Based on Persistent Publication of Information. In INTELLCOMM, pages 43–53, 2004.
[8] D. Gelernter and N. Carriero. Coordination languages and their significance. Commun. ACM, 35(2):97–107, 1992.
[9] P. Hayes and B. McBride. RDF Semantics. Available at http://www.w3.org/TR/rdf-mt/, 2004.
[10] B. Johanson and A. Fox. Extending tuplespaces for coordination in interactive workspaces. Journal of Systems and Software, 69(3):243–266, 2004.
[11] D. Khushraj, O. Lassila, and T. W. Finin. sTuples: Semantic Tuple Spaces. In MobiQuitous, pages 268–277, 2004.
[12] F. Martin-Recuerda. Towards CSpaces: a new perspective for the Semantic Web. In Proceedings of the 1st International Working Conference on Applications of the Semantic Web IASW2005, 2005.
[13] P. F. Patel-Schneider, P. Hayes, and I. Horrocks. OWL Web Ontology Language Semantics and Abstract Syntax. Available at http://www.w3.org/TR/owl-absyn/, 2004.
[14] G. P. Picco, D. Balzarotti, and P. Costa. LighTS: A Lightweight, Customizable Tuple Space Supporting Context-Aware Applications. In Proceedings of the 20th ACM Symposium on Applied Computing (SAC05), Santa Fe (New Mexico, USA), Mar. 2005. ACM Press.
[15] D. Rossi, G. Cabri, and E. Denti. Tuple-based technologies for coordination. In A. Omicini, F. Zambonelli, M. Klusch, and R. Tolksdorf, editors, Coordination of Internet Agents: Models, Technologies, and Applications, chapter 4, pages 83–109. Springer Verlag, 2001. ISBN 3540416137.
[16] R. Tolksdorf, L. Nixon, and E. Paslaru Bontas. A Conceptual Model for Semantic Web Spaces. Technical Report TR-B-05-14, Free University of Berlin, September 2005.
[17] R. Tolksdorf, L. Nixon, E. Paslaru Bontas, D. M. Nguyen, and F. Liebsch. Enabling real world Semantic Web applications through a coordination middleware. In Proceedings of the ESWC05, 2005.
[18] R. Tolksdorf, E. Paslaru-Bontas, and L. Nixon. A co-ordination model for the Semantic Web. In Proceedings of the 21st ACM Symposium on Applied Computing, Track “Coordination Models, Languages and Applications”, 2006.