IADIS International Conference WWW/Internet 2011

A SOFTWARE ARCHITECTURE TO PROVIDE PERSISTENCE AND RETRIEVE OF CONTEXT DATA BASED ON ONTOLOGICAL MODELS

Vinícius Maran (1), Deise de Brum Saccol (1), Iara Augustin (1) and Alencar Machado (2)

(1) Technology Center – Federal University of Santa Maria, Avenida Roraima, 1000 – Santa Maria – RS – Brazil
(2) Informatics Institute – Federal University of Rio Grande do Sul, Caixa Postal 15064 – Porto Alegre – RS – Brazil

ABSTRACT

The main objective of ontologies is to represent domains, providing semantic data and the capacity to manage knowledge in computer systems. Although there are standards for representing ontologies, such as the OWL language, the full use of ontologies in computational systems is still restricted by limitations of the persistence tools currently available. These include loss of performance when converting ontologies to relational databases, limited horizontal scalability (which restricts the use of the same ontology across nodes in a geographically distributed network), and limited querying and reasoning capabilities in the databases these tools use. Ubiquitous systems require all of these features. This article describes a software architecture that, based on the CouchDB database, aims to provide a distributed solution for storing semantic data based on ontological models and querying it with standard languages such as SWRL and SQWRL, thereby allowing the use of semantic data in a distributed network composed of fixed or mobile nodes.

KEYWORDS

Ontology, database, context-aware computing, semantic data, ubiquitous computing, knowledge management.

1. INTRODUCTION

The third wave of the computing evolution, known as ubiquitous computing (Weiser, M., 1991), proposes a space where users and devices of different types are fully integrated and where computing is transparent to the user: devices and systems assist users in daily tasks even when those users do not notice the computation involved. Thus, the main feature of ubiquitous systems is that they perform tasks focused on end users and their daily activities, according to the needs of these users and the context in which they are inserted (Kukhun, DAA, Sedes, F., 2007).

The context data captured by ubiquitous systems can be represented computationally in many ways. One of the most widely used is the ontology, which represents the set of concepts and terms that make up the knowledge of a domain, bringing semantic data into computational systems (Swartout, W. and Tate, A., 1999).

For ubiquitous systems to adapt executing code and content based on context information captured at different moments and in different situations (context-aware adaptation), there must be a persistence architecture that allows systems to access and query context data anywhere and at any time. This requires a tool that allows ontologies to be used and modified through a simple process in a distributed system (Machado, A. et al, 2010).

This paper presents a software architecture that provides these features: a new proposal for storing ontologies that gives access to semantic data in a distributed database, maintains full compatibility with the OWL language, and covers the characteristics desirable when using context data in ubiquitous systems.

The remainder of the article introduces the problem of context-based content access (section 2), then presents a brief review of related work (section 3), the proposed architecture for storage of


ISBN: 978-989-8533-01-2 © 2011 IADIS

ontologies (section 4), and a use case that illustrates the proposed architecture (section 5). Section 6 presents the partial conclusions of this work.

2. CONTEXT-BASED ACCESS IN UBIQUITOUS ENVIRONMENTS

The term context has many definitions, because it is a broad term that spans many areas. In our project, we use the definition by Dey & Abowd (2000), which defines context as any information that can be used to characterize the situation of a group of entities. Some context information can be easily identified with sensors, such as temperature, the size of a device display or the current time, but other context information remains complex to recognize and model, such as the emotional state of a person performing a particular task (Dey, AK, Abowd, G.D., 2000).

Context data can be represented in various ways. Popien & Strang (2005) compared the forms of context representation against the requirements that each model meets. The comparison is based on six main requirements: distributed composition (dc), partial validation (pv), richness and quality of information (qua), incompleteness and ambiguity (inc), level of formality (for) and applicability to existing environments (app). Table 1 presents the results of this comparison; an asterisk marks a requirement that the approach meets.

Table 1. Comparison of Context Representation forms (Strang, T., Popien, C., 2005)

Requirement   Key-Value   Markup Scheme   Object-Oriented   Logic Based   Graphical   Ontology Based
dc            -           *               *                 *             -           *
pv            -           *               *                 -             -           *
qua           -           -               *                 -             *           *
inc           -           -               *                 -             -           *
for           -           *               *                 *             *           *
app           *           *               *                 -             *           *

Thus, modeling based on ontologies meets the key requirements for a complete representation of context information. Given this premise, we need a system that supports the persistence and querying of context data for ubiquitous environments. The next section analyzes the available tools for ontology persistence, highlighting some key requirements for their use with pervasive systems.

3. ANALYSIS OF RELATED WORK

Batzios & Mitko (2009) compared ontology-compatible databases: 3store (NoSQL, 2011), Sesame (NoSQL, 2011), Jena2 (NoSQL, 2011) and db4OWL (Batzios, A., Mitko, PA, 2009). Analyzing the results, they observed that these tools mostly use either the relational model, chosen mainly because it builds on established standards and offers good performance, or triple-store databases, used for RDF (W3c, 2011) document storage.

The 3store project stores RDF documents using the relational database model as its persistence system. For querying, 3store uses the RDQL language, which is compatible with RDF data. Inferences and queries are performed by direct conversion of RDQL to SQL (the language used in relational databases), which creates a large amount of extra data and compromises the performance of inferences over large databases (Batzios, A., Mitko, PA, 2009) (W3c, 2011).

The Jena2 and Sesame APIs use systems such as relational databases to persist RDF and OWL-Lite files (NoSQL, 2011), but they are not compatible with the OWL-DL standard, for example.

To minimize the amount of extra data created by converting ontologies to relational databases, the db4OWL project proposes the use of an object-oriented database (db4o) to store a set of classes that represent the entities of an ontology. It defines its own XML-based language for querying data and converts it to the query language used by db4o. Because db4o runs embedded in a Java application, db4OWL does not target the distributed use of ontologies.


From the results of this comparison, we found that the tools currently available do not meet some important requirements of ubiquitous systems, which need context data anywhere and at any time. These requirements are listed below:

• Provide vertical and horizontal scalability: The persistence tools that support semantic data (RDF or OWL documents) provide vertical scalability (through the use of relational or object-oriented databases). For full use in ubiquitous systems, however, these tools must be able to use a distributed structure to store and retrieve data, so that data persisted in the architecture can be reached from many places by different types of devices (horizontal scalability).

• Support standard query and inference languages for OWL data: The tools reviewed support only RDF-compatible languages or define their own languages to query or infer over OWL data. Standardized and widely used languages for handling semantic data in OWL format could be used instead. We emphasize SWRL and SQWRL, since they are established in the Semantic Web community (SWRL is a W3C Member Submission) and allow inference and querying over OWL data directly. Translating such queries to SQL is also a poor fit, because the relational database model has little similarity with the structure of ontologies.

• Store only relevant data: The continuous conversion of formats and query languages causes constant overhead in current tools, resulting in a considerable loss of performance depending on the size of the database and the number of accesses made to it. Moreover, the space occupied by the data increases significantly with constant use of these databases (Batzios, A., Mitko, PA, 2009).

• Use only a part of the ontology in memory: Inferences and queries should be processed only over relevant data, mainly because performance degrades and memory consumption increases considerably as the number of individuals in the ontology grows. Therefore, it is important that changes can be processed over a limited portion of the data stored in the database (Batzios, A., Mitko, PA, 2009).

A comparative summary is presented in Table 2.

Table 2. Comparison of Tools for Ontologies Persistence

Tool           Query Language   OWL Support   Easily Scalable   Database Model    Low Data Overhead
3store         RDQL             No            No                Relational        No
Sesame         SeRQL            No            Yes               Relational        No
Jena2          SPARQL           Yes           No                Relational        No
db4OWL         XML              Yes           No                Object Oriented   Yes
SemantiCouch   SWRL, SQWRL      Yes           Yes               JSON documents    Yes

Based on the comparison of these database solutions, we propose an architecture for the persistence and access of ontologies in a distributed structure aimed at ubiquitous systems, as described below.

4. THE SEMANTICOUCH ARCHITECTURE

The project's main objective is to offer a tool that, together with the CouchDB database (Anderson, J. C., Lehnardt, J., Slater, N., 2010), makes it possible to (i) store ontologies in an easily scalable and easily accessible database, and (ii) use languages that are already established for querying and inferencing over OWL data, such as SWRL and SQWRL (W3c, 2011). CouchDB was chosen because it meets the following criteria:

• Easily scalable: The CouchDB database provides mechanisms that facilitate the distribution of data between different nodes in a ubiquitous network, offering horizontal scalability. Among these mechanisms, we highlight the easy replication of documents between nodes, which provides access to the ontology structure even while individuals are being modified on other nodes, and the version control of documents, which enables concurrency control in the database and support for desktop and mobile environments (Anderson, J. C., Lehnardt, J., Slater, N., 2010).

• Database model based on documents: The database uses a model based on JSON documents (JSON, 2011), which can be easily converted to an XML structure and are supported by many frameworks that ease their use from web services and many programming languages (Anderson, J. C., Lehnardt, J., Slater, N., 2010).


• Compatibility with various programming languages: The CouchDB database can be used regardless of programming language, because it exposes a RESTful API as its query interface (Anderson, J. C., Lehnardt, J., Slater, N., 2010). Data access can therefore be performed from many different environments, whether fixed or mobile.

The SemantiCouch architecture works as a layer above the control layer of the CouchDB database. It (i) performs OWL file conversion (import or export of OWL data), (ii) provides a set of classes that create a simplified representation of ontologies in Java, for simple use of semantic data in computational systems, and (iii) supports querying and inferencing over ontologies persisted in CouchDB using the SWRL and SQWRL languages. A simplified diagram of the architecture is shown in Figure 1.

Figure 1. Basic Architecture of SemantiCouch

To provide compatibility with OWL ontologies, SemantiCouch uses a parser defined in the Protégé OWL API (Stanford, 2011) to convert OWL data into Java classes. This parser works together with a parser defined in SemantiCouch, which converts the resulting objects to JSON documents that are persisted in CouchDB. Data is stored in one of two ways: (a) directly through the RESTful API exposed by the CouchDB database (Anderson, J. C., Lehnardt, J., Slater, N., 2010), or (b) through the Ektorp API (Ektorp, 2011), which offers methods for converting and querying Java objects in CouchDB.

The SemantiCouch architecture can be used by developers and systems through an API, which is structured in packages that perform specific functions. The next subsection details the function of each package in the API.
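As an illustration of storage path (a), the following minimal sketch builds the HTTP request that would store one JSON document through CouchDB's REST interface, where a document is written with `PUT /{database}/{docid}`. It is not SemantiCouch code: the class name, the `hospital` database name, and the `class:Patient` document id are assumptions made for this example, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of SemantiCouch or Ektorp.
final class CouchPutSketch {

    /** Builds (but does not send) a PUT request that stores one JSON document. */
    static HttpRequest buildPut(String baseUrl, String database, String docId, String json) {
        // Percent-encode the id so characters such as ':' are safe in the path.
        // URLEncoder uses form encoding (a space becomes '+'), so this sketch
        // assumes ids that contain no spaces.
        String encodedId = URLEncoder.encode(docId, StandardCharsets.UTF_8);
        URI uri = URI.create(baseUrl + "/" + database + "/" + encodedId);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

The request could then be sent with `java.net.http.HttpClient` against a running CouchDB instance, which listens on port 5984 by default. Path (b), through Ektorp, hides this HTTP exchange behind Java objects.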

4.1 SemantiCouch Organization

The structure of SemantiCouch is separated into packages, which are responsible for different functions in the process of storing and querying semantic data. The packages and their functions are described below.

• Comm Package: Responsible for communication between the engine package and the APIs (Ektorp API and RESTful API) that communicate with the CouchDB database. It also controls the communication with the database and the versioning of JSON documents.

• OO Package: Responsible for defining the classes that represent the components of ontologies. Developers can use these classes to connect the Java classes of their own systems directly to the persistence and querying methods of SemantiCouch. As the main interface for operating on ontologies, SemantiCouch


offers the OOWLModel class, which defines an ontology model persisted in the CouchDB database. A class diagram of the OO package is shown in Figure 2.

Figure 2. OO Package class diagram

• Engine Package: Responsible for the connection and control of the API. It provides the interface needed to integrate the API with other computer systems and applications.

• Parser Package: Responsible for format conversion: exporting to or importing from the OWL file structure, and internal conversion to the formats used in queries, inferences and other database operations.

• Query Package: Responsible for handling queries and inferences in the database. It provides support for the SWRL and SQWRL languages, converting them as needed to the languages used by the CouchDB database.

In addition to this programming structure, SemantiCouch maintains a storage structure defined in the database. This structure reduces the need to create extra data beyond what the ontology already provides, which improves query performance and makes it easy for CouchDB to replicate documents. The data structure follows the principles of performance and easy scalability defined by Anderson et al (2010). The storage structure used by SemantiCouch is shown in Figure 3.

Figure 3. Storage structure used by SemantiCouch compared to OWL original file.

The original OWL file of the ontology is divided so that each entity of the ontology is stored in its own JSON file. This facilitates the replication process performed automatically by CouchDB and also facilitates the query process, since each file has a unique id related to the type of entity that the document represents (class, individual, property, etc.). This division allows the system to run queries that load only the necessary data from the database instance. For example, if an external query needs only the class structure of the ontology, the system searches only the corresponding files, without consulting the files that represent the individuals of the ontology.

To demonstrate the features developed to date, we describe a possible use case of the architecture and the results of its tests.
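The division into one document per entity can be sketched as follows. This is an illustration only, not the SemantiCouch storage format shown in Figure 3: the `kind:name` id scheme, the `type` and `parent` field names, and all class names are assumptions. The only CouchDB-specific detail used is `_id`, the field that holds a document's identifier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of "one JSON document per ontology entity".
final class EntityDocumentSketch {

    enum Kind { CLASS, PROPERTY, INDIVIDUAL }

    static final class Entity {
        final Kind kind;
        final String name;
        final String parent; // superclass or asserted type; may be null

        Entity(Kind kind, String name, String parent) {
            this.kind = kind;
            this.name = name;
            this.parent = parent;
        }
    }

    /** Document id prefixed with the entity kind, e.g. "class:Patient". */
    static String docId(Entity e) {
        return e.kind.name().toLowerCase(Locale.ROOT) + ":" + e.name;
    }

    /** Serializes one entity; assumes names need no JSON escaping. */
    static String toJson(Entity e) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"_id\":\"").append(docId(e)).append("\"");
        sb.append(",\"type\":\"").append(e.kind.name().toLowerCase(Locale.ROOT)).append("\"");
        if (e.parent != null) {
            sb.append(",\"parent\":\"").append(e.parent).append("\"");
        }
        return sb.append("}").toString();
    }

    /** Returns only the documents of one kind, leaving all others untouched. */
    static List<String> documentsOfKind(List<Entity> ontology, Kind kind) {
        List<String> docs = new ArrayList<>();
        for (Entity e : ontology) {
            if (e.kind == kind) {
                docs.add(toJson(e));
            }
        }
        return docs;
    }
}
```

With such a layout, a request for the class structure only has to touch documents whose kind is `CLASS`, which mirrors the selective loading described above.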


5. USE CASE

The use case in this paper starts with the insertion of an ontology in the database and the retrieval of an individual described in this ontology, in order to evaluate the persistence architecture. We consider the following situation: a ubiquitous healthcare system is used in a hospital environment. This system has an ontological model of (i) patients and their clinical exams and (ii) the physicians who work in this hospital and their devices. In a given task, a physician wants to consult a clinical examination of a particular patient while caring for this patient. Although the ontology describes the patient and relates the patient to all of their medical examinations, the physician needs only to (i) see the examinations related to their medical specialty, and (ii) view the information in a form adapted to the screen of the device in use at that moment.

5.1 Modeling and Inserting an Ontology in SemantiCouch

We modeled this use case as a simple ontology (Figure 4), which represents the classes and properties that may influence the contextualized selection of content in a hospital environment.

Figure 4. Ontological model of the use case (main superclasses).

After the initial modeling of the ontology, we inserted it in SemantiCouch by indicating the corresponding OWL file. The insertion of this file in the system involves the following operations: (i) conversion of the file to a set of OWL API objects, (ii) a scan of the OWL API objects that represent the ontology and their conversion into the set of classes defined in the OO package (these classes convert directly to JSON documents, which facilitates their inclusion in the database), and (iii) the inclusion of these classes in the database, using the Ektorp API. The code for the inclusion of the ontology in the system is shown in Figure 5.

Figure 5. Code for insertion of an OWL file in the system.

If an entity is queried from a mobile device, such as an Android application that uses the CouchBase Mobile framework [10], and the device then goes offline, the individual may still be modified by other nodes; the local copy is updated by CouchDB's automatic update mechanism when the device comes online again.

We measured data related to the insertion and conversion of ontologies. To do this, we used the ontology created earlier in this section and varied the number of classes and individuals. Case A refers to the ontology that has 63 classes and 15 individuals, Case B refers to the ontology that has 107 classes


and 115 individuals, and Case C refers to the ontology that has 132 classes and 215 individuals. The results are presented in Figure 6.

Figure 6. Results of the initial insertion test of DocumentContext ontology.

As the results show, the SemantiCouch architecture adds little data relative to the original OWL file. In addition, the conversion time is relatively low, considering that in the conversion tests the measured time covers the following steps: (i) connection to the database, (ii) creation of a database, (iii) conversion of the OWL file to JSON documents and (iv) insertion of the documents in the database. The conversion step is performed only when an ontology originally defined in an OWL file needs to be used. After this first step, the system is able to query the ontology.

The architecture of the CouchDB database allows the data of the ontology to be replicated to other nodes automatically (even on mobile devices). If one of these nodes fails, the ontology can still be accessed and modified on other nodes, with the replication done automatically by the database. All the entities declared in the ontology were replicated from the moment we configured CouchDB to replicate the databases automatically.
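One way such automatic replication can be requested is through CouchDB's `_replicate` endpoint, which accepts a JSON body naming a source and a target database; setting `continuous` to `true` keeps the replication running as documents change. The sketch below only builds that request. The paper does not state how replication was configured in the tests, so the class name, host names and database name are assumptions, and authentication is omitted.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that prepares a CouchDB replication request.
final class ReplicationSketch {

    /** JSON body for POST /_replicate; assumes the values need no JSON escaping. */
    static String replicationBody(String source, String target, boolean continuous) {
        return "{\"source\":\"" + source + "\""
                + ",\"target\":\"" + target + "\""
                + ",\"continuous\":" + continuous + "}";
    }

    /** Builds (but does not send) a continuous replication request. */
    static HttpRequest buildReplicate(String baseUrl, String source, String target) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_replicate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(replicationBody(source, target, true)))
                .build();
    }
}
```

For example, `buildReplicate("http://node-a:5984", "hospital", "http://node-b:5984/hospital")` would describe replicating a local `hospital` database from a node called `node-a` to a second node, both names being placeholders.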

5.2 Performing Queries

In this prototype version, the query process is performed by the Query Objects system provided by the Ektorp API, where a query is made by specific keys of persisted objects. Thus, the system can query the individual related to the blood test of a particular patient using the code presented in Figure 7.

Figure 7. Code relating a query made in the system.

The query above tests whether the patient named “ClinicalUser1” has clinical examinations (defined in the ontology by the “has” relationship between patients and exams). If the patient has registered exams, the system requests the individual that represents the clinical examination of this patient.

Based on the performance of the first tests with the prototype, we can draw partial conclusions and identify possible new features for the project.
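The logic of this query can be sketched with plain Java collections standing in for the database. This is not the Ektorp-based code of Figure 7: the class, its methods, and the exam id `BloodTest1` are hypothetical, and the two maps only imitate lookups by key over persisted documents.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for key-based lookups in the database.
final class ExamQuerySketch {

    // patient id -> ids of exams linked by the "has" relationship
    private final Map<String, List<String>> hasRelation = new HashMap<>();
    // exam id -> JSON document of the exam individual
    private final Map<String, String> documents = new HashMap<>();

    /** Registers an exam individual and links it to a patient. */
    void addExam(String patientId, String examId, String examDocument) {
        hasRelation.computeIfAbsent(patientId, k -> new ArrayList<>()).add(examId);
        documents.put(examId, examDocument);
    }

    /** First tests whether the patient has exams, then fetches the first one. */
    Optional<String> firstExamOf(String patientId) {
        List<String> exams = hasRelation.getOrDefault(patientId, Collections.emptyList());
        if (exams.isEmpty()) {
            return Optional.empty(); // no "has" relationship registered
        }
        return Optional.ofNullable(documents.get(exams.get(0)));
    }
}
```

A caller would register an exam with `addExam("ClinicalUser1", "BloodTest1", ...)` and then call `firstExamOf("ClinicalUser1")`; a patient without registered exams yields an empty result instead of a second lookup.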

6. CONCLUSIONS AND FUTURE WORK

The use of ontologies in computer systems depends directly on persistence tools, because they allow a part of the ontology to be used without the need to store the whole ontology in a single file. The current tools lack some characteristics that are basic for ubiquitous systems, so this work is innovative in proposing an architecture, currently in the prototyping phase, to meet these requirements. This architecture will allow ontologies to be used for information in a distributed and simplified form.


The next milestone will be to implement a web service which, combined with the proposed architecture, can provide easy integration with other systems and programming languages.

REFERENCES

Anderson, J. C., Lehnardt, J., Slater, N. (2010) "CouchDB: The Definitive Guide." Book. Ed. O'Reilly Media. ISBN: 978-0-596-15589-6. 272 pag. 2010.
Batzios, A., Mitko, PA (2009) "db4OWL: An Alternative Approach to Organizing and Storing Semantic Data," IEEE Internet Computing, vol. 13, no. 6, pp. 48-55, Nov./Dec. 2009.
Dey, AK, Abowd, GD (2000) "Towards a Better Understanding of Context and Context-Awareness." CHI 2000 Workshop on the What, Who, Where, When, and How of Context-Awareness.
Ektorp (2011) "EKTORP Website". Website. Available at: http://www.ektorp.org/. Accessed February 2011.
JSON (2011) "Introducing JSON". Website. Available at: http://www.json.org/. Accessed March 2011.
Kukhun, DAA, Sedes, F. (2007) "Step Toward Pervasive Software: Does Software Engineering Need Reengineering?" In the book: Complex Systems Concurrent Engineering, Springer Ed. ISBN 978-1-84628-975-0, 2007, pp. 143-150.
Machado, A., Vicentini, C., Librelotto, G., Augustin, I. (2010) "Associando Contexto as tarefas clínicas na arquitetura ClinicSpace". In: Latin American Conference XXXVI Computer - CLEI, Asuncion, Paraguay. 2010. (in Portuguese)
NoSQL (2011) "NoSQL – Non Relational Universe". Website. Available at: http://nosql-database.org. Accessed May 2011.
Stanford (2011) "Protégé OWL-API Website". Website. Available at: http://protege.stanford.edu/plugins/owl/api/. Accessed April 2011.
Strang, T., Popien, C. (2005) "The Context modeling survey". In Proc. of the Workshop on Advanced Context Modelling, Reasoning and Management, Part of the Ubicomp, pp. 33-40.
Swartout, W. and Tate, A. (1999) "Ontologies". In IEEE Intelligent Systems and Their Applications, vol. 14, n. 1. IEEE.
W3C Consortium (2011) "SemanticWeb Standards by W3c". Website. Available at: http://www.w3c.org/standards/semanticweb. Accessed March 2011.
Weiser, M. (1991) "The Computer of the 21st Century", Scientific American, v. 265, n. 9, 1991.