An Open Architecture for Ontology-Enabled Content Management Systems: A Case Study in Managing Learning Objects

Duc Minh Le¹ and Lydia Lau²

¹ Department of Computing, Imperial College London, 180 Queen's Gate, London SW7 2AZ, U.K. [email protected]
² School of Computing, University of Leeds, Leeds LS2 9JT, U.K. [email protected]

Abstract. An important goal of a content management system (CMS) is to acquire and organise content from different data sources in order to answer intelligently any ad-hoc requests from users as well as from peer systems. Existing commercial CMSs address this issue by deploying structured metadata (e.g. XML) to categorise content and produce search indices. Unfortunately, these metadata are not expressive enough to represent content for sophisticated searching. This paper presents an open architecture framework and a Java-based reference implementation for an Ontology-enabled Content Management System (OeCMS). The reference implementation uses an open-source CMS called OpenCMS, the Protégé OWL library, and the RacerPro reasoning engine. The implemented system is a web-based management system for learning objects which were derived from the course and instructional materials used in several postgraduate taught courses. We believe that our OeCMS architecture and implementation would provide a strong platform for developing semantic web portals in general.

1 Introduction

A content management system (CMS) provides an integrated environment for an organisation to develop and manage presentable content in a wide variety of formats. A primary goal of a CMS is to automatically acquire and organise the content from different sources in order to answer intelligently any ad-hoc requests from users as well as from other peer systems. To achieve this, the following challenges need to be addressed [13] [7]:

– To provide an extensible technology framework for connecting to and manipulating heterogeneous data sources, such as relational databases, web content, documents and legacy content
– To analyse, categorise and integrate (i.e. merge or map) content in its corresponding domain context to reveal semantic structures
– To formalise these semantic structures using some form of descriptive rule set and serialise them in hierarchical concept repositories
– To provide intelligent access interfaces appropriate for people and/or computer software (agents) to search and make use of the underlying ontologies.

In this paper, we propose an open architecture for an ontology-enabled CMS (OeCMS) which is an attempt to address the challenges outlined above. Since ontology helps unambiguously define the meaning of things in terms of formal concepts with clearly defined relationships [10], it would serve well as a common integrator for the different types of content in a CMS. This framework could also be used for developing semantic web portals in general, as their underlying concept is very similar to the combination of semantic web technologies and CMS. We show in this architecture how the required semantic web components could be developed and/or integrated into the content management infrastructure. The reference implementation of this architecture uses mostly Java open-source technologies at the core layer. This implementation was based on a case study for managing learning objects in a number of selected taught postgraduate courses at the University of Leeds.

The rest of the paper is structured as follows. Section 2 reviews the related work in the areas of content management systems and semantic web portals. Section 3 characterises an OeCMS in terms of a set of requirements and design criteria. This is followed by Section 4, which discusses the general OeCMS architecture. Section 5 presents a reference implementation in Java with a set of essential functions. Finally, Section 6 concludes the paper.

R. Meersman, Z. Tari et al. (Eds.): OTM 2006, LNCS 4275, pp. 772–790, 2006. © Springer-Verlag Berlin Heidelberg 2006

2 Related Work

This section highlights current developments in three main areas which are relevant to our research.

Content Management System. A content management system [5] is designed to support a content management cycle which includes the creation and collection of content, the publication of content for access by users and/or other systems, and the management of this content. An important glue for these different CMS components is content metadata, which are extracted (semi-)automatically from the content. However, a major limitation of existing CMSs is inadequate support for expressing this metadata to allow for accurate retrieval of content. Recent research has shown that ontology would provide a viable solution to this problem because it can help define the formal specification of a problem domain [10]. The most recent work which bears some similarity to ours is Infoflex [8]. This system also uses semantic web technologies in content management systems, but mainly for integrating existing CMSs while answering content search requests. This differs from our approach in that we propose that each CMS should adopt an open architecture to tackle ontology-enabled content management.


Semantic Web Portals. Another related area is the development of a Semantic Web Portal (SWP). In essence, a semantic web portal [19] is a content management system to the extent that it also needs to acquire, organise and distribute information to interested users. A SWP, however, differs in that it deploys semantic web technologies such as ontology and related tools in order to improve the representation of content metadata and, therefore, information retrieval. Two recent works in this area have proposed a system infrastructure [24] and an architecture [19] for SWPs. In [19], a SWP is viewed as an enrichment of a three-layer web-based information portal by adding Semantic Web technologies as a sub-layer of its "Grounding Technologies" layer. However, it is not clear from this proposal how this addition would fit in with the existing portal technologies. Also, this paper did not discuss whether any changes to its information processing (i.e. middle) layer would be needed in order to utilise the semantic web technologies. In [24], a detailed technical infrastructure for SWPs, called Semantic Web application server, was discussed. The main advantage of this architecture lay in its modularity, which considered all software modules as self-contained components with a standard system interface. This KAON SERVER [25] application server was implemented in Java based on the JBoss [9] open-source application server. However, this system only focussed on functionalities at the infrastructure layer and did not address the content management cycle. Although the proposed system in [24] represents an alternative application server component to that of our architecture, its all-in-one design makes it less flexible compared to our approach (see Section 5.1).

Web-Based Learning Systems. These represent the class of e-learning systems that cover the scope of our case study. These systems may tackle a range of issues including the delivery of instructional materials and the management of learning objects about students and assessment. Research in this area has begun exploring the use of ontology for describing these learning objects. It was argued that ontology can help improve the representation and organisation of teaching resources and student profiles [20] [23] [26]. However, this research lacked openness in its underlying technical platforms.

In brief, no research work reported so far in the literature has explored the design and development of semantic web portals from the perspective of a content management system. We argue that it is advantageous to follow this approach given the similarities between these portals and CMSs, and that it is possible to develop the semantic web extensions for the content management cycle in a CMS.

3 Requirements and Design Criteria of an Ontology-Enabled CMS

As suggested earlier, the OeCMS aims to improve the search results by extending the capability of a CMS with semantic web technologies. Hence, it should meet the requirements of the traditional CMS and those of the Semantic Web technologies.

3.1 Requirements

The main requirements of an OeCMS are:

– Multiple representation [24] to support multiple ontology languages (e.g. RDF and OWL Lite/DL/Full [2]).
– Semantic content syndication to (semi-)automatically acquire content from different data sources, extract the key concepts and establish the relationships between these concepts.
– Ontology mapping [24], which is a prerequisite of semantic content syndication. This is a fast-evolving and challenging research area of ontology which aims to achieve some degree of automation in matching and merging ontologies from potentially different but overlapping domains.
– Integrated ontology engineering method, which consists of techniques and tools that allow the CMS development team to identify, design and build a knowledgebase for a problem domain and to seamlessly deploy and maintain this knowledgebase in the system.
– Integrated access to harness the expressiveness of semantic content for the purpose of producing system interfaces (e.g. for navigation or searching) that can greatly increase the accuracy and relevance of the information requested by both users and peer systems.
– Ease of use [24] to provide at least a comparable level of ease of use in the ontology interface for existing users of a CMS.
– Inferencing and verification [24], which are two primary functionalities of the ontology reasoning engine in the OeCMS. Through reasoning, the system is able to search the knowledgebase in finite time for instances matching a query. Also through reasoning, the system is able to detect any inconsistencies or anomalies that exist in the ontology.
– Access control and versioning [24] to manage multiple user access and to detect and monitor users' access to content documents and the knowledgebase. While access control to content documents is a built-in feature of the underlying CMS, it remains an open question how to do the same for the knowledgebase. Ontology versioning is a background process for supporting the evolution of an ontology knowledgebase. Different versions of the knowledgebase are maintained persistently to allow for easy roll-back to a previous version when needed.

3.2 Design Criteria

Since ontology and related tools are continuously evolving, it is important to have an adaptable OeCMS architecture which can scale well with changes. The following list summarises the main criteria ([15] [24]) of such a design:

– Modularity. The system is component-based with self-contained software modules to allow for maximum development and maintenance flexibility.
– Interoperability. The ability to communicate with other systems via open-standard protocols such as XML [2], Web Services Description Language (WSDL) [6], OWL-S [21], and the Description Logic Implementation Group (DIG) interface for ontology reasoning [4].
– Manageability. The ability to streamline the management of content, ontology and system functionalities.
– Scalability. The ability to cope with an ever-increasing number of requests and an increasing demand for resources (e.g. processing, storage) without major changes to the architecture.
– Integrability. The ability to upgrade the system with new functionalities without major changes to the overall architecture.

4 An Open Architecture for Ontology-Enabled CMS

The proposed OeCMS (see Figure 1) is based on a layered approach to clearly define the responsibilities and services of each distinct class of components. The clear advantages of this layering are modularity, interoperability and easy integration. The following sub-sections discuss each layer in greater detail.

4.1 The Web Interface Layer

This layer provides a uniform web-based interface for users to access and manage both content documents and their ontology-enabled semantics. The web technologies involved, e.g. HTML/XML and client-side scripting, should conform to W3C standards [33]. The major technical challenge at this layer is building a dynamic and user-friendly view of an ontology using the rather restrictive set of browser technologies (e.g. HTML and JavaScript). The difficulty lies in drawing an incremental picture of the ontology in order to hide from the user the complexity inherent in the many-to-many relationships between ontological concepts. A diagrammatic view of even a modest ontology is sometimes hard for a novice user to understand. In addition to the use of HTML/JavaScript, several emerging technologies such as TouchGraph [30] and Formal Concept Analysis [11] have been proposed as alternatives for the ontology view component of this layer. However, these tools require browser plug-ins (e.g. Java Applet) which are not always accessible to the users.

4.2 The Semantic Content Management Layer

This layer consists of two main components: content management and ontology management. The content management component provides functionalities to support the four main phases of a typical content management cycle [5]: (1) create content, (2) publish content, (3) access content (from users or other systems) and (4) manage content. Each phase in this cycle is supported by one or more primitive ontology management functions.

Fig. 1. The OeCMS Architecture

The primitive functions shown in Figure 1 are not meant to be exhaustive but they form the core of the ontology management:

– map: when a content document is defined or uploaded to the system, one or more ontology instances are created and mapped to elements of this document. A typical mapping technique is to store the URL of a content document with each ontology instance defined for that document, so that the system can later retrieve all associated content documents for a given set of ontology instances. It is worth noting that here we define a generic map function for all types of source documents including raw (typically binary) and (semi-/un)structured. Although the techniques for (semi-/un)structured types are well documented (e.g. [1] [17] [18] [22]), those for the raw type remain largely unexplored.
– retrieve: retrieves the ontology instances related to a content document so that the system can automatically annotate the document's view with additional information.
– annotate: this function produces a component view of the ontology instances related to a content document. This function depends on the retrieve function to access information about the properties of the ontology instances.
– search: this important function searches the ontology knowledgebase in the "Data Management" layer for instances matching a specific criterion. This function naturally depends on retrieve and annotate to generate the results.
– create/edit/delete/commit: these are the standard functions for maintaining the ontology knowledgebase. They can be applied to all types of ontological objects including concept, property and instance.

To increase modularity, these primitive functions rely on the manager components of the core layer for detailed algorithmic implementations. The next subsection discusses the structure of the core layer which enables this modularity.
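The mapping technique described above (storing document URLs against ontology instances) can be illustrated with a minimal in-memory sketch. The class and method names below are hypothetical and are not taken from the WOICMS code base; a real implementation would store the URL as a property value in the ontology knowledgebase rather than in a Java map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the instance-to-document mapping.
class InstanceDocumentMap {
    // ontology instance name -> URLs of the content documents mapped to it
    private final Map<String, Set<String>> urlsByInstance = new LinkedHashMap<>();

    // map: associate an ontology instance with the URL of a content document
    void map(String instance, String documentUrl) {
        urlsByInstance.computeIfAbsent(instance, k -> new LinkedHashSet<>()).add(documentUrl);
    }

    // retrieve: the ontology instances related to one content document
    List<String> retrieve(String documentUrl) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : urlsByInstance.entrySet()) {
            if (e.getValue().contains(documentUrl)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // All content documents associated with a given set of ontology instances
    Set<String> documentsFor(Iterable<String> instances) {
        Set<String> docs = new LinkedHashSet<>();
        for (String instance : instances) {
            docs.addAll(urlsByInstance.getOrDefault(instance, Collections.<String>emptySet()));
        }
        return docs;
    }
}
```

Under these assumptions, a search function would first obtain matching instances from the knowledgebase and then call documentsFor to reach the content documents behind them.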

4.3 The Core Layer

This core layer has three basic sub-layers:

– operating system interface: this sub-layer implements the basic system functionalities (e.g. creating network sockets, accessing local/networked files, etc.) and makes them available as services to upper layers.
– application server: this sub-layer represents the environment in which an OeCMS can easily be deployed and managed. This sub-layer leverages existing application servers, many of which are open-source (e.g. Tomcat [28] and JBoss [9]). In practice, the operating system interface may be implemented as part of this sub-layer.
– content management engine: supports both the content and ontology management functionalities of the semantic content management layer. This engine is typically implemented as a bare-bone CMS which provides a CMS API giving developers access to content management functionalities. More importantly, it must support the deployment of an ontology API, an ontology reasoner and a set of custom-built ontology management functions:

ontology API: implements the state-of-the-art ontology language specifications (e.g. RDF/OWL) and provides developers with the functionalities needed to design, develop and test ontologies. This API also provides a standard programming interface for interacting with the ontology reasoner.

ontology reasoner: implements the logic behind ontology languages and provides the facility needed for answering such complex queries as 'Find all lecturers who teach the subject Information System Engineering in 2005'. Note from Figure 1 that the separation of the ontology reasoner from the other core sub-layers is only logical, to reflect the fact that there are two possible methods of integrating a reasoner engine into the core layer: (1) as an independent application accessible via a network programming interface (e.g. RacerPro [12]) and (2) as an embedded software component accessible via a programming interface (possible with future reasoners).

ontology management components: are implemented as a set of managers and component classes. These software components are not provided by the CM engine and are thus developed as plug-in modules of this engine. Figure 1 shows the dependencies of these components on the core sub-layers.

• Search Manager: this is a self-contained search component that encapsulates the different ontology-related search algorithms. The main objective of this component is to perform an integrated search which leverages the powerful text expressions provided by existing text-based search and the reasoning capability provided by the ontology language.
• Ontology Manager: manages the run-time cycle of an ontology knowledgebase. For example, this component is responsible for loading the knowledgebase and controlling run-time application access to ensure the integrity of the knowledgebase.
• Reasoner Manager: is responsible for communication with the ontology reasoner via a standard query interface such as DIG. In particular, the reasoner manager defines standard function calls for instructing the reasoner engine to load a given ontology and for subsequently executing an ontology query on the reasoner.
• View Manager: is responsible for dynamically generating the web interfaces for ontology-based and content-based tasks.
• Component Classes: these are atomic classes that are generated (either through coding or code engineering) to encapsulate the state and behaviours of the primitive domain entities. An example of a component class would be Department, which represents the department concept in the knowledgebase.
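As an illustration of the Reasoner Manager's role, the following sketch shows one possible shape for its contract: one call to load an ontology and one to execute a query. The interface and method names are assumptions made for this example, and the stub below only records calls; it does not speak DIG or contact a real reasoner.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract; the paper names the responsibilities, not the method names.
interface ReasonerManager {
    void loadOntology(String ontologyUri);

    List<String> executeQuery(String query);
}

// Illustrative stub: a real implementation would forward both calls to the reasoner engine.
class RecordingReasonerManager implements ReasonerManager {
    private String loadedOntology;
    final List<String> log = new ArrayList<>();

    @Override
    public void loadOntology(String ontologyUri) {
        loadedOntology = ontologyUri;
        log.add("load " + ontologyUri);
    }

    @Override
    public List<String> executeQuery(String query) {
        // A query is only meaningful once the reasoner holds an ontology.
        if (loadedOntology == null) {
            throw new IllegalStateException("no ontology loaded");
        }
        log.add("query " + query);
        return new ArrayList<>(); // the stub has no knowledgebase, so no matches
    }
}
```

Keeping the upper layers dependent only on such an interface is one way to support both integration methods mentioned above, since a networked and an embedded reasoner could each sit behind the same contract.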

4.4 The Data Management Layer

This is the bottom layer of the architecture which is responsible for database and/or file access. This layer provides a common access interface for both content base and ontology knowledgebase. The content-base is typically stored in a relational database for easy management and querying, whilst the ontology knowledgebase can be serialised into a database or to a file for convenient re-use by other systems.

5 A Reference Implementation in Java

The OeCMS architecture discussed in Section 4 was used in developing a web-based system, called WOICMS, for managing learning object metadata at the University of Leeds. This section discusses this prototype implementation, which demonstrates the performance of the essential semantic content management functions of the OeCMS architecture (see Section 4.2). The prototype requirements are:

– the creation of a learning-object ontology for teaching materials used in a number of selected courses
– the development of a web-based navigation interface assisted by the learning object ontology
– the development of two ontology-integrated search methods based on the learning object ontology

The choice of an implementation platform is explained first because this dictates the development environment and deployment strategy for the WOICMS prototype. This is followed by separate sub-sections on the implementation for each of the above requirements.

5.1 The Java-Based Implementation Platform

The implementation platform consists of the following software components which form the core layer of the OeCMS architecture: the application server with its operating system interface, the CMS API, the ontology API and the reasoner engine. Again, the main design criterion for this platform is openness, so that it can be used not only for adding new functions to the prototype in later stages but also for developing semantic web portals in general. We decided to use Java as the base platform, which supports the following:

– Tomcat as an open-source application server and operating system interface
– OpenCMS [32] as an open-source CMS engine (deployed in Tomcat)
– Protégé/OWL [14] as an open-source ontology development API
– RacerPro [12] as the DIG-compliant reasoner engine

The combination of OpenCMS and Tomcat provides a favourable technical infrastructure (compared to others such as Zope [31]) because it naturally supports the deployment of the ontology development tool. In addition, the deployment of OpenCMS as a web application in Tomcat provides an extra, infrastructure-level modularity. We used the Protégé-OWL plug-in, which allows developers to develop and test concepts and instances in OWL-DL. This also provides an API for writing Java code to manage ontologies and to interact with RacerPro through a TCP-based programming interface called JRacer. The RacerPro engine provides OWL-based inference and verification services through its DIG-compliant programming interface.

5.2 The WOICMS Implementation

Having settled on an implementation platform, let us now briefly discuss the development of the essential functions of the WOICMS system prototype using this platform. Firstly, a virtual site was created in OpenCMS for the WOICMS application space, which represents the semantic content management layer of the OeCMS architecture. Next, the manager and component classes of the core layer and the management functions of the semantic content management layer were developed using the OpenCMS and Protégé-OWL APIs. The manager and component classes were deployed to OpenCMS, whilst the semantic content management functions were developed as JSP pages and deployed to the WOICMS virtual site. Note also that each web page presented to the user at the web interface layer is generated dynamically by a functional JSP page at the semantic content management layer. At its final processing stage, this JSP page invokes the appropriate ViewManager methods to produce the component views for the required web page from a set of pre-defined HTML/JavaScript templates. A template can either be specifically tailored to one web page or be a generic view component such as a URL link, list box, text box, table and the like. The decomposition of our templates down to the HTML-tag level allows us to conveniently annotate individual view components of a result web page with ontology-enabled content. Refer to Section 5.4 for an exemplary use of templates.
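To make the idea of tag-level templates concrete, the sketch below fills a generic "URL link" view component from a map of values. The ${name} placeholder syntax and the class name are assumptions for illustration only; the paper does not describe the template format actually used by the ViewManager.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical template filler; the ${name} placeholder syntax is an assumption.
class TemplateSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replaces each ${name} with its HTML-escaped value; unknown names become empty strings.
    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(escape(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Minimal HTML escaping so that ontology values cannot break the markup.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

For example, filling the template `<a href="${url}">${label}</a>` with an instance's document URL and title yields one annotated link; a page-level view could be assembled by concatenating such components.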

5.3 Developing the Learning Object Ontology

The domain for the learning object ontology covered information used in a number of selected postgraduate taught courses. For each course, we were interested in knowing about its syllabus, teaching staff, instructional materials and references to other relevant resources. Course materials included web pages and binary files such as PowerPoint slides and Word documents. The preliminary design of the learning object ontology was carried out in the Protégé ontology editor tool [16] using the OWL-DL language, as it provides more expressive power than its predecessors. Figure 2 shows the conceptual view of this ontology. For brevity, the super-parent concept, ModuleElement, is not shown in this figure. Let us take the relationship set between Reference and its sub-concepts as an example. This set is characterised by the following rules:

– A Reference instance must be of type PrintedPublication, OnlinePublication or HybridPublication (i.e. both printed and online)
– A Reference instance may contain links to a number of other reference instances
– Book and Journal are disjoint sub-types of Publication, i.e. a Book instance is distinct from every Journal instance.

Using the DL-based syntax [3], this set of inter- and subsumed relationships can be defined in the Protégé-OWL editor as follows (⊑ implies necessary and ≡ implies necessary and sufficient):

Reference ⊑ ModuleElement ⊓ (∃ belongsTo ModuleSkeleton) ⊓ (∃ authoredBy Author) ⊓ (∃ hasReference Reference)
Publication ⊑ Reference
Publication ≡ OnlinePublication ⊔ PrintedPublication ⊔ HybridPublication
OnlinePublication ⊑ Publication ⊓ (∃ webAddress WebAddress)
PrintedPublication ⊑ Publication ⊓ (∃ publisher PublishingHouse)
HybridPublication ≡ OnlinePublication ⊓ PrintedPublication
Book ⊑ Publication ⊓ (¬ Journal)
Journal ⊑ Publication ⊓ (¬ Book)

Fig. 2. The Conceptual View of Learning Object Ontology
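The effect of the subsumption, equivalence and disjointness rules above can be illustrated with a small hand-coded sketch. This is not a description logic reasoner and is not how RacerPro works internally; it simply propagates the named-concept relationships over a set of asserted types and ignores the existential (∃) restrictions. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hand-coded propagation of the Reference/Publication rules (illustration only).
class PublicationRules {
    // Returns the asserted types plus the types implied by the rules.
    static Set<String> classify(Set<String> asserted) {
        Set<String> t = new HashSet<>(asserted);
        // HybridPublication is equivalent to OnlinePublication AND PrintedPublication
        if (t.contains("OnlinePublication") && t.contains("PrintedPublication")) {
            t.add("HybridPublication");
        }
        if (t.contains("HybridPublication")) {
            t.add("OnlinePublication");
            t.add("PrintedPublication");
        }
        // Online, Printed, Book and Journal are all subsumed by Publication
        if (t.contains("OnlinePublication") || t.contains("PrintedPublication")
                || t.contains("Book") || t.contains("Journal")) {
            t.add("Publication");
        }
        // Publication is subsumed by Reference, which is subsumed by ModuleElement
        if (t.contains("Publication")) {
            t.add("Reference");
        }
        if (t.contains("Reference")) {
            t.add("ModuleElement");
        }
        return t;
    }

    // Book and Journal are disjoint: no instance may carry both types.
    static boolean isConsistent(Set<String> types) {
        return !(types.contains("Book") && types.contains("Journal"));
    }
}
```

An instance asserted to be both an OnlinePublication and a PrintedPublication is thus classified as a HybridPublication, while one asserted to be both a Book and a Journal is flagged as inconsistent, which mirrors the inferencing and verification roles of the reasoner described in Section 3.1.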

After the initial design, the LearningObject ontology was serialised into a file using the RDF/XML syntax and later deployed in OpenCMS. At application start-up, the ontology file is pre-loaded into memory and also onto the RacerPro server. Ontology instances were created and mapped to each content document through the ontology editor interface in Figure 3. Each editor view is dynamically generated by the ViewManager for an ontology instance of a given concept. In this example, the ontology instance is titled 'Information Modelling' and the concept is ModuleSkeleton. A useful feature of the editor is that the view component of an object-type property is generated with a set of JavaScript-enabled controls which are used to manage the property's range.

Fig. 3. The Web-Based Ontology Editor

5.4 Ontology-Assisted Navigation

A practical and widely used method of visualising an ontology with current web browser technologies is to render the ontology subsumption hierarchy as a tree and use it to assist with content navigation. Figure 4 illustrates such an interface developed in this project. Technically, this interface consists of two frames working in coordination. The left frame contains two navigation controls: (1) a concept search box which helps retrieve a concept's definition from the knowledgebase and (2) an ontology tree to guide the user's input. Actions performed on the left frame are processed by the corresponding JSP pages at the server, which invoke the appropriate methods provided by the manager classes at the core layer to produce the result view. This result view is then pushed back to the right frame for display. The main advantage of using a frame-based design for this interface is the significant reduction in response time achieved by avoiding the reconstruction of the ontology tree every time a new web page is displayed.

Fig. 4. The Ontology-Assisted Navigation Interface

Fig. 5. The Sequence Diagram for Ontology-Assisted Navigation

The sequence diagram illustrated in Figure 5 explains the interaction between different components in creating the ontology tree:

– upon receiving a request to view the ontology, the OntologyVisualiser.jsp page invokes the OntManager.openKB() method to load the OWL-DL ontology (LearningObject.owl) into memory (this is performed once)
– the OntologyVisualiser.jsp then instantiates a singleton object of the class KBTreeViewHelper with the loaded LearningObject ontology. This is followed by an invocation of the KBTreeViewHelper.parseTree() method, which recursively traverses the ontology's subsumption hierarchy and constructs the corresponding tree view as shown in the left frame of Figure 4. This tree is rooted at the owl:Thing concept, which is the default parent of all OWL concepts.
– the OntologyVisualiser.jsp retrieves the ontology tree view and calls the ViewManager.parseSingleDimTemplate() method, which operates on the page template of the left frame to produce the final view before pushing it back to the client's browser for display
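The recursive traversal performed in the second step can be sketched as follows. This is not the KBTreeViewHelper.parseTree() implementation, whose code is not given in the paper; the class below is a hypothetical stand-in that keeps the subsumption hierarchy in a plain map and renders it as nested HTML lists. It assumes the hierarchy is acyclic, and a concept with several parents would simply be rendered under each of them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a loaded ontology's subsumption hierarchy.
class SubsumptionTreeSketch {
    // parent concept -> direct sub-concepts, in insertion order
    private final Map<String, List<String>> children = new LinkedHashMap<>();

    void addSubConcept(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // Recursively renders the hierarchy below 'concept' as nested HTML lists.
    String render(String concept) {
        StringBuilder html = new StringBuilder("<li>").append(concept);
        List<String> subs = children.get(concept);
        if (subs != null && !subs.isEmpty()) {
            html.append("<ul>");
            for (String sub : subs) {
                html.append(render(sub));
            }
            html.append("</ul>");
        }
        return html.append("</li>").toString();
    }
}
```

Calling render("owl:Thing") on a populated instance produces the markup for the whole tree, which could then be placed into the left-frame page template.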

5.5 Ontology-Integrated Search

The motivation behind ontology-integrated search are the capability gaps between the native ontology search and reasoning and text-based search. The first gap is between the rather restrictive text-matching facility currently supported by ontology search and the rich regular expression provided by text-based search methods (e.g Lucene [27]). The second gap is between the lack of support for a formal (i.e. unambiguous) definition of search queries in text-based search and the reasoning capability of the ontology language. Therefore, ontology-integrated search methods were devised to bridge these gaps by leveraging: – the expressiveness of the ontology language for designing formal search queries – the reasoning capability of the ontology language for defining a semantic search scope for content documents – the rich regular expression of text-based search to retrieve the relavant content documents from the semantic search scope In our project, two ontology-integrated search methods were implemented (refer to Table 2 for their psuedocodes): – Guided search: has an incremental design interface for generating on-thefly RacerPro’s queries in nRQL [29] language (refer to Table 1 for an example). The result of executing this query on the reasoning engine produces a semantic-search scope in which text-based search is performed. This method uses RacerPro and the Lucene search API (included in the OpenCMS API) – Parameterised search: the query interface uses the simple text-based search provided by the ontology API to locate a branch (i.e. a concept and its sub-concepts) of the ontology sub-sumption tree to scope the text-based search that follows. This method uses the Prot´ eg´ e and Lucene search API We evaluated the usefulness of these two search methods by uploading the contents of three courses web sites on to the system. Based on the overall knowledge about these courses, we then designed a set of 18 query pairs for 18 different query requirements. 
Each pair had a query in the nRQL language and a query in the Lucene query format, both seeking an answer to the same requirement. The query requirements were chosen to have different levels of complexity. For example, the query pair shown in Table 1 answers the requirement "Find all course modules that discuss topics about SQL". Also illustrated in this table is the guided search interface, which defines the scope of the semantic search. This interface has two parts: (1) the top part, which covers steps 1 and 2, is a wizard-like dialog which helps the user design an nRQL query for the search scope and (2) the bottom part (step 3) has a text field for entering an optional text expression for the subsequent text-based search. The evaluation result showed that, on average, the relevancy of the content documents returned by the integrated search methods was above 90%, compared to around 50% for those returned by the normal text-based search.

Table 1. An Example of Evaluation Queries

Integrated Search
- define the search scope using nRQL query:
    (retrieve (?ModuleSkeleton ?Topic ?Reference)
      (and (?ModuleSkeleton |ModuleSkeleton|)
           (?ModuleSkeleton ?Topic |discussTopic|)
           (?Topic ?Reference |hasReference|)))
- search content documents using text expression "*SQL*"
- [Figure: The Guided Search Interface]

Content Search
Use the following text expression to search:
    'module' AND ('topic' AND "*SQL*")
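As an illustration of how the wizard selections of the guided search interface could be turned into the nRQL query of Table 1, the following is a minimal Java sketch. The class NrqlQueryBuilder and its methods are hypothetical names introduced here and are not part of the reference implementation or of the RacerPro API; the sketch only assembles the query string and does not contact a reasoner.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not from the paper): builds an nRQL "retrieve" query string. */
final class NrqlQueryBuilder {
    // Variables of the query head, kept in first-seen order without duplicates.
    private final Set<String> variables = new LinkedHashSet<>();
    // Query atoms that are combined with "and" in the query body.
    private final List<String> atoms = new ArrayList<>();

    /** Adds a concept atom, e.g. (?ModuleSkeleton |ModuleSkeleton|). */
    NrqlQueryBuilder concept(String variable, String conceptName) {
        variables.add(variable);
        atoms.add("(?" + variable + " |" + conceptName + "|)");
        return this;
    }

    /** Adds a role atom, e.g. (?ModuleSkeleton ?Topic |discussTopic|). */
    NrqlQueryBuilder role(String subject, String object, String roleName) {
        variables.add(subject);
        variables.add(object);
        atoms.add("(?" + subject + " ?" + object + " |" + roleName + "|)");
        return this;
    }

    /** Returns the complete query text; requires at least one atom. */
    String build() {
        if (atoms.isEmpty()) {
            throw new IllegalStateException("a query needs at least one atom");
        }
        StringBuilder head = new StringBuilder();
        for (String v : variables) {
            if (head.length() > 0) {
                head.append(' ');
            }
            head.append('?').append(v);
        }
        return "(retrieve (" + head + ") (and " + String.join(" ", atoms) + "))";
    }
}
```

Calling concept("ModuleSkeleton", "ModuleSkeleton"), then role("ModuleSkeleton", "Topic", "discussTopic") and role("Topic", "Reference", "hasReference"), followed by build(), yields the scope query of Table 1 on a single line. Keeping the variables in a LinkedHashSet means the query head lists each variable once, in the order in which the user selected it.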


Table 2. The Pseudocode of Two Search Methods

Guided Search

OWL_M = load OWL model from file 'LearningObject.owl'
Initialise RacerPro with OWL_M as follows:
  Initialise reasoner_manager with OWL_M
  Connect reasoner_manager to Racer server's URL via HTTP port 8080
  Send OWL_M
Construct matched instance set M_I_S as follows:
  C_S = concepts selected by user
  P_S = properties selected by user
  V_S = property values specified by user
  rql_query = generate nRQL query from (C_S, P_S, V_S)
  Execute rql_query as follows:
    Connect to Racer via TCP port 8088
    result_string = send rql_query
    Parse result_string into M_I_S
Retrieve file reference set F_R_S from M_I_S as follows:
  For each instance I in M_I_S
    O_P_S = set of OWL object properties of I
    For each property P in O_P_S
      R = range of P
      If R equals 'FileReference'
        P_V_S = set of property values of P
        Put P_V_S to F_R_S
Retrieve content document set C_D_S as follows:
  Q = content query specified by user
  C_D_S = Execute Lucene search of Q
Filter content result set C_R_S from C_D_S using F_R_S as follows:
  For each file url F in C_D_S
    If F exists in F_R_S
      F_O = file object (F) in F_R_S
      F_M = Construct file_metadata of F_O
      Put F_M to C_R_S
Display C_R_S

Parameterised Search

OWL_M = load OWL model from file 'LearningObject.owl'
Initialise Racer with OWL_M as follows:
  Initialise reasoner_manager with OWL_M
  Connect reasoner_manager to Racer server's URL via HTTP port 8080
  Send OWL_M
C = concept name specified by user
Pt = knowledge-base search pattern specified by user
Q = content query specified by user
C_S = set of concepts in the concept tree of C in OWL_M
Construct instance set I_S containing instances matching Pt in OWL_M as follows:
  P_S = set of all user-defined RDF properties from OWL_M
  For each property P in P_S
    If value V of P matches Pt
      P_I_S = set of instances in OWL_M that own P
      For each instance I in P_I_S
        If I does not exist in I_S
          Put I to I_S
Filter matched instance set M_I_S from I_S using C_S as follows:
  For each instance I in I_S
    D_C_S = set of concepts which are the direct parents of concept of I
    If D_C_S intersects with C_S
      Put I to M_I_S
Retrieve file reference set F_R_S from M_I_S as in 'Guided Search' method
Retrieve content document set C_D_S and filter C_R_S from C_D_S using F_R_S as in 'Guided Search' method
Display C_D_S
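The final step shared by both methods in Table 2, restricting the text-search hits to the semantic search scope, can be sketched in Java as follows. This is a simplified illustration under stated assumptions: the class SemanticScopeFilter is a hypothetical name, both sets are represented as plain URL strings, and the construction of file metadata (F_M in the pseudocode) is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch (not from the paper) of the scope filter in Table 2. */
final class SemanticScopeFilter {
    private SemanticScopeFilter() {
    }

    /**
     * Keeps only those text-search hits that lie inside the semantic scope.
     *
     * @param contentDocuments file URLs returned by the text search (C_D_S), in rank order
     * @param fileReferences   file URLs referenced by the matched instances (F_R_S)
     * @return the content result set (C_R_S), preserving the text-search order
     */
    static List<String> filter(List<String> contentDocuments, Set<String> fileReferences) {
        List<String> results = new ArrayList<>();
        for (String url : contentDocuments) {
            // A hit is kept only if the ontology query also selected this file.
            if (fileReferences.contains(url)) {
                results.add(url);
            }
        }
        return results;
    }
}
```

Iterating over the text-search hits, rather than over the file references, keeps the ranking produced by the text search, and a hash-based set makes each membership test cheap.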

6 Summary and Future Work

This paper presented an open architectural framework for the class of ontology-enabled content management systems. The architecture has a semantic content management layer which provides the functionality for developing both content documents and their semantic descriptions on the system. This layer is supported by a core layer whose modular design leverages the traditional content management engine and ontology development tools. A reference implementation based on a Java technical platform consisting of proven open-source components was also discussed. This implementation showed that it is possible to extend the support for a traditional content management cycle in a CMS with a set of primitive ontology management functions. These ontology management functions help construct the content semantics in terms of formal ontological concepts. When these semantics are expressed in a powerful language such as OWL, a more intelligent interface can be designed for users to navigate and access the content. More importantly, our implementation showed that a number of search methods can be developed to leverage the benefits of the content ontology and deliver more accurate results to users. The proposed OeCMS architecture and its implementation together would provide a strong technical platform for developing semantic web portals in general. An extension to this work would be the integration of (semantic) web service technologies into the core layer to support an open model of distributed collaboration with other systems.

References

1. K. Ahmad and L. Gillam. Automatic ontology extraction from unstructured texts. In Proceedings of the ODBASE 2005, pages 1330–1346, 2005.
2. G. Antoniou and F. van Harmelen. A Semantic Web Primer. The MIT Press, Cambridge, Massachusetts, 2004.
3. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider. The Description Logic Handbook. Cambridge University Press, 2003.
4. S. Bechhofer, R. Möller, and P. Crowther. The DIG description logic interface. In Proc. of International Workshop on Description Logics (DL2003), San Diego, California, USA, 2003.
5. B. Boiko. Content Management Bible. Wiley Publishing, New York, 1st edition, 2002.
6. E. Christensen, F. Curbera, G. Meredith, and S. Weerawarana. Web Services Description Language (WSDL) 1.1. W3C, 2001. Available from http://www.w3.org/TR/wsdl.
7. D. Fensel. Semantic Web application areas. In Proceedings of the 7th International Applications of Natural Language to Information Systems, Stockholm, Sweden, 2002.
8. N. Fernandez-Garcia, L. Sanchez-Fernandez, and J. Villamor-Lugo. Next generation web technologies in content management. In Proceedings of the WWW2004 Conference, New York, USA, 2004.
9. M. Fleury and F. Reverbel. The JBoss extensible server. In Proceedings of the International Middleware Conference, 2003.
10. T. R. Gruber. A translation approach to portable ontology specifications. Knowledge Acquisition, 5:199–220, 1993.
11. J. Guoqian and R. S. Harold. FCA view tab, 2004. Available from http://info.med.hokudai.ac.jp/fca/fcaviewtab/fcaviewtab.html.
12. V. Haarslev and R. Möller. Racer: An OWL reasoning agent for the Semantic Web. In Proceedings of the International Workshop on Applications, Products and Services of Web-based Support Systems, in conjunction with the 2003 IEEE/WIC International Conference on Web Intelligence, pages 91–95, Halifax, Canada, October 2003.
13. J. Hartmann and Y. Sure. Semantic Web challenge: An infrastructure for scalable, reliable, Semantic Portals. IEEE Intelligent Systems, 19(3):58–65, May 2004.
14. K. Holger et al. The Protégé OWL plugin: An open development environment for Semantic Web applications. In Third International Semantic Web Conference ISWC 2004, Hiroshima, Japan, 2004.
15. R. Kazman et al. ATAM: Method for architecture evaluation. Technical report, Carnegie Mellon University, 2000.
16. H. Knublauch, M. A. Musen, and A. L. Rector. Editing description logic ontologies with the Protégé OWL plugin. In International Workshop on Description Logics - DL2004, Whistler, BC, Canada, 2004.
17. L. Kof. An application of natural language processing to domain modelling – Two case studies. International Journal on Computer Systems Science Engineering, 20(1):37–52, 2005.
18. N. Kozlova. Automatic ontology extraction for document classification. Master's thesis, Computer Science Department, Saarland University, February 2005.
19. H. Lausen et al. Semantic Web Portals - state of the art survey. Technical report, DERI, 2004. Available from http://www.deri.ie/publications/techpapers/documents/DERI-TR-2004-0403.pdf.
20. C. W. Lo, K. T. Ng, and Q. Lu. CJK knowledge management in multi-agent m-learning system. In Proceedings of the First International Conference on Machine Learning and Cybernetics, IEEE, 2002.
21. D. Martin et al. Bringing semantics to web services: The OWL-S approach. In Proceedings of the First International Workshop on Semantic Web Services and Web Process Composition (SWSWPC 2004), San Diego, California, USA, 2004.
22. G. Modica. A framework for automatic ontology generation from autonomous web applications. Master's thesis, Department of Computer Science, Mississippi State University, December 2002.
23. D. L. Musa et al. Sharing learner profile through an ontology and web services. In Proceedings of the 15th International Workshop on Database and Expert Systems Applications, IEEE, 2004.
24. D. Oberle, S. Staab, R. Studer, and R. Volz. Supporting application development in the Semantic Web. ACM Transactions on Internet Technology, TOIT, 5(2), 2005.
25. D. Oberle, S. Staab, and R. Volz. An application server for the Semantic Web. In Proceedings of the 13th International WWW Conference, 2004.
26. D. Woelk and P. Lefrere. Technology for performance-based lifelong learning. In Proceedings of the International Conference on Computers in Education, IEEE Computer Society, 2002.
27. Lucene performance benchmarks, 2005. Available from http://lucene.apache.org.
28. Apache Tomcat, 2005. Available from http://jakarta.apache.org/tomcat/index.html.
29. RacerPro User Guide Version 1.8, 2005. Available from http://www.racersystems.com.
30. Touchgraph, 2005. Available from http://www.touchgraph.com.
31. The Zope Book 2.6 Edition, 2005. Available from http://zope.org/Documentation/Books/ZopeBook/2_6Edition/.
32. OpenCMS 6.0 interactive documentation, 2005. Available from http://www.opencms.org/opencms/en/download/documentation.html.
33. HTML 4.01 specification. W3C Recommendation, 1999. Available from http://www.w3.org/TR/REC-html40/.
