A Web based CCBR Recommender System for Ontology aided Metadata Discovery

Mehmet S. Aktas (1, 2), David B. Leake (1), Marlon Pierce (2), Geoffrey C. Fox (1, 2)

(1) Indiana University, Computer Science Department, Lindley Hall 215, 150 S. Woodlawn Ave., Bloomington, IN 47405-7104
{maktas, leake, mpierce, gcf}@cs.indiana.edu, http://www.cs.indiana.edu
(2) Indiana University, Community Grids Labs, 501 N. Morton St., Room 222, Bloomington, IN 47404
Abstract. We introduce a web based Conversational Case-Based Reasoning (CCBR) recommender system that uses Semantic Web markup languages as a standard way of representing cases and focuses on metadata discovery within resource-intensive environments. We present our initial efforts in designing and developing ontologies for an Earthquake Simulation Grid, describe the implementation of our application, and discuss our methodology for creating the RDF representation of CBR cases.
1 Introduction

This paper introduces a Case-Based Reasoning (CBR) retrieval tool that represents cases with Semantic Web technologies and focuses on the needs and problems of the Solid Earth Research Virtual Observatory Grid (SERVOGrid) project [1-3]. The CBR retrieval mechanism presented here provides recommendations for metadata discovery on a virtual metadata layer of SERVOGrid resources. SERVOGrid integrates historical, measured, and calculated earthquake data with simulation codes. Its resources are located at various institutions across the country. The primary things to be managed in the SERVOGrid domain are metadata about data, such as Global Positioning System (GPS), fault, and seismicity data, and metadata about General Earthquake Modeling (GEM) codes, such as application, visualization, and simulation codes.

SERVOGrid is a rapidly evolving project. The number of resources and services and their usage frequencies are expected to grow quickly, so there is an emerging need for effective ways of locating and accessing desired resources. The problem of locating a resource of interest within a large, resource-intensive environment such as SERVOGrid is referred to as resource discovery. Solving it requires an efficient retrieval mechanism, and that mechanism must guide general users, because SERVOGrid resources are too complex to tell apart easily. Consider a GEM code: describing it requires taking into account many aspects of earthquake modeling as well as the implementation details of the code. A general user is unlikely to know all the aspects needed to formulate an efficient query for the code that satisfies his or her needs. We therefore need not only a retrieval mechanism but also a recommendation system for finding resources that are too complex to find with traditional retrieval systems.

The SERVOGrid environment does not have a strong domain theory implicit in its metadata repository, since SERVOGrid resources are volatile. In other words, it is hard to make generalizations from a given set of metadata instances. Nor is SERVOGrid a domain in which experts can specify a set of rules that fully covers the range of possible recommendations. Since resources can join or leave the system at any time, it would be impossible to specify general rules that remain stable for SERVOGrid resources. We find lazy learning methods such as CBR, in which recommendations are made by reasoning from previously stored cases, most suitable for the SERVOGrid domain. In CBR recommender systems, when a situation (problem description) is entered, the most similar cases (metadata about resources) are suggested to the user as results. Because SERVOGrid cases are complex, it is necessary to guide the user at each step of the retrieval, so that a general user can query the SERVOGrid domain without detailed knowledge of it. For this reason our application adopts Conversational CBR (CCBR), a form of CBR in which the user interacts with the system to fill in the gaps needed to retrieve the right cases. In CCBR the system responds with ranked cases and with questions to distinguish them. The question-answer-ranking cycle continues until success, where the user finds an answer to the query, or failure, where no satisfying case is found in the case base.

In recent years, several CBR systems that use XML as a standard case representation language have been introduced [4, 5]. A standard representation of CBR cases has advantages such as interoperability and modularity: interoperability because CBR cases can be used by non-CBR applications, and modularity because CBR cases can be manipulated independently of the CBR tool that processes the case data. We adopt Semantic Web [6] languages such as RDF/RDFS [7] as the representation syntax for metadata, so that CBR cases are also represented in a standard way. The Semantic Web is an evolution of the existing Web. It attempts to define a metadata information model for the WWW to aid information retrieval and aggregation. It provides general languages for describing any metadata, in addition to advanced capabilities intended to enable knowledge representation and limited machine reasoning.

Cases for the SERVOGrid environment are simply descriptions of SERVOGrid objects such as earthquake simulation codes and data. CBR cases may reflect only some aspects of the knowledge describing SERVOGrid objects, so the case data can be considered a portion of the knowledge (metadata) about a SERVOGrid object and can simply be taken from RDF metadata triples. For our purposes it is therefore important to represent CBR cases in RDF/RDFS format, so that they can be integrated with the SERVOGrid metadata representation and relevant case data can easily be extracted from SERVOGrid metadata.

In this paper we present our initial effort in representing CBR cases with Semantic Web markup languages such as RDF/RDFS. We design and develop ontologies for an Earthquake Simulation Grid, the SERVOGrid project. We implement a web based Conversational CBR application using a domain-independent CBR engine, the Indiana University Case-Based Reasoning Framework (IUCBRF) [8]. We use RDF/RDFS as the case representation language and integrate case data within the metadata of SERVOGrid resources. Our application provides a retrieval mechanism that interacts with the user and responds with ranked cases and ranked questions to distinguish them.

The outline of this paper is as follows. First we describe our effort in designing and developing an ontology for the SERVOGrid environment. Then we give the details of our implementation of a CCBR retrieval mechanism that uses instances of an RDF ontology. Finally, we conclude the paper.
2 SERVOGrid Ontology Design and Development

In this section, we describe step by step how we developed an ontology for the SERVOGrid environment. We used RDF/RDFS to describe the various resources. The SERVOGrid project contains a collection of codes, visualization tools, computing resources, and data sets distributed across the grids. We generate instances of a well-defined ontology to describe SERVOGrid resources. A detailed account of the design of this ontology is available in [9]. Once such ontology instances have been built, one can pose queries on them. We introduce a CCBR retrieval system that queries the ontology instances and recommends ranked results to users, together with questions to distinguish them.
A - Defining SERVOGrid Ontology Classes

When designing an ontology, we first need to group related resources together. In the SERVOGrid project there are three major groups of resources: ServoCodes, ServoData, and ComputingPlatforms. Table 1 shows the high level classification of classes that group SERVO Grid resources together with things related to these resources.

Servo Grid Resources Classes   Description
ServoObjects                   describes the code and the data created as a result of an experiment or creativity
ServoDataFormat                describes the types of different data formats used in the SERVO Grid project
ServoComputePlatform           describes the computing platforms that exist in the SERVO Grid project
ServoCodeCharacteristics       describes the characteristics of the ServoCode, e.g. model, programming languages used in ServoCode
ServoObjectContainer           describes the various types of containers for code, data and documents, e.g. HDF file
Organization                   describes the organizations that are involved in the SERVO Grid project
Person                         describes the people working in the SERVO Grid project
Location                       describes the location of the organization involved in the SERVO Grid project

Table 1. High level classification of classes in Servo Grid Ontology

In the SERVO Grid environment, the first two groups of resources are SERVO Grid codes and their input/output data. These resources resemble each other in being SERVO Grid objects that are created as a result of an experiment or creativity. We therefore defined a general class called “ServoObjects” to capture this similarity and made the first two groups of resources subclasses of ServoObjects.

SERVO Grid codes differ from each other, as they were written for different purposes. We defined three sub-classes for them: simulation codes, visualization codes, and application codes. Simulation codes simulate interacting fault systems with real or synthetically created data using different earthquake models. Examples of these codes are mesh generator, VC (Virtual California), simplex and disloc [1-3]. We associate simulation codes with zero or more visualization systems. The codes written for those systems are referred to as visualization codes. Some examples of visualization codes are inSAR and VC [1-3]. Application codes range from complex statistical analysis programs to simple text editors that facilitate earthquake science.

A similar classification can be made for the SERVOGrid data. At this classification level we do not intend to represent the data itself; however, information about the data, such as its creator, needs to be represented. To this end, data are grouped as GPS data, Fault data, and Seismicity data.

The second group of resources in the class hierarchy is “ServoDataFormat”, which represents different data formats. Example instances of this class are the Swath data type, the Grid data type and the Point data type. For lack of space these are not given here, but preliminary schemas are available from [10].
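As a rough illustration of the class hierarchy described above, the subclass relations can be written down as a child-to-parent map and queried for ancestry. This is only a sketch: the identifiers SimulationCode, VisualizationCode, ApplicationCode, GPSData, FaultData and SeismicityData, and the helper names ancestors and is_subclass_of, are hypothetical, since the text names the concepts but not the class identifiers used in the actual ontology.

```python
# Child -> parent map mirroring the hierarchy described in the text.
# The six leaf identifiers are hypothetical names for the described concepts.
SUBCLASS_OF = {
    "ServoCode": "ServoObjects",
    "ServoData": "ServoObjects",
    "SimulationCode": "ServoCode",
    "VisualizationCode": "ServoCode",
    "ApplicationCode": "ServoCode",
    "GPSData": "ServoData",
    "FaultData": "ServoData",
    "SeismicityData": "ServoData",
}


def ancestors(cls):
    """Return the superclasses of cls, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_subclass_of(cls, parent):
    """True if cls is parent itself or lies below it in the hierarchy."""
    return cls == parent or parent in ancestors(cls)
```

For example, ancestors("SimulationCode") walks up through ServoCode to ServoObjects, while a class outside the map, such as ServoDataFormat, has no ancestors in this sketch.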
The third group of resources is the computing platforms. These resources are further classified into the “ComputeResources” and “InstalledWebServices” sub-classes, which are described in Table 2.

ServoComputePlatform Classes   Description
ComputeResources               describes the computers that provide the computational environment for the SERVO Grid project
InstalledWebServices           describes the web services that provide a message oriented computing environment in the SERVO Grid environment

Table 2. Classification of ServoComputePlatform.

The fourth group of resources is “ServoCodeCharacteristics”. We classified these characteristics of the SERVO Grid codes as described in Table 3.

Servo Code Characteristics Classes   Description
KnownProblemsInCompiling             describes the general problems encountered while compiling the ServoCodes
KnownSuccessfulLibrary               describes the successful libraries that have been tested and used with ServoCodes
UsedModel                            describes various earth science models used in ServoCodes, e.g. elastic, inverse, viscoelastic, etc.
UsedProgrammingLanguage              describes the programming languages used in implementing ServoCodes

Table 3. Classification of Servo Code Characteristics.

In addition to the classes explained above, we have also defined small ontologies for the organization, people and location concepts involved in the SERVO Grid project. Since the focus of this paper is on SERVO Grid resources, we do not cover these ontology classes.
B - Defining SERVOGrid Ontology Properties

In order to relate ontology classes to each other, we defined our own meaningful properties for the SERVOGrid ontology. We also draw on existing ontologies such as Dublin Core [12] and vCard [13]. The ServoCode class defines seven properties in addition to the standard Dublin Core elements. Likewise, the ServoData and ServoComputePlatform classes define their own properties. The properties of ServoCode, ServoData and ServoComputePlatform and their meanings are described in Table-4, Table-5, and Table-6, respectively.
ServoCode Class Properties           Range                     Description
isDevelopedWith                      UsedProgrammingLanguage   which programming language was used to develop the code
createsOutputData / takesInputData   ServoData                 what kind of data a code takes as input and generates as output
dependsUpon                          ServoCode                 a code depends upon another operation before it can be completed
installedOn                          ServoComputePlatform      where a ServoCode is installed
developedBy                          Person, Organization      person or organization who develops the resource
isOwnedBy                            Person, Organization      person or organization who owns the resource

Table-4. Associated properties with ServoCode Class

ServoData Class Properties           Range                     Description
hasDataFortmatOf                     ServoDataFormat           associates a data format with a piece of data
isInputDataFor / isOutputDataFor     ServoCode                 what kind of code takes this data as input or produces it as output; inverse properties of takesInputData and createsOutputData

Table-5. Associated properties with ServoData Class

ServoComputePlatform Class Properties   Range                   Description
isOwnedby                               Person, Organization    describes the person or organization that owns the resource
hasData / hasCode                       ServoData / ServoCode   describes the ServoData or ServoCode accessed through this compute platform
isMaintainedBy                          Person                  describes the person that maintains this compute platform

Table-6. Associated properties with ServoComputePlatform Class
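The property ranges listed for the ServoCode class lend themselves to a simple consistency check on metadata triples. The following is a minimal sketch under stated assumptions, not part of the described system: the function name range_violations, the type_of lookup table, and the instance names in the usage note are hypothetical, and the check compares declared types directly without following subclass relations.

```python
# Ranges of the ServoCode properties, copied from Table-4.
PROPERTY_RANGES = {
    "isDevelopedWith": {"UsedProgrammingLanguage"},
    "createsOutputData": {"ServoData"},
    "takesInputData": {"ServoData"},
    "dependsUpon": {"ServoCode"},
    "installedOn": {"ServoComputePlatform"},
    "developedBy": {"Person", "Organization"},
    "isOwnedBy": {"Person", "Organization"},
}


def range_violations(triples, type_of):
    """Return the triples whose object type is outside the declared range.

    triples: iterable of (subject, predicate, object) strings.
    type_of: hypothetical lookup from instance name to its class name.
    Predicates without a declared range are not checked.
    """
    bad = []
    for subject, predicate, obj in triples:
        allowed = PROPERTY_RANGES.get(predicate)
        if allowed is not None and type_of.get(obj) not in allowed:
            bad.append((subject, predicate, obj))
    return bad
```

With hypothetical instances codeA (a ServoCode), langX (a UsedProgrammingLanguage) and hostA (a ServoComputePlatform), the triple (codeA, isDevelopedWith, langX) passes, whereas (codeA, takesInputData, hostA) is reported because hostA is not a ServoData instance.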
C - Generating the Ontology Instances with SW Languages

Following the steps above, we defined a class hierarchy associated with meaningful properties. After designing the ontology, we wrote the descriptions of these classes and properties in the semantic markup languages RDF/RDFS. The SERVOGrid ontology and ontology instances are available from [10, 11]. As a next step, we populated various metadata instances for resources, which are available at [11]. These instances form a pool of metadata spread out across the grids. Making the distributed metadata available to users is the focus of another paper [9]. In this paper, we discuss the details of providing a CCBR recommender system that retrieves the metadata satisfying the user query. For this reason, we assume that our application has access to all metadata pieces as if they were stored in a local file store.
3 A Web based CCBR Recommender System for Ontology aided Metadata Discovery

We implemented a prototype web based CCBR system [14] in an attempt to solve the metadata discovery problem in the SERVOGrid project. Our application uses an ontology aided semantic metadata representation in RDF/RDFS and, by interacting with users, responds to a query with ranked cases and questions to distinguish them. Like other conversational recommendation systems [15, 16], our implementation narrows down the results at each step of the interaction with the user. Our methodology aims to improve the precision of the search results by eliminating cases that conflict with the problem description according to the conversation so far.

Our application uses IUCBRF [8] as a domain-independent CBR engine. IUCBRF is an open source framework written in Java. It provides standard implementations of the components needed to develop CBR/CCBR applications, such as domain definition, case construction, and retrieval. Since IUCBRF does not depend on any domain, our CCBR retrieval system can likewise be used in any domain, given an RDF/RDFS ontology and ontology instances.

The domain of our CCBR application consists of information about SERVOGrid resources. A problem description for a current situation is the metadata of a SERVOGrid resource, and a solution is a pointer to the resource described by that metadata. A case contains both the problem description and the solution of a situation. In our implementation we focused on the discovery of earthquake modeling codes and data. For this reason the resources considered here are only SERVOGrid objects; however, the prototype can easily be extended to other resources of the SERVOGrid project. The case base of our CCBR system therefore consists of RDF metadata about SERVOGrid objects. For case storage we use the general “flat case base” indexing scheme implemented in IUCBRF, which simply stores all cases in an unordered list. The case base is initially generated from a file store in which each case is represented in RDF/RDFS syntax.
Figure 1. A CBR case consists of a problem and a solution along with some other information such as time of creation, case source, etc. We define both the problem and the solution parts of a case with RDF triples, as shown in the figure.

Figure 2. Cases are represented with RDF triples. A query case is also constructed as a set of triples according to the conversation so far between the system and the user.
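The case structure described in the two captions above can be sketched with plain data classes. This is an illustration only and does not reproduce the IUCBRF classes: the names Triple, Case and build_query, the field names, and the placeholder subject "?resource" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One RDF statement; frozen so it can be stored in a set."""
    subject: str
    predicate: str
    obj: str


@dataclass
class Case:
    """A case whose problem and solution parts are both sets of triples."""
    problem: frozenset        # metadata triples describing the resource
    solution: frozenset       # triples pointing to the described resource
    source: str = "file store"
    use_count: int = 0


def build_query(answers, subject="?resource"):
    """Turn the conversation so far ({predicate: object}) into query triples.

    The subject is a placeholder because the resource is not yet known.
    """
    return frozenset(Triple(subject, p, o) for p, o in answers.items())
```

Each answer the user gives adds one triple to the query case, so build_query({"isDevelopedWith": "langX"}) yields a single triple about the still unknown resource (langX is a made-up value).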
A CBR case contains a problem description of a current situation, a solution to that situation, the time of case creation, inactive contexts, use counts and source information [8]. We focused on representing both the problem and the solution parts of a case in RDF/RDFS syntax. Both a problem and a solution consist of a set of features. The range of possible features is defined by a CBR domain. In our approach, we defined a domain feature as a <predicate, object> pair of an RDF triple. In the initialization phase, all possible pairs derived from an ontology form the pre-defined features of the CCBR domain, with the following exception: we only consider RDF triples whose subject is an instance of the “SERVOGrid Object” class or one of its subclasses in the high level class hierarchy of the SERVOGrid ontology. We implemented an RDF triple feature that implements the Feature interface defined by the IUCBR Framework. In this way, the feature sets of both the problem and the solution of a case are defined with RDF triples, as shown in Figure 1. Likewise, a query case, constructed from the conversation so far between the user and the system, contains RDF triples as its set of features, as shown in Figure 2.

At each step of the retrieval, ranked cases and questions to distinguish them are displayed. Our system uses the “threshold retrieval method” implemented in IUCBRF, in which all cases within a given threshold of difference are retrieved. Cases are eliminated if they conflict with the problem description according to the conversation so far. When a case list is refined, each known feature (an RDF triple) of each case under consideration is checked against the corresponding feature of the problem. If the feature is unknown for the problem, no action is taken for or against the case. If the feature is known, its value in the case is compared with its value in the problem. If the two values are consistent, the case has a consistent feature and receives a higher ranking in the list. If they are inconsistent, the case has an inconsistent feature and is eliminated. Thus the remaining cases are ordered by their number of consistent features.

The system determines which question to display to the user from what is known about the problem so far, which cases are under consideration, and the domain. Questions are derived from <predicate, object> pairs. The system chooses the next question based on question frequency in the cases under consideration: the feature that is known by the most cases under investigation gets the highest rank, and the question derived from this feature is asked next. Only features that exist within the cases under investigation qualify to be ranked; the rest of the domain features are eliminated from the question list.
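The refinement and question selection steps described above can be sketched as follows. This is a simplified illustration rather than the IUCBRF implementation: it assumes each case holds at most one value per predicate, it omits the difference threshold, it skips predicates that have already been answered, and it breaks ties alphabetically. The function names refine and next_question are hypothetical.

```python
from collections import Counter


def refine(cases, known):
    """Drop cases that conflict with the answers so far and rank the rest.

    cases: {case_id: {predicate: object}}
    known: {predicate: object} collected from the conversation so far.
    Returns case ids ordered by number of consistent features, highest first.
    """
    scored = []
    for case_id, features in cases.items():
        consistent = 0
        conflict = False
        for predicate, value in features.items():
            if predicate not in known:
                continue              # unknown for the problem: no effect
            if known[predicate] == value:
                consistent += 1       # consistent feature raises the rank
            else:
                conflict = True       # inconsistent feature eliminates the case
                break
        if not conflict:
            scored.append((consistent, case_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [case_id for _, case_id in scored]


def next_question(cases, remaining_ids, known):
    """Pick the unanswered predicate known by the most remaining cases."""
    counts = Counter(
        predicate
        for case_id in remaining_ids
        for predicate in cases[case_id]
        if predicate not in known
    )
    if not counts:
        return None                   # nothing left to ask
    best = max(counts.values())
    return min(p for p, c in counts.items() if c == best)
```

A conversation would alternate the two calls: refine the case list with the answers so far, ask the question returned by next_question, add the answer to known, and repeat until the user accepts a case or no case remains.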
Figure 3. User interface of our CCBR retrieval system. In this facility, a user can interact with the system and the system responds with ranked cases and questions to distinguish them.
Figure 3 shows the user interface of the main retrieval facility of our CCBR retrieval system, which provides the interaction between the user and the system. The interface presents the feature list, in which the user can change the order of the questions by clicking the question identifier on the left. The next question to be asked is derived from the most frequent feature. Each feature has a hyperlink pointing to the SERVOGrid ontology so that the user can read the description of the predicates. The interface also shows a list of the cases that are still under investigation. Case titles have hyperlinks pointing to the RDF representation of the cases, and the details of a case can be displayed by clicking the case identifier on the left. We also provide facilities to add, delete and update cases, as shown in Figure 4 below.
Figure 4. User interfaces for the standard operations on CBR cases: add, update and delete.
4 Conclusions and Future Work

We presented our effort in designing and developing ontologies for an earthquake simulation grid, the SERVOGrid project. We introduced a web based CCBR retrieval system that operates on an RDF file store, and we explained the details of developing this system, which provides metadata discovery for SERVOGrid resources by interacting with users. In future work, we will work toward a standard way of representing CBR cases in SW languages. We will also extend our application to provide further guidance on the usage of complex earthquake modeling codes.
References

[1] Choonhan Youn, Marlon Pierce, and Geoffrey Fox, “Building Problem Solving Environments with Application Web Service Toolkits”, ICCS03, Australia, June 2003.
[2] Marlon Pierce, Choonhan Youn, Ozgur Balsoy, Geoffrey Fox, Steve Mock, and Kurt Mueller, “Interoperable Web Services for Computational Portals”, SC02, November 2002.
[3] Marlon Pierce, Choonhan Youn, and Geoffrey Fox, “Interacting Data Services for Distributed Earthquake Modeling”, ACES Workshop at ICCS, Australia, June 2003.
[4] Gardingen D., Watson I., “A web based Case-Based Reasoning System for HVAC Sales Support”, Proceedings of British Expert Systems Conference 1998.
[5] Hayes C., Cunningham P., and Doyle M., “Distributed CBR using XML”, in Proceedings of the KI-98 Workshop on Intelligent Systems and Electronic Commerce, number LSA-98-03E, University of Kaiserslautern Computer Science Department, 1998.
[6] W3C Semantic Web Site: http://www.w3.org/2001/sw
[7] W3C - Resource Description Framework Site: http://www.w3.org/RDF/
[8] Steve Bogaerts, David Leake, “A Framework For Rapid and Modular Case-Based Reasoning System Development”, http://www.cs.indiana.edu/~sbogaert/CBR/IUCBRF.pdf
[9] Mehmet S. Aktas, Marlon Pierce, and Geoffrey C. Fox, “Designing Ontologies and Distributed Resource Discovery Services for an Earthquake Simulation Grid”, GGF-11 Global Grid Forum Semantic Grid Applications Workshop, Hawaii, June 10 2004.
[10] Link to the SERVOGrid ontologies: http://ripvanwinkle.ucs.indiana.edu:4780/examples/download/schema
[11] Link to the SERVO ontology instances: http://ripvanwinkle.ucs.indiana.edu:4780/examples/download/data
[12] Dublin Core Meta-data Initiative Web Site: http://dublincore.org/documents/dces
[13] The vCard version 3.0 Specifications: http://www.ietf.org/rfc/rfc2426.txt
[14] Link to the web site of the CCBR project presented here: http://complexity.ucs.indiana.edu/~maktas/servo/project.html
[15] Goker M., Thompson C., “Personalized conversational case-based recommendation”, Proceedings of the Fifth European Workshop on Case Based Reasoning, Trento, Italy.
[16] Aha D., Breslow L., “Refining Conversational Case Libraries”, in Leake D., Plaza E. (eds.), “Case-Based Reasoning Research and Development, 2nd International Conference on Case-Based Reasoning ICCBR 1997”, pp. 267-268, Springer Verlag, Berlin 1997.
[17] David C. Wilson, David B. Leake, Randall Bramley, “Case-Based Recommender Components for Scientific Problem-Solving Environments”, Proceedings of the Sixteenth IMACS World Congress, 2000. http://www.cs.indiana.edu/~sbogaert/CBR/IUCBRF.pdf