Explanation Support for the Case-Based Reasoning Tool myCBR

Daniel Bahls and Thomas Roth-Berghofer
German Research Center for Artificial Intelligence DFKI GmbH,
Trippstadter Straße 122, 67663 Kaiserslautern, Germany, and
Department of Computer Science, University of Kaiserslautern,
P.O. Box 3049, 67653 Kaiserslautern
{daniel.bahls,thomas.roth-berghofer}@dfki.de

Abstract

Case-Based Reasoning, in short, is the process of solving new problems based on solutions of similar past problems, much like humans solve many problems. myCBR, an extension of the ontology editor Protégé, provides such similarity-based retrieval functionality. Moreover, the user is supported in modelling appropriate similarity measures by forward and backward explanations.

Case-Based Reasoning

Case-Based Reasoning (CBR), according to Aamodt and Plaza (1994), basically follows this pattern: a problem is formulated as a query case, and the repository of already experienced problem-solution pairs (the case base) is ordered by similarity to the given query. The most similar cases are used to generate a solution for the posed problem. After a solution has been retrieved, the new case (consisting of the new problem and the retrieved solution) is stored in the case base. This new experience can be used in the next retrieval: the CBR system learns.

A CBR system's knowledge can be divided into four knowledge containers (Richter 1995):

• Vocabulary: This knowledge container is the basis for the three other containers. It defines attributes and classes for query and case descriptions. In object-oriented CBR systems the vocabulary consists of numerical, symbolic, plain text, and instance type attributes.

• Case Base: This is the collection of previously experienced cases (traditional view) or products.

• Similarity Measure: The degree of similarity between a query and a case is defined by metrics. Local similarity measures define similarities for each attribute. Global similarity measures, e.g., weighted sum, minimum, or maximum, aggregate the local similarity measures into one similarity value on each class level (see the sketch after this list).

• Adaptation Rules: This container provides knowledge for adapting the solution of a case to fit the query, which is often realised with rules.

Adaptation rules are outside the scope of this work; we concentrate on the support of similarity measure modelling.
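
As a rough sketch of how local similarity measures can be aggregated into a weighted-sum global measure during retrieval, the following Java example ranks a small case base for a query. The attributes "price" and "colour", their weights, and the local measures are invented for illustration; the code is not taken from myCBR and only demonstrates the general technique.

// Illustrative sketch (not the myCBR API): ranking a case base by a
// weighted-sum global similarity over per-attribute local similarities.
import java.util.*;
import java.util.function.BiFunction;

public class WeightedSumRetrieval {

    // Global similarity of a query and a case: weighted sum of per-attribute
    // local similarities, normalised by the total weight.
    static double globalSimilarity(Map<String, Object> query,
                                   Map<String, Object> c,
                                   Map<String, BiFunction<Object, Object, Double>> localSims,
                                   Map<String, Double> weights) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (String attribute : query.keySet()) {
            double w = weights.getOrDefault(attribute, 1.0);
            double local = localSims.get(attribute).apply(query.get(attribute), c.get(attribute));
            weightedSum += w * local;
            weightTotal += w;
        }
        return weightTotal == 0.0 ? 0.0 : weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        // Local similarity measures for a hypothetical "used car" vocabulary.
        Map<String, BiFunction<Object, Object, Double>> localSims = new HashMap<>();
        // Numerical attribute: similarity drops linearly with the price difference.
        localSims.put("price", (q, c) ->
                Math.max(0.0, 1.0 - Math.abs((Integer) q - (Integer) c) / 10000.0));
        // Symbolic attribute: a tiny similarity table, 1.0 on the diagonal.
        localSims.put("colour", (q, c) -> q.equals(c) ? 1.0 : 0.5);

        Map<String, Double> weights = Map.of("price", 2.0, "colour", 1.0);

        Map<String, Object> query = Map.of("price", 5000, "colour", "red");
        List<Map<String, Object>> caseBase = List.of(
                Map.<String, Object>of("price", 5500, "colour", "blue"),
                Map.<String, Object>of("price", 9000, "colour", "red"));

        // The retrieval result is the case base ordered by decreasing global similarity.
        caseBase.stream()
                .sorted(Comparator.comparingDouble(
                        (Map<String, Object> candidate) -> globalSimilarity(query, candidate, localSims, weights))
                        .reversed())
                .forEach(candidate -> System.out.printf("%.3f  %s%n",
                        globalSimilarity(query, candidate, localSims, weights), candidate));
    }
}

In this toy setting the case with the smaller price difference is ranked first even though its colour does not match, because the price attribute carries twice the weight.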

The Open-Source Tool myCBR

myCBR [1] is an open-source plug-in for the open-source ontology editor Protégé [2]. It follows in the footsteps of the integrated, Smalltalk-based CBR shell CBR-Works (Schulz 1999) with its rich point-and-click user interface. We also implemented a basic export interface to exchange similarity measures with jColibri [3], a powerful Java-based CBR framework which allows more complex reasoning. In Protégé, users define classes and attributes in an object-oriented way. Protégé also manages instances of these classes, which we interpret as cases. So the vocabulary and case base knowledge containers are already handled by Protégé.

Figure 1: Similarity measure editors of myCBR

The myCBR plug-in adds several similarity measure editors, which can be applied to the classes and attributes of an ontology. Its retrieval engine finds similar cases for a specified query. Additionally, CSV files can be imported, for which a simple similarity model is built automatically if none exists. A standalone retrieval engine allows for easy integration into other applications. Figure 1 shows a screenshot of some of the available editors.

[1] http://mycbr-project.net
[2] http://protege.stanford.edu/
[3] http://gaia.fdi.ucm.es/projects/jcolibri/



For symbolic attribute types, similarity values for all possible attribute values are defined using a table. The columns are headed by the symbols defined in the ontology, as are the rows. The similarity value for a query q and a case c can then be found in row q at column c (see lower half of Figure 1). This works fine for a small set of symbols; otherwise, setting up and maintaining such a table becomes cumbersome, since the number of entries grows quadratically with the number of symbols. Therefore, another editing mode, based on taxonomies, is available (see right half of Figure 1). One can build a tree structure upon the symbols, where the distance between symbols indicates their similarity.

To define similarity measures for numerical attributes, some simplification is helpful. In order to offer an easy-to-use interface (see left half of Figure 1), we reduce the dimensionality of the similarity function sim(q, c) by calculating either the quotient q/c or the difference q − c. This value is then the parameter of a helper function h : D → [0, 1], whose graph is editable by pointing and clicking. Note that the domain D has the range [min − max, max − min] in difference mode and [min/max, max/min] in quotient mode, where min and max are the range limits of the numerical attribute. Obviously, quotient mode is only applicable to numerical attributes whose value range does not contain zero.

This is just a selection of the available local similarity measures for which explanation support exists.
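
The following sketch illustrates the difference mode described above under stated assumptions: the attribute "price", its value range, and the three sample points of the helper function are hypothetical, and the piecewise-linear interpolation merely stands in for the graph that is edited by pointing and clicking; it is not myCBR code.

// Illustrative sketch of the difference mode: sim(q, c) is reduced to h(q - c),
// where h is a piecewise-linear helper function over D = [min - max, max - min].
import java.util.Map;
import java.util.TreeMap;

public class DifferenceModeSimilarity {

    // Sample points of the helper function h; keys are differences q - c,
    // values are similarities in [0, 1].
    private final TreeMap<Double, Double> samplePoints = new TreeMap<>();

    void addPoint(double difference, double similarity) {
        samplePoints.put(difference, similarity);
    }

    // h(d): linear interpolation between the neighbouring sample points.
    double h(double d) {
        Map.Entry<Double, Double> floor = samplePoints.floorEntry(d);
        Map.Entry<Double, Double> ceil = samplePoints.ceilingEntry(d);
        if (floor == null) return ceil.getValue();   // below the smallest sample point
        if (ceil == null) return floor.getValue();   // above the largest sample point
        if (floor.getKey().equals(ceil.getKey())) return floor.getValue();
        double t = (d - floor.getKey()) / (ceil.getKey() - floor.getKey());
        return floor.getValue() + t * (ceil.getValue() - floor.getValue());
    }

    // Difference mode: sim(q, c) = h(q - c); quotient mode would use h(q / c) instead.
    double sim(double q, double c) {
        return h(q - c);
    }

    public static void main(String[] args) {
        // Hypothetical attribute "price" with range [0, 10000], so D = [-10000, 10000].
        DifferenceModeSimilarity priceSim = new DifferenceModeSimilarity();
        priceSim.addPoint(-10000, 0.0); // the case is much more expensive than asked for
        priceSim.addPoint(0, 1.0);      // exact match
        priceSim.addPoint(10000, 0.3);  // cheaper than asked for is acceptable, but not ideal

        System.out.println(priceSim.sim(5000, 5000)); // exact match -> 1.0
        System.out.println(priceSim.sim(5000, 7000)); // h(-2000) -> 0.8
        System.out.println(priceSim.sim(5000, 3000)); // h(2000)  -> approx. 0.86
    }
}

The three sample points here encode an asymmetric measure: cases that are more expensive than the query are penalised more strongly than cheaper ones.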

Explanation Support in myCBR

Explanations, in principle, are answers to questions. In this paper we concentrate on questions the knowledge engineer might have during similarity measure modelling (see Roth-Berghofer (2004) for more details on explanation sources in CBR systems). myCBR provides two kinds of explanations: forward and backward explanations (Richter & Roth-Berghofer 2007). Forward explanations explain indirectly by showing different ways to optimise a given result; they open up possibilities for the exploratory use of a device or application. Backward explanations explain the result of a process and how it was obtained. Here, we provide a way to understand the results of myCBR's similarity calculation (backward explanations) and to explore the case base contents (forward explanations).

Backward Explanations

After the vocabulary has been set up, some cases have been injected into the case base, and similarity measures have been defined, the CBR system is ready for retrieval. For a query, the system delivers a ranking of the case base. But since the model may be complex, the retrieval result may be quite surprising and need some explanation. To increase transparency, myCBR creates an explanation object for each case during retrieval. This tree-like data structure stores global and local similarity values as comments for each attribute. These retrieval details are presented to the user as tool tips.

Another valuable feature is the option to find the most similar cases with respect to a single attribute in this ranking. If the user wants to know why the case with the highest similarity for a certain attribute is not among the best five regarding total similarity, he or she should use this option and examine its remaining attributes for their similarity to the query.
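
A minimal sketch of such an explanation object, assuming a simple recursive node class rather than myCBR's actual implementation, might look as follows; the attribute names and comments are invented and only illustrate how global and local similarity values can be attached to each attribute and rendered, e.g., as a tool tip.

// Illustrative sketch (not myCBR's internal classes) of an explanation object:
// a tree that mirrors the class/attribute structure of a case and annotates
// every node with the similarity value computed for it during retrieval.
import java.util.ArrayList;
import java.util.List;

public class SimilarityExplanation {

    private final String attribute;     // attribute or class this node explains
    private final double similarity;    // local (leaf) or aggregated global (inner node) similarity
    private final String comment;       // human-readable remark, e.g. shown as a tool tip
    private final List<SimilarityExplanation> children = new ArrayList<>();

    public SimilarityExplanation(String attribute, double similarity, String comment) {
        this.attribute = attribute;
        this.similarity = similarity;
        this.comment = comment;
    }

    public void addChild(SimilarityExplanation child) {
        children.add(child);
    }

    // Render the tree, e.g. for a tool tip or a debug console.
    public String render(String indent) {
        StringBuilder sb = new StringBuilder();
        sb.append(indent).append(attribute)
          .append(": ").append(String.format("%.2f", similarity))
          .append("  (").append(comment).append(")").append(System.lineSeparator());
        for (SimilarityExplanation child : children) {
            sb.append(child.render(indent + "  "));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One explanation object per retrieved case, built during retrieval.
        SimilarityExplanation car =
                new SimilarityExplanation("Car", 0.78, "weighted sum of the attributes below");
        car.addChild(new SimilarityExplanation("price", 0.86, "difference mode, q - c = 2000"));
        car.addChild(new SimilarityExplanation("colour", 0.50, "table entry for (red, blue)"));
        System.out.print(car.render(""));
    }
}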

Forward Explanations

While developing a CBR system, an important question is: does the similarity measure lead to the appropriate cases for a given query? Forward explanations predict the behaviour of the system at modelling time and explain the interdependencies between similarity measure and case base. For this, a central explanation component analyses the case base and gathers statistics. It caches the value distribution for each attribute to make it available to peripheral explanation components. The value distribution itself can already be quite helpful: it may reveal parts of similarity measures that are in fact never used, or the absence of a border case, which is important for exception treatment.

But we want to go a little further. As soon as we have a history of submitted queries, we can obtain a distribution of (q, c) pairs and examine which parts of the similarity measures are often used. Thus, we can find out which parts of the similarity measures are of high or low relevance. When setting up a CBR system, there is no such history yet. But, assuming that the value distribution in the case base equals the value distribution of real queries, one can build up a distribution as described above. Although this assumption may be critical, it still delivers useful information.
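
The statistics gathering can be pictured with the following sketch; the attribute "colour" and its values are hypothetical, and the code is not myCBR's explanation component. It merely counts a value distribution over the case base and, lacking a query history, derives a (q, c)-pair distribution under the assumption that queries are distributed like the cases.

// Illustrative sketch of the statistics described above (not myCBR code):
// value distribution of one symbolic attribute over the case base, plus a
// (q, c)-pair distribution obtained by treating every case value as a query value.
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AttributeStatistics {

    public static void main(String[] args) {
        // Hypothetical symbolic attribute "colour" as it occurs in the case base.
        List<String> caseValues = List.of("red", "red", "blue", "green", "red", "blue");

        // Value distribution: how often each symbol occurs in the case base.
        Map<String, Integer> distribution = new LinkedHashMap<>();
        for (String v : caseValues) {
            distribution.merge(v, 1, Integer::sum);
        }
        System.out.println("value distribution: " + distribution);

        // (q, c)-pair distribution under the assumption that queries are
        // distributed like the case base itself.
        Map<String, Integer> pairDistribution = new LinkedHashMap<>();
        for (String q : caseValues) {
            for (String c : caseValues) {
                pairDistribution.merge(q + "/" + c, 1, Integer::sum);
            }
        }
        System.out.println("(q, c) pairs:       " + pairDistribution);
    }
}

Symbols that are defined in the vocabulary but never appear in either distribution point to rows and columns of the similarity table that are effectively unused for this case base.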

CBR systems need to become easier to set up and maintain. For that, in our view, their inner workings must be easier to comprehend. myCBR is intended to be an integrated, yet open experimentation platform for improved communication between a complex information system and its users.

Acknowledgements

We thank our colleagues Armin Stahl and Andreas Rumpf for their help in realising this work, which has been funded partially by the federal state Rhineland-Palatinate, project ADIB (Adaptive Provision of Information).

References

Aamodt, A., and Plaza, E. 1994. Case-Based Reasoning: Foundational issues, methodological variations, and system approaches. AI Communications 7(1):39–59.

Richter, M. M., and Roth-Berghofer, T. 2007. Explanation, information and utility. Unpublished.

Richter, M. M. 1995. The knowledge contained in similarity measures. Invited talk at ICCBR'95, Sesimbra, Portugal.

Roth-Berghofer, T. R. 2004. Explanations and Case-Based Reasoning: Foundational issues. In Funk, P., and González-Calero, P. A., eds., Advances in Case-Based Reasoning, 389–403. Springer-Verlag.

Schulz, S. 1999. CBR-Works: A state-of-the-art shell for case-based application building. In Melis, E., ed., Proceedings of GWCBR'99, Würzburg, Germany, 166–175. University of Würzburg.
