
A novel approach towards an integration of multiple knowledge sources

Jacques Calmet, Barbara Messing†, Joachim Schu

Abstract This work is to be considered as an extension of the integration mediator [20] approach to heterogeneous databases. Former approaches towards an integration of different knowledge sources through a common interchange format, KIF [14], seem to be dead [10] because of their inability to incorporate future developments in knowledge representation. We would therefore like to perform the integration of already existing knowledge sources, such as databases (relational and object-oriented), knowledge-based systems, spreadsheets and statistical programs, by the use of a deductive database based upon annotated logic [16, 18, 19] and constraint programming. The use of annotated logic allows a proper integration of inconsistent, temporal and uncertain knowledge; the use of multiple constraint domains enables the amalgamation of semantically different knowledge representations. Annotated logic furthermore enables the use of methods for performance improvement from classical logic, such as magic sets, since the annotated logic resolution calculus is closely related to the ordinary resolution calculus.

Keywords: integration of multiple knowledge sources, annotated logics, distributed AI

Institut für Algorithmen und Kognitive Systeme, Universität Karlsruhe, Postfach 6980, 76128 Karlsruhe, Tel.: +49/721 608-4208, fcalmet,[email protected]
† Institut für Angewandte Informatik und Formale Beschreibungsverfahren, Universität Karlsruhe, Tel.: +49/721 608-3924 [email protected]

1 Introduction

To solve a problem in a distributed manner is either required or recommended in many applications for a variety of purposes. For instance, performance can be improved, solutions can be accepted with more confidence, and solutions can be found even though an agent may fail [5]. The question of how to recognize and manage contradictory information provided by different databases, expert systems, humans or sensors, which could belong to different independent agents, is one of the basic questions of Distributed AI. Coordination of conflicting information requires establishing priorities or strategies from a global point of view. The main problem of coordinating these multiple sources of knowledge, which are independently developed and mostly pre-existing, is to find a method for a proper integration of these different knowledge sources that is based upon a semantically well-defined framework. From a logical point of view, programs that contain a sentence and its negation are meaningless because anything can then be inferred. Usual deductive databases are restricted to definite clauses and therefore do not allow the representation of $A \wedge \neg A$, but in the area of distributed and cooperating agents such contradictions are customary, and conflicts concerning one issue should on the one hand be represented and, on the other hand, must not destroy agreements on other issues. In this paper we describe how both tolerating conflicts and clear semantics can be handled within logic programming. Our approach makes it possible to handle not only contradictory information but also simple temporal, uncertain and incomplete knowledge. We will therefore use extensions of classical deductive databases. One extension is based upon a special kind of multivalued logics, annotated logics, first introduced by V.S. Subrahmanian [16, 18, 19]. The advantage of annotated logics is that they make it possible to specify explicitly how to deal with contradictory information instead of computing maximal consistent subsets.
Dealing with different structures of the set of truth values, we have a very flexible instrument to integrate divergent information that may also be temporal, uncertain or vague. Through the use of multiple domain constraints we are furthermore able to amalgamate schematically different knowledge representations and to define a suitable negation operator that is more efficient than the stable semantics of ordinary deductive databases. An already implemented multivalued hybrid knowledge representation system, MANTRA [4], provides some of the necessary capabilities. We are currently extending it by a component that uses annotated logics. Up to now, hybrid knowledge representation has only considered hybrid inference within frames, semantic nets, rules, and first-order logic, but our framework enables efficient reasoning within different knowledge sources. Reasoning within such heterogeneous knowledge sources has been tackled by researchers in the areas of deductive databases and distributed artificial intelligence. There does not exist a multi-agent testbed based upon multi-valued logics, although it is recognized that such a framework is needed. Our work is a step in this direction. At this stage, the implementation is underway. This paper sketches some of the key ideas which make it possible. It is structured as follows. We first give a very short introduction to database mediators and the theory of annotated logics. In section 3 we give an example of integrating different knowledge bases. Section 4 contains a large example from an insurance company and in section 5 we will discuss some strategies to combine divergent information from different knowledge sources.

2 Database mediators

Wiederhold [20] proposes a mediator approach to the integration of data from different heterogeneous databases: "A mediator is a program that helps integrating data or in some other way helps representing a higher level view to its applications". He identifies four different mediators: integration mediators, domain model mediators, monitor mediators and local mediators. Figure 1 illustrates the extended mediator architecture along the ideas of [9, 20]. There are different kinds of heterogeneity: semantic heterogeneity, such as different names for the same entity, and the use of different database systems on different hardware. Fahl [9] is currently working on an integration mediator supporting declarative queries based upon OSQL (object-oriented SQL), which is object-oriented and functional. Other researchers such as Krishnamurty [17] state that Horn clause logic is better suited to schema integration of relational databases than relational languages. Our approach is based on a logical framework as introduced by Subrahmanian [19], who suggested the use of annotated logic and multi-domain constraints as a means for the integration of schematically different knowledge sources. We can express temporal, inconsistent and uncertain knowledge in a unified way through the use of annotated logic. Multi-domain constraints offer reasoning with different domains of knowledge stored in heterogeneous databases in such an environment. On the one hand, we have a semantically richer language than former languages to express different schemas from different knowledge sources in a common data model. On the other hand, we can reason about already existing knowledge from different representation systems, such as relational and object-oriented databases, which are not amenable to a transformation into a common data model.

Figure 1: A mediator architecture (diagram labels: Application, Application, Integration mediator, DB, XPS, Conventional Software)

3 Introduction to annotated logics

We give here a very brief and informal introduction to annotated logics, which form the theoretical background of our approach. For details the reader is referred to [16].

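Definition 3.1 A constrained annotated clause is:

$$A : \mu \leftarrow \text{Multiple Domain Constraints} \parallel B_1 : \mu_1 \wedge \ldots \wedge B_k : \mu_k \wedge \mathrm{not}(B_{k+1} : \mu_{k+1}) \wedge \ldots \wedge \mathrm{not}(B_{k+n} : \mu_{k+n})$$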

The $\mu_i$'s are called annotations and are evaluable functions over a lattice (see footnote 1) or simply elements of the lattice. For example, the lattice $[0, 1]$ could be used for reasoning with uncertain information in the same way as in possibilistic logic [7] and fuzzy logic. Other lattices, particularly useful for reasoning with inconsistent information or time, can be found in the literature [15]. The expressiveness and efficiency of such rules are considerable, since annotations can consist of several arguments from different lattices. Another feature is that bilattice-based logic programming is subsumed by annotated logic programming, a result reported in [15] that emphasizes the generality of annotated logic programming. Besides the features mentioned in section 2, the use of multi-domain constraints allows the definition of a negation operator. Negation will be treated by solving a constraint over the domain of possible ground substitutions.

Definition 3.2 An annotated logic interpretation $I$ is a map $I : \text{ground atoms} \rightarrow T$ from the Herbrand base to a lattice $T$. An annotated atom $A : \mu$ is satisfied by $I$ iff $\mu \leq I(A)$, where $A$ is a strictly ground instance. A strictly ground instance is a ground instance as in classical first-order logic in which, furthermore, all annotation variables and functions are evaluated to constants of the lattice.

There is a fixpoint characterization of annotated logic programs according to the above definition of annotated logic interpretations. If the same ground atom emerges more than once with different annotations during the resolution-based inference of annotated logic programs, the occurrences are combined into one ground atom by taking the least upper bound of the annotations. For example, $A : \mathit{true}$ and $A : \mathit{false}$ are combined to $A : \top$, where $\top$ denotes a logical inconsistency, and $A : 0.7$ and $A : 0.8$ are combined to $A : 0.8$. The body of a constrained annotated clause is satisfied if all literals are true and furthermore all multi-domain constraints are satisfied. Some simple samples of annotated clauses are as follows:

Footnote 1: It is sufficient to think of a partially ordered set as a lattice to understand the following lines. Indeed every lattice is a partially ordered set, but not vice versa.
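As a brief aside before the samples, the combination by least upper bound and the satisfaction test (an annotation is satisfied when it lies below the value the interpretation assigns to the atom) can be sketched in Python. This is only an illustration and not part of MANTRA: the function names `lub_four`, `lub_unit`, `combine` and `satisfies`, the string encoding of the truth values, and the added bottom element are assumptions made for this example.

```python
# Assumed four-valued lattice: bottom < true, false < top,
# where true and false are incomparable and top is the inconsistency.
FOUR = ["bottom", "true", "false", "top"]
FOUR_ORDER = {
    ("bottom", "bottom"), ("bottom", "true"), ("bottom", "false"), ("bottom", "top"),
    ("true", "true"), ("true", "top"),
    ("false", "false"), ("false", "top"),
    ("top", "top"),
}


def leq_four(a, b):
    """Lattice order of the assumed four-valued lattice."""
    return (a, b) in FOUR_ORDER


def lub_four(a, b):
    """Least upper bound: the upper bound that lies below all other upper bounds."""
    uppers = [c for c in FOUR if leq_four(a, c) and leq_four(b, c)]
    return next(c for c in uppers if all(leq_four(c, d) for d in uppers))


def lub_unit(a, b):
    """Least upper bound in the lattice [0, 1] with the usual order."""
    return max(a, b)


def combine(annotated_atoms, lub):
    """Merge (atom, annotation) pairs, joining the annotations of equal atoms."""
    merged = {}
    for atom, annotation in annotated_atoms:
        merged[atom] = lub(merged[atom], annotation) if atom in merged else annotation
    return merged


def satisfies(interpretation, atom, annotation, leq):
    """An annotated atom is satisfied iff its annotation is below I(atom)."""
    return leq(annotation, interpretation[atom])


print(combine([("A", "true"), ("A", "false")], lub_four))  # {'A': 'top'}
print(combine([("A", 0.7), ("A", 0.8)], lub_unit))         # {'A': 0.8}
```

Passing the `lub` function as a parameter mirrors the point made in the text that the same mechanism works for different lattices.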

Simple temporal information: "On all days where the US Dollar is low, buy Hong Kong Dollars on the following day" could be expressed by the following rule:

$$\mathit{Buy\_HK\_Dollar} : \mathit{succ}(T) \leftarrow \mathit{US\_Dollar\_low} : T$$

$\mathit{succ}()$ denotes a function which maps every member of the set $T$ of moments (e.g. days) to its successor.
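Read operationally, the temporal rule shifts every moment at which the body holds by one step. A minimal Python sketch, assuming that days are modelled as integers and that the known "US Dollar is low" days are given as a set (both are assumptions for illustration; `buy_days` is a hypothetical helper name):

```python
def succ(day):
    """Successor on the set of moments; days are modelled as integers (assumption)."""
    return day + 1


def buy_days(us_dollar_low_days):
    """Apply the rule 'buy on succ(T) if the US Dollar is low on T' to every known day T."""
    return {succ(t) for t in us_dollar_low_days}


print(sorted(buy_days({3, 4, 9})))  # [4, 5, 10]
```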

Suppose there are two different relational databases we would like to reason about. We would furthermore like to reason with uncertainty and temporal information restricted by numerical constraints. We now use two-dimensional annotations and numerical constraints to show the full expressiveness of such rules. The two database relations are

Ordered_Articles<Customer, Article, Amount>,
Articles_on_Stock<Article, Amount>

The conflict solution rule is: "If there are more articles ordered by a very important customer than currently available and present production is not very high then there is a chance of 75% to fulfil the order".
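Read operationally, this rule is a join of the two relations, filtered by a numerical constraint and by two annotated conditions. The following Python sketch illustrates that reading under several assumptions: moments are integers, the stock relation holds one tuple per article, certainties are plain numbers in [0, 1], and a negated annotated literal succeeds when the stored certainty is below the threshold. The thresholds 0.5 and 0.9 and the annotation (0.75, T) are those of the formal rule stated right after; all Python names are hypothetical.

```python
def previous(t):
    """Predecessor on the time axis; moments are modelled as integers (assumption)."""
    return t - 1


def fulfil_orders(ordered_articles, articles_on_stock, production_high, important, t):
    """Return {customer: (0.75, t)} for every order to which the rule applies.

    ordered_articles  : list of (customer, article, amount) tuples
    articles_on_stock : dict article -> amount
    production_high   : dict moment -> certainty in [0, 1]
    important         : dict (customer, moment) -> certainty in [0, 1]
    """
    derived = {}
    for customer, article, amount in ordered_articles:
        if article not in articles_on_stock:
            continue  # no matching stock tuple, so the join fails
        if not articles_on_stock[article] < amount:
            continue  # numerical constraint: stock must be below the ordered amount
        # Negated literal: fails if production is high with certainty >= 0.5
        if production_high.get(previous(t), 0.0) >= 0.5:
            continue
        # Annotated literal: the customer must be important with certainty >= 0.9
        if important.get((customer, previous(t)), 0.0) < 0.9:
            continue
        derived[customer] = (0.75, t)
    return derived


orders = [("miller", "bolt", 100), ("smith", "nut", 10)]
stock = {"bolt": 40, "nut": 50}
important = {("miller", 6): 0.95, ("smith", 6): 0.95}

print(fulfil_orders(orders, stock, {6: 0.2}, important, t=7))  # {'miller': (0.75, 7)}
print(fulfil_orders(orders, stock, {6: 0.8}, important, t=7))  # {}
```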

$$\mathit{fulfil\_Order}(X) : [0.75, T] \leftarrow \mathit{Ordered\_Articles}(X, Y, Z) \wedge \mathit{Articles\_on\_Stock}(Y, Z_2) \wedge (Z_2 < Z) \parallel \mathrm{not}(\mathit{production\_high} : [0.5, \mathit{previous}(T)]) \wedge \mathit{Important}(\mathit{Customer}) : [0.9, \mathit{previous}(T)]$$

Particularly the last example should have shown that our framework allows a proper integration of different knowledge sources based upon a clear model-theoretic semantics.

4 An example

Imagine an insurance company. To select the tariff for a person's disability insurance, there are many criteria which must be considered carefully. To estimate the risk that someone becomes unable to work, the company needs a strategy to combine different facts such as age, profession, or diseases. We consider different knowledge bases that contain facts and some rules of risk analysis. These knowledge bases might result from investigations of different experts or staff members. The annotations of the single knowledge bases differ in their domain, i.e. the lattice they stem from. This depends on the kind of knowledge we deal with. The supervisory knowledge base integrates this information and evaluates it to assess the tariff for a person's disability insurance. We also allow the supervisor's answer to be "unknown", which means that more information is required to make a decision. We don't claim this example to be realistic, but it will show how amalgamating knowledge bases works using annotated logics. We denote the single knowledge bases by $KB_1, \ldots, KB_3$ and the supervisory knowledge base by $KB_s$. $KB_1$ contains information about the age, sex and marital status of a person. Because the age of a person depends on the current date, we use the lattice $T_1 := I^+ \times [0, 1]$, where $I^+$ stands for quarters of years. $KB_1$ might contain rules such as:

$$\mathit{age\_factor\_risk}(P) : \left(n, e^{-\frac{(X - 30)^2}{100}}\right) \leftarrow \mathit{age}(P) : (n, X).$$

$$\mathit{personal\_risk}(P) : (n, \min\{\text{x1 ? 1 + 4 + X4}\,;\ 1\}) \leftarrow \mathit{age\_factor\_risk}(P) : (n, X_1) \wedge \mathit{female}(P) : (n, X_2) \wedge \mathit{married}(P) : (n, X_3).$$
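The annotation function of the first rule can be evaluated directly. A minimal Python sketch, assuming the age factor is $e^{-(X-30)^2/100}$ with the age $X$ given in years (the function name `age_factor_risk` follows the rule head; treating the age as a plain number is an assumption):

```python
import math


def age_factor_risk(age_years):
    """Age-factor risk e^(-(X - 30)^2 / 100): peaks at 30 and falls off on both sides."""
    return math.exp(-((age_years - 30) ** 2) / 100)


for age in (20, 26, 30, 40):
    print(age, round(age_factor_risk(age), 2))
```

For a 26-year-old person the value rounds to 0.85, which is the KB1 value used in the worked example later in this section.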

The first of these rules expresses that the age-factor risk of a person increases until he/she has reached the age of thirty and decreases in the following years (note that this is only an example!). The second rule derives the personal risk from the age-factor risk and the facts of being female (some people think that this increases the risk) and being married (which diminishes the risk). $KB_2$ includes the risk depending on a person's profession. Here we could find the following:

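$$\mathit{professional\_risk}(P) : \mathit{high} \leftarrow \mathit{musician}(P) : 1.$$
$$\mathit{professional\_risk}(P) : \mathit{high} \leftarrow \mathit{pilot}(P) : 1.$$
$$\mathit{professional\_risk}(P) : \mathit{low} \leftarrow \mathit{computer\_scientist}(P) : 1.$$
$$\mathit{personal\_risk}(P) : X^+ \leftarrow \mathit{professional\_risk}(P) : X \wedge \mathit{smoker}(P) : 1.$$
$$\mathit{personal\_risk}(P) : X^+ \leftarrow \mathit{professional\_risk}(P) : X \wedge \mathit{climber}(P) : 1.$$
$$\mathit{personal\_risk}(P) : X^- \leftarrow \mathit{professional\_risk}(P) : X \wedge \mathit{vegetarian}(P) : 1.$$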

The risk for musicians is high because of their precious fingers, and pilots live dangerously. Computer scientists have a low risk. Smoking (unhealthy) and climbing (dangerous) raise the risk, while being vegetarian diminishes it (vegetarians take more care of their health). The lattice $T_2$ consists of the elements 0, very low, low, unknown, high, very high, and 1. The order is given by this sequence. $X^+$ denotes the successor in this sequence, where $1^+ = 1$. The analogous definition holds for $X^-$. $KB_3$ analyzes the diseases a person has suffered from up to now. Each disease is assigned a risk factor, denoted by $S$. The longer the person has been healthy again, the better for his/her tariff. The lattice here is $T_3 := I^+ \times [0, 1]$ and the knowledge base might include:

$$\mathit{personal\_risk}(P) : (n, f(\tau, n, S)) \leftarrow \mathit{disease}(P) : (\tau, S),$$

where

$$f(\tau, n, S) = \begin{cases} S & \tau = n \vee \tau = n - 1 \\ \dfrac{S}{\ln(n - \tau)} & \text{otherwise} \end{cases}$$
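The two annotation functions used by KB2 and KB3 can be sketched in Python. The sketch assumes that the chain lattice T2 is encoded as a list of strings, that the predecessor of the bottom element 0 is 0 itself (the analogue of the successor of 1 being 1), and that the disease function is S divided by the logarithm of the elapsed periods; the names `successor`, `predecessor` and `f` are chosen for illustration only.

```python
import math

# The chain lattice T2, listed in ascending order.
T2 = ["0", "very low", "low", "unknown", "high", "very high", "1"]


def successor(x):
    """X+ : next element of the chain, with 1+ = 1."""
    return T2[min(T2.index(x) + 1, len(T2) - 1)]


def predecessor(x):
    """X- : previous element of the chain, with 0- = 0 (assumed analogue)."""
    return T2[max(T2.index(x) - 1, 0)]


def f(tau, n, s):
    """Risk contributed at time n by a disease of factor s that occurred at time tau."""
    if tau == n or tau == n - 1:
        return s  # current or immediately preceding period: full risk factor
    return s / math.log(n - tau)


print(successor("high"), "/", predecessor("high"))  # very high / unknown
print(round(f(6, 10, 0.6), 2))                       # 0.43
```

The second printed value corresponds to a disease of factor 0.6 that lies four periods in the past, the case used in the worked example below. Treating the two most recent periods separately also avoids dividing by $\ln 1 = 0$.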

The function $f$ expresses that the risk is high if a disease has occurred in the current period ($\tau = n$) or the period before ($\tau = n - 1$), and that the risk factor of a disease diminishes in the course of time (otherwise). The supervisory knowledge base $KB_s$ integrates the statements of the previous knowledge bases. The rules have the form $\mathit{personal\_risk}(P) : (\{s\}, \ldots) \leftarrow \ldots$, where the statements of the single knowledge bases appear in the body of the clause. Furthermore, these statements are first identified by indices denoting the knowledge base they come from, e.g.

$$\mathit{age\_factor\_risk}(P) : \left(n, e^{-\frac{(X - 30)^2}{100}}\right) \leftarrow \mathit{age}(P) : (n, X)$$

is replaced by

$$\mathit{age\_factor\_risk}(P) : \left(\{1\}, n, e^{-\frac{(X - 30)^2}{100}}\right) \leftarrow \mathit{age}(P) : (\{1\}, n, X),$$

where 1 is the identifier. We choose $T_s := I^+ \times [0, 1]$. The following two rules may for instance be in this knowledge base:

$$\mathit{personal\_risk}(P) : (\{s\}, (n, \max\{X_1, X_{32}\})) \leftarrow X_{32} > 0.4 \wedge \mathit{personal\_risk}(P) : (\{1\}, (n, X_1)) \wedge \mathit{personal\_risk}(P) : (\{2\}, \mathit{high}) \wedge \mathit{personal\_risk}(P) : (\{3\}, (X_{31}, X_{32}))$$

$$\mathit{personal\_risk}(P) : (\{s\}, (n, \min\{X_1, X_{32}\})) \leftarrow \mathit{personal\_risk}(P) : (\{1\}, (n, X_1)) \wedge \mathit{personal\_risk}(P) : (\{2\}, \mathit{low}) \wedge \mathit{personal\_risk}(P) : (\{3\}, (X_{31}, X_{32}))$$

These rules "match" the risk factors that arise from the different viewpoints (i.e. our different knowledge bases). So, if the risk factor of the disease is greater than 0.4 and the personal risk resulting from one's profession is high, the personal risk factor of the third knowledge base is taken, unless the age factor is even higher (that is the content of the first rule). If the "professional" risk is low, the supervisory knowledge base evaluates the minimum of the age factor and the "health" risk (second rule). Let "a" be a female, married, 26-year-old person. Suppose she is a pilot, smokes, is vegetarian and suffered from a disease of factor 0.6 four years ago. The current time is denoted by $n_0$. Our knowledge bases $KB_1, \ldots, KB_3$ yield

$\mathit{personal\_risk}(a) : (\{1\}, (n_0, 0.85))$, $\mathit{personal\_risk}(a) : (\{2\}, \mathit{high})$, and $\mathit{personal\_risk}(a) : (\{3\}, (n_0, 0.43))$. In $KB_s$, the first rule fires, so we get $\mathit{personal\_risk}(a) : (\{s\}, (n, 0.85))$. The second rule also fires (notice that $\mathit{high} \geq \mathit{low}$ in $T_2$), so we also get $\mathit{personal\_risk}(a) : (\{s\}, (n, 0.43))$. The fixpoint operator of annotated logics now takes the greater value, so the personal risk factor of a is 0.85.
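The supervisory step of this worked example can be retraced in a few lines of Python. The sketch assumes that a body literal annotated with a T2 value is satisfied whenever that value lies below or at the value delivered by KB2, and that the results of all firing rules are joined by the maximum in [0, 1]; the function `supervisor` and its argument names are hypothetical.

```python
# The chain lattice T2, listed in ascending order.
T2 = ["0", "very low", "low", "unknown", "high", "very high", "1"]


def leq_t2(a, b):
    """Order of the chain lattice T2."""
    return T2.index(a) <= T2.index(b)


def supervisor(x1, professional, x32):
    """Apply the two sample rules of the supervisory knowledge base.

    x1           : risk value delivered by KB1
    professional : T2 value delivered by KB2
    x32          : risk value delivered by KB3
    Returns the values derived by the firing rules and their least upper bound.
    """
    derived = []
    # Rule 1: disease risk above 0.4 and annotation 'high' satisfied -> maximum
    if x32 > 0.4 and leq_t2("high", professional):
        derived.append(max(x1, x32))
    # Rule 2: annotation 'low' satisfied -> minimum
    if leq_t2("low", professional):
        derived.append(min(x1, x32))
    combined = max(derived) if derived else None
    return derived, combined


print(supervisor(0.85, "high", 0.43))  # ([0.85, 0.43], 0.85)
```

Both rules fire for person a, and joining the two derived values reproduces the final risk factor of 0.85.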

5 Strategies on the supervisory level

On the supervisory level, strategies for managing divergent information may have an influence. The strategies depend on the domain of application. In the area of nonmonotonic reasoning, there are many approaches to handle incomplete and inconsistent knowledge [3], especially using defaults and preferring some theories to others. These problems also appear in inheritance networks. But they cannot be solved solely in a logical framework (simply because divergent opinions are not a logical problem). There are no pat solutions for handling contradictions. Promising approaches have their origin in decision theory [8, 12]: they consider the behavior of groups and try to find strategies to make plausible decisions in case opinions diverge. For example, if different agents have divergent goals, the utility of the goals to the single agents is evaluated. In [8], metrics between different plans of agents are defined. In [2], the Dempster-Shafer belief calculus is applied to the problem of divergent preferences in a group of agents: different orders on a set are combined into one (this problem, in general, is NP-complete). Preference modeling by means of many-valued logics is already treated in [6]. Our approach using annotated logics is universal enough to incorporate different supervisory strategies. It is a focal point of our future research to provide mechanisms to use existing strategies.

6 Conclusions and directions for further work

This sketch of our work aimed to illustrate that the use of annotated logic offers a promising approach to problems involving both distributed and cooperative knowledge. This is its theoretical motivation. From a practical point of view, the first step consists of selecting the relevant modules of MANTRA to transform it into a shell for distributed systems which runs on workstations. There is a long list of theoretical problems which arise from such a study: distributed possibilistic truth/reason maintenance [7, 13], knowledge acquisition for distributed systems, and extending the KADS framework, just to name some of them. However, with regard to the short-term availability of our system and its relevance for individual applications, the current research directions are query optimization within such heterogeneous knowledge sources and efficient caching of previously computed queries using the OLDT technique (see footnote 2) [1]. Our work will then be applied in a marketing company for an integration of different large pre-existing relational databases with rule-based systems.

References

[1] S. Adali, V.S. Subrahmanian. Amalgamating Knowledge Bases II: Algorithms, Data Structures, Query Processing. Preprint, University of Maryland, College Park, 1993.
[2] J.A. Barnett. Combining opinions about the order of rule execution. Proc. AAAI (1991), pp. 477-481.
[3] G. Brewka. Nonmonotonic Reasoning: Logical Foundations of Commonsense. Cambridge Tracts of Computer Science 12.
[4] J. Calmet, I.A. Tjandra, G. Bittencourt. MANTRA: A Shell for Hybrid Knowledge Representation. Proceedings of the Third IEEE International Conference on Tools for Artificial Intelligence, San Jose, 1991, pp. 164-171.
[5] E. Durfee, V. Lesser, D. Corkill. Cooperation through communication in a distributed problem solving method. In: L. Gasse, M. Huhns: Distributed Artificial Intelligence. Pitman, London, 1987, pp. 29-58.
[6] P. Doherty, D. Driankov, A. Tsoukias. Partiality, Para-consistency and Preference Modeling: Preliminary Version. Research Report, University of Linköping, August 1992.
[7] D. Dubois, J. Lang, H. Prade. Possibilistic Logic and Partially Inconsistent Knowledge Bases. Research Report, IRIT-90-2, February 1990, Toulouse.
[8] E. Ephrati, J. Rosenschein. Constraint intelligence action: Planning under the influence of a master agent. Proc. AAAI (1992), pp. 263-268.
[9] G. Fahl. Integration of Heterogeneous Databases in a Mediator Architecture. Research Report, CAELAB, Department of Computer and Information Science, Linköping University, September 1992.
[10] M.L. Ginsberg. Knowledge Interchange Format: The KIF of Death. AI Magazine, Vol. 12, No. 3, 1991, pp. 57-63.
[11] M.L. Ginsberg. Multi-valued logics. Proceedings AAAI-86, pp. 243-247.
[12] P. Gmytrasiewicz, E. Durfee, D. Wehe. A Decision-theoretic Approach to Coordination Multiagent Interaction. Proceedings IJCAI (1991), Vol. 1, pp. 62-68.
[13] M.N. Huhns, David Bridgeland. Multiagent Truth Maintenance. IEEE Transactions on Systems, Man, and Cybernetics, 1991, Vol. 21, No. 6, pp. 1437-1445.
[14] M.R. Genesereth. A KIF proposal. Technical report, Stanford University, 1990.
[15] M. Kifer, V.S. Subrahmanian. Theory of Generalized Annotated Logic Programming and its Applications. Journal of Logic Programming 1992, No. 12, pp. 335-367.
[16] M. Kifer, V.S. Subrahmanian. Theory of Generalized Annotated Logic Programming and its Applications. Journal of Logic Programming 1992, No. 12, pp. 335-367.
[17] R. Krishnamurty, W. Litwin, W. Kent. Language Features for Interoperability of Databases with Schematic Discrepancies. Proc. ACM SIGMOD 1991.
[18] V.S. Subrahmanian. Amalgamating Knowledge Bases. Preprint 1993. To appear in: ACM Transactions on Database Systems.
[19] J. Lu, A. Nerode, J. Remmel, V.S. Subrahmanian. Towards a Theory of Hybrid Knowledge Bases. Preprint 1993.
[20] G. Wiederhold. Building Adaptive Applications using Active Mediators. Proc. Database and Expert System Applications, DEXA 91, Springer-Verlag, 1991, pp. 508-513.

Footnote 2: The acronym means Ordered selection strategy with Linear resolution for Definite clauses with Tabling.