SEMANTIC DISTANCE IN CONCEPTUAL GRAPHS

Norman Foo (1), Brian J. Garner (2), Anand Rao (3) and Eric Tsui (4)

Abstract

A modification of Sowa’s metric on conceptual graphs is proposed and defended. The metric is computed by locating the least supertype which subsumes the two given types, and adding the distances from each given type to that subsuming type. Implementations using this metric are described, and its relevance to fuzzy problems is explained.
1. Proposed Metric

Given two concepts C1 and C2 with types T1 and T2, Garner and Tsui (1987) have proposed a modification of Sowa’s semantic distance between C1 and C2 as follows. Find the concept C3 of type T3 which generalizes C1 and C2, where T3 is the most specific type which subsumes both T1 and T2; the semantic distance between C1 and C2 is then the sum of the distance from C1 to C3 and the distance from C2 to C3. It should be clear that this is indeed a metric: the distance from a concept to itself is zero, and the distance is symmetric and satisfies the triangle inequality. In this paper we explain this definition in several ways, describe its use in an extensive implementation, and suggest how it may help solve problems in fuzzy concepts.
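The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the type hierarchy is a tree stored as child-to-parent links, every link counts as distance 1 (consistent with the unit distance for adjacent types used in Section 4), and the names `PARENT`, `ancestors` and `semantic_distance` are hypothetical, not taken from any of the implementations described in this paper. The small hierarchy is borrowed from the example in Section 3.

```python
# Hypothetical type hierarchy stored as child -> parent links (a tree).
PARENT = {
    "VEHICLE": "MOBILE-ENTITY",
    "SEDAN": "VEHICLE",
    "WAGON": "VEHICLE",
    "TRUCK": "VEHICLE",
}


def ancestors(t):
    """Return the chain [t, parent(t), ..., root]."""
    chain = [t]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def semantic_distance(t1, t2):
    """Return (distance, T3), where T3 is the most specific common supertype.

    Assumption: each link in the hierarchy has length 1.
    """
    # Number of links from t1 up to each of its supertypes.
    depth1 = {t: d for d, t in enumerate(ancestors(t1))}
    # Walk up from t2; in a tree the first shared type is the most specific one.
    for d2, t in enumerate(ancestors(t2)):
        if t in depth1:
            return depth1[t] + d2, t
    raise ValueError(f"{t1!r} and {t2!r} have no common supertype")


if __name__ == "__main__":
    print(semantic_distance("SEDAN", "TRUCK"))
    print(semantic_distance("SEDAN", "MOBILE-ENTITY"))
```

Because both types are measured against the same subsuming type, swapping the arguments gives the same value, which is the symmetry property claimed in the text.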
2. Implementation and Explanations

Garner and Tsui (1987) first used this metric in an important component of the conceptual graph systems designed at Deakin University. They have applied this semantic distance to study a memory model that can

- store graphs without predefining a fixed set of attributes (i.e. labels);
_______________
1. Computer Science Dept., University of Sydney, Sydney, Australia 2006.
2. Dept. of Computing and Mathematics, Deakin University, Geelong, Australia 3217.
3. Australian Artificial Intelligence Institute, 1 Grattan St., Carlton, Australia 3053.
4. Expert Systems Group, Continuum Australia, 100 Mount Street, North Sydney, Australia 2060.
- index the incoming structures in multiple locations of the memory (knowledge base), if necessary;
- generalise between an incoming structure and existing (indexed) structures;
- identify graph-subgraph relationships;
- handle exact and partial matching of stored structures based on an input pattern;
- store all structures economically; and
- propose new structures (i.e. rules, graphs, concepts) by integrating (joining) a generalised pattern with another pattern.

The success of the metric led naturally to an investigation of its philosophical plausibility, and this yielded a number of explanations drawn from a variety of disciplines.

The first is anti-unification, which has been investigated by many workers in computational logic. Here we are given two terms and wish to find a most specific term that generalizes both, that is, one which can be specialized to each of the two terms. In the notation above, replacing "term" by "concept", the concept C3 is in fact the anti-unifier of C1 and C2 once the appropriate conventions are established for what is meant by unification in the framework of types. The conventions needed are precisely those given in Rao and Foo (1987), where a substitution is a finite set of the form {v1/t1, ..., vn/tn} in which each ti is a specialisation of the concept vi.

Secondly, in explanation-based learning (Mitchell et al. 1986) we have the notions of abduction and generalization. Given instances of objects, it is desired to find the best generalization or explanation for them. Denoting the objects by O1 and O2 (the same reasoning applies to more objects), a good generalization is an object O3 such that O3 subsumes O1 and O2, and does so in the most economical way. We will argue that the proposed distance measure satisfies this criterion. If O and O’ are both generalizers of O1 and O2, and O is a subtype of O’, then the logical formula ∀X (O(X) → O’(X)) holds. Hence the hypothesis O is stronger than O’, or equivalently O is more economical, since a stronger hypothesis admits fewer models. Thus, among the generalizers of O1 and O2, we should pick their least member if one exists. In arguing thereafter that the distance between O1 and O2 should then be measured by
going through such an O3, we rely on the taxonomic inspiration which follows.

Thirdly, in taxonomic classification there are metrics, resembling the one suggested here, which indicate the closeness of relationships between species. There is no classification scheme which is satisfactory for all purposes. Sowa (personal communication) has drawn attention to the relevance of salience in determining the distance between concepts. He cites a hypothetical discussion about some dog, and notes that in ordinary discourse the super-type likely to be chosen as the generalizer is "animal" rather than "vertebrate", even though the latter is a sub-type of the former. Context and salience can be viewed in our framework as sub-lattice selectors: they suppress some types and shrink the type lattice. Within any one context, however, the taxonomic precedents in biology are instructive. Perhaps the simplest demonstration of the metric suggested above is in the discipline of genetics, where amino acid sequences of different species are compared. "A direct measure of genetic distance is given by the number of amino acid replacements that have occurred since the taxa from which the proteins are obtained shared a common ancestor" (Purves and Orians 1983 - our italics). This technique has the merit of being objective in that it does not depend on behavioral traits.

Finally, we explain why the C3 above is the solution to a minimal extension problem for the analogs of the logical constraints C1 → C and C2 → C. In fact this is closely related to the abduction discussion above. Suppose we have the two logical expressions C1(X) → C(X) and C2(X) → C(X), and we want the smallest models of these for the predicate (type) C. The answer is given by McCarthy’s circumscription (V. Lifschitz 1985). In the established notation the solution for such a C is

    A(C) ∧ ∀c ¬(A(c) ∧ c < C)

where A(C) stands for the two constraints above, read as a formula in C, and the < sign is a comparison of predicate extensions. If the generalizers of C1 and C2 have a least element, that is the unique solution to this circumscription.
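The anti-unification explanation given earlier in this section can be made concrete with a small sketch. Note the assumptions: this is the classical untyped, first-order form of anti-unification (a least general generalization of two terms), not the typed conventions of Rao and Foo (1987); terms are encoded as nested Python tuples whose first element is the function symbol, constants are plain strings, and the function name `anti_unify` and the generated variable names `X1`, `X2`, ... are hypothetical.

```python
def anti_unify(s, t, table=None):
    """Return a most specific term that generalizes both s and t.

    Terms: a string is a constant; a tuple (f, arg1, ..., argn) is a
    compound term with function symbol f.
    """
    if table is None:
        table = {}  # maps each mismatching pair (s, t) to its variable
    if s == t:
        return s
    same_shape = (
        isinstance(s, tuple)
        and isinstance(t, tuple)
        and s[0] == t[0]
        and len(s) == len(t)
    )
    if same_shape:
        # Same function symbol and arity: generalize argument by argument.
        args = tuple(anti_unify(a, b, table) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args
    # Mismatch: introduce a variable, reusing it when the same pair recurs,
    # so that the result is as specific as possible.
    if (s, t) not in table:
        table[(s, t)] = f"X{len(table) + 1}"
    return table[(s, t)]


if __name__ == "__main__":
    left = ("f", "a", ("g", "a"))
    right = ("f", "b", ("g", "b"))
    print(anti_unify(left, right))
```

Both input terms are recovered from the result by a substitution for the introduced variable, which is the sense in which the anti-unifier plays the role of C3 for C1 and C2. In the typed setting of the paper the variable would be replaced by the most specific common supertype rather than by an unconstrained variable.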
3. Fuzzy Problems and An Example

An interesting application of semantic distance is the following. Wittgenstein’s problem about games is often cited as an impediment to rule-based systems. The problem was intended to show that no fixed set of intensional descriptions of "game" will suffice. The concept "game", like most everyday concepts, seems to have flexibility in usage. There are two ways to handle this. In the first way we invoke belief revision to extend the denotation of terms when old denotations prove inadequate. The other is to
build schemas for concepts which admit other concepts that resemble them. Woods (1986) has argued strongly that it is this ability to talk about resemblance which empowers KR beyond mere logic. We will add to his argument the distance measure proposed above by exhibiting an example of how the measure can be biased in different directions to yield several templates "like" the original. A combination of both belief revision and distance measures is probably necessary.

Consider the following partial hierarchy of schemas
MOBILE-ENTITY
    VEHICLE
        SEDAN
        WAGON
        TRUCK
which may be used to provide background information for language understanding programs. The system could, either as a result of its accumulated information (evidence) or by being told explicitly, determine that a new class has to be introduced to differentiate the roles of a SEDAN/WAGON and a TRUCK. As a result, the above hierarchy would need to be revised and refined, with the CAR schema introduced into it as follows.
MOBILE-ENTITY
    VEHICLE
        CAR
            SEDAN
            WAGON
        TRUCK
Since the subsumer-subsumee relationship and the semantic distance (SD) between all pairs of adjacent entities must be strictly enforced at all times, the following properties must hold:

i. CAR < VEHICLE, CAR > SEDAN and CAR > WAGON; and
ii. SD between SEDAN (WAGON) and VEHICLE = SD between CAR and VEHICLE + SD between CAR and SEDAN (WAGON).

These two restrictions constrain the definition of CAR.
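One way to satisfy property ii when CAR is introduced is to split each affected link, so that the distance from SEDAN (WAGON) to VEHICLE is unchanged. The sketch below illustrates this under assumptions that are not stated in the paper: links carry numeric weights, the value 0.5 chosen for the distance between CAR and VEHICLE is arbitrary, and the names `edges`, `distance_up` and `insert_between` are hypothetical. The paper does not say whether distances are split or rescaled on insertion, so this is only one possible reading.

```python
# Hypothetical weighted hierarchy: child -> (parent, distance to parent).
edges = {
    "VEHICLE": ("MOBILE-ENTITY", 1.0),
    "SEDAN": ("VEHICLE", 1.0),
    "WAGON": ("VEHICLE", 1.0),
    "TRUCK": ("VEHICLE", 1.0),
}


def distance_up(hierarchy, t, ancestor):
    """Sum the link weights on the path from t up to ancestor."""
    total = 0.0
    while t != ancestor:
        if t not in hierarchy:
            raise ValueError(f"{ancestor!r} is not a supertype of the start type")
        parent, weight = hierarchy[t]
        total += weight
        t = parent
    return total


def insert_between(hierarchy, new, parent, children, sd_new_parent):
    """Insert `new` below `parent` and above `children`.

    Each child's distance to `parent` is preserved by splitting its old
    link into (child -> new) and (new -> parent).
    """
    updated = dict(hierarchy)
    for child in children:
        old_parent, old_weight = hierarchy[child]
        if old_parent != parent:
            raise ValueError(f"{child!r} is not directly below {parent!r}")
        if not 0 < sd_new_parent < old_weight:
            raise ValueError("the new link must be shorter than the old one")
        updated[child] = (new, old_weight - sd_new_parent)
    updated[new] = (parent, sd_new_parent)
    return updated


if __name__ == "__main__":
    revised = insert_between(edges, "CAR", "VEHICLE", ["SEDAN", "WAGON"], 0.5)
    print(distance_up(revised, "SEDAN", "VEHICLE"))
```

TRUCK is untouched by the insertion, so it remains directly below VEHICLE, which matches the revised hierarchy shown above.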
4. Knowledge Correlation

Finally, we describe how semantic distance can be applied to correlate three types of knowledge in a Canonical Graph Model (Tsui, 1988). These three types of knowledge are, in increasing order of sophistication, concepts, graphs and rules.

For concepts, adjacent types in the type hierarchy are assigned a semantic distance of 1. The semantic distance between two types is defined as the sum of the semantic distances from each of the two types to their minimal common supertype (MCS). If UNIVERSAL is the MCS, then the semantic distance between these two types is taken to be infinite. In the Canonical Graph Model (CGM), semantic distance is applied to
i. determine the relevance of concepts in a default (pre-defined) graph to certain words captured from the input sentence (Garner, Lukose and Tsui, 1987), so as to adapt the default graphs to build an initial set of intermediate graph(s); and
ii. restrict the level (degree) of generalisation of a concept label so that overgeneralisation of concepts (and therefore of graphs) does not occur.
The semantic distance between two graphs is defined as the sum of the semantic distances between corresponding pairs of concepts in the two graphs. If there is no matching pair of corresponding concepts, then the semantic distance for the graphs is undefined. Semantic distances between graphs serve to determine pairs of relevant graphs to be maximally joined and assist the semantic interpreter (SI) (loc. cit.) in the formation of intermediate graph(s) at each level of the parse tree. These intermediate graphs are propagated to the next level of the parse tree, where semantic distances are again evaluated to determine which pairs of graphs should be (maximally) joined.

For rules, the notion of a semantic distance is much more abstract and difficult to define. At this stage, we define rule subsumption as follows: rule A subsumes rule B iff all the assertions (represented as graphs) in A subsume the corresponding assertions in B. There are at least three good reasons why a correlation measure between rules is useful:

1. During knowledge acquisition, rules captured by the system, whether from the same session or not, may subsume each other, thereby leading to redundancy in the knowledge base.
2. An intelligent knowledge acquisition program should be able to generalise on input rules based on commonalities found in these rules. In other words, structural knowledge among rules is automatically discovered and made explicit.
3. Both the generalised rule(s) and the actual rules can be applied during reasoning (deduction), leading to a wider applicability of the rule set and better explanations.
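The concept-level and graph-level definitions given earlier in this section can be sketched as follows. Several details are assumptions made only for illustration: the small hierarchy under UNIVERSAL is invented, a graph is reduced to a mapping from a role label to a concept type, and two concepts are taken to "correspond" when they carry the same role label. The paper does not specify how correspondence is established, and the names `concept_distance` and `graph_distance` are hypothetical.

```python
import math

# Hypothetical type hierarchy (child -> parent) rooted at UNIVERSAL.
PARENT = {
    "MOBILE-ENTITY": "UNIVERSAL",
    "ACT": "UNIVERSAL",
    "VEHICLE": "MOBILE-ENTITY",
    "PERSON": "MOBILE-ENTITY",
    "SEDAN": "VEHICLE",
    "TRUCK": "VEHICLE",
    "DRIVE": "ACT",
}


def ancestors(t):
    """Return the chain [t, parent(t), ..., UNIVERSAL]."""
    chain = [t]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def concept_distance(t1, t2):
    """Unit-step distance via the minimal common supertype (MCS).

    The distance is infinite when the MCS is UNIVERSAL.
    """
    if t1 == t2:
        return 0
    depth1 = {t: d for d, t in enumerate(ancestors(t1))}
    for d2, t in enumerate(ancestors(t2)):
        if t in depth1:  # first shared type walking up from t2 is the MCS
            return math.inf if t == "UNIVERSAL" else depth1[t] + d2
    raise ValueError(f"{t1!r} and {t2!r} share no supertype")


def graph_distance(g1, g2):
    """Sum concept distances over corresponding concepts.

    Returns None (undefined) when no pair of concepts corresponds.
    Assumption: concepts correspond when they share a role label.
    """
    shared = g1.keys() & g2.keys()
    if not shared:
        return None
    return sum(concept_distance(g1[role], g2[role]) for role in shared)


if __name__ == "__main__":
    g1 = {"agent": "PERSON", "object": "SEDAN"}
    g2 = {"agent": "PERSON", "object": "TRUCK"}
    print(graph_distance(g1, g2))
```

A finite threshold on `concept_distance` is one simple way to express the restriction on the degree of generalisation mentioned above, since any pair whose only common supertype is UNIVERSAL is automatically excluded.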
Multi-valued logics are being studied (Garner, 1989) for the formalisation of a correlation measure between abstractions.
5. References

[1] GARNER, B.J. (1989); unpublished notes.
[2] GARNER, B.J., LUKOSE, D. and TSUI, E. (1987); Parsing Natural Language through Pattern Correlation and Modification, Proceedings of the 7th International Workshop on Expert Systems & Their Applications, Avignon ’87, 13-15 May 1987, France, pp. 1285-1299.
[3] LIFSCHITZ, V. (1985); Computing Circumscription, Proc. 9th IJCAI, 1985.
[4] PURVES and ORIANS (1983); Life - the Science of Biology, Willard Grant Press, Boston, MA.
[5] MITCHELL, T.M., KELLER, R.M. and KEDAR-CABELLI, S.T. (1986); Explanation-based Generalisation: A Unifying View, ML-TR-2, Computer Science Department, Rutgers University, New Jersey.
[6] RAO, A. and FOO, N. (1987); CONGRES - Conceptual Graph Reasoning System, Third Conference on Artificial Intelligence and Applications (IEEE), Florida, 1987.
[7] TSUI, E. (1988); Canonical Graph Models, Ph.D. thesis, Department of Computing and Mathematics, Deakin University, Australia, 1988.
[8] WOODS, W.A. (1986); Important Issues in Knowledge Representation, Proc. IEEE, October 1986.