Semantic relatedness in semantic networks

Laurent Mazuel and Nicolas Sabouret¹

Abstract. This paper presents a new semantic relatedness measure on semantic networks (SN) that uses both hierarchical and non-hierarchical relations. Our approach relies on two assumptions. Firstly, in a given SN, only a small number of paths can be considered "semantically correct", and these paths obey a given set of rules. Secondly, following a given edge in a path has a cost that depends on the edge type (is-a, part-of, etc.) and on its position in the path. We evaluate our measure on WordNet with two different benchmarks, using the part-of relation. We show that, in this context, our measure outperforms the classical semantic measures.
1 Introduction
The great majority of existing human-machine interaction systems in natural language (NL), such as question-answering systems and dialogue systems, make use of ontologies for semantic interpretation [2]. However, in current NL systems, the ontology is reduced to a taxonomy and semantic interpretation relies only on similarity measures, even though the literature underlines the need for semantic relatedness measures and, thus, for heterogeneous relation types [2]. Semantic relatedness between two concepts can be materialised by a path that starts from one concept and follows different kinds of relations (subsumption (is-a), meronymy (part-of) or any other domain-specific relation) to the other concept. Therefore, computing semantic relatedness is considered much more difficult than computing a similarity measure (i.e. a measure that only uses subsumption links) [11]. For this reason, most work on semantic measures has focused on computing similarity degrees [1]. Conversely, since the work of Hirst & St-Onge (HSO) [5], no work has focused on the issue of semantic relatedness in a semantic network (SN). Nevertheless, the HSO measure assumes that the information content of all edges is uniform, which is too strong a hypothesis [11]. In this paper, we present a new semantic distance (the lower the score, the closer the concepts) to measure the degree of relatedness between two concepts of an SN. This measure relies on a set of constraints to filter out the paths that are not semantically correct (a problem that arises because we consider different kinds of relations). In addition, it uses the information-theoretic definition of semantic similarity to weight the hierarchical edges of the graph [11] and proposes a new strategy to compute the weight of a non-hierarchical edge.
2 Work background and notation

Information-theoretic measures assign a weight to each node of a taxonomy. This weight represents the information content (IC) of the concept in the hierarchy [11, 12]: the more specialised a concept is, the higher its weight. Semantic measures are therefore defined using the notation $IC(c)$ for the information content of the node $c$. One of the major measures is the Jiang & Conrath (J&C) measure [6]. In this measure, each edge is assigned a weight, and the semantic distance is computed by adding all the edge weights along the shortest path. The weight $LS(x, y)$ ("LS" for "link strength") of an edge $\{x, y\}$ between the nodes $x$ and $y$ is computed from their information content²:

$$LS(x, y) = |IC(x) - IC(y)|$$

Semantic relatedness measures, on the other hand, consider several kinds of relations (and not only the hierarchical relation), such as partOf, madeWith, etc. However, whereas the shortest path in a hierarchy is unique, many paths may exist in a graph, and most of them are not semantically correct [5]. For this reason, any relational measure must provide (implicitly or explicitly) a set of constraints ensuring that a path is semantically correct. For instance, HSO [5] associate each relation type with a direction, Upward (U), Downward (D) or Horizontal (H), and enumerate only 8 patterns of semantically correct paths: {U, UD, UH, UHD, D, DH, HD, H}. Considering the WordNet relations, the authors define hypernymy and meronymy as Upward links, hyponymy and holonymy as Downward links, and synonymy and antonymy as Horizontal links.

¹ Laboratoire Informatique de Paris 6 - LIP6, 104 av du Président Kennedy, 75016, Paris, France, email: {laurent.mazuel; nicolas.sabouret}@lip6.fr
² The final J&C measure is defined as $IC(c_1) + IC(c_2) - 2 \times IC(ccp(c_1, c_2))$, where $ccp$ represents the closest common parent of the two nodes.

3 Our semantic relatedness measure

We call a single-relation path (S-R path) a path whose edges are all of the same type $X$. To compute the weight $W$ of an S-R path, we distinguish hierarchical relations ($X$ is the is-a or the includes relation) from non-hierarchical relations. Let us consider a path $path_X(x, y)$ between two concepts $x$ and $y$ in the ontology that follows only the relation $X$. If $X$ is a hierarchical relation, we choose to use the Jiang & Conrath weight of an edge:

$$W\big(path_{X \in \{is\text{-}a,\, includes\}}(x, y)\big) = |IC(x) - IC(y)|$$

If $X$ is not a hierarchical relation, we cannot use the information content of nodes, because this value is computed from the structure of the hierarchy [12, 11]. We propose to associate with each relation type $X$ a static weight $TC_X$, which corresponds to the "strength" of that relation type. We then compute the weight of the path as a value with the following properties: 1) it increases with the length of the path, and 2) it is bounded by $TC_X$, which represents the worst possible value for an X-relation path (i.e. the value of an infinite-length path that uses only X relations). Information-theoretic measures [11, 1] have outlined the adequacy of the log function for computing a semantic weight, but the log function is not bounded. We therefore use the function $n/(n+1)$ to simulate a bounded logarithmic function.
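The two single-relation weighting rules above can be sketched in Python as follows. This is a minimal, hypothetical sketch, not the authors' implementation: the toy taxonomy `IS_A` and the functions `intrinsic_ic`, `hierarchical_weight` and `non_hierarchical_weight` are illustrative names. Since the text does not say which IC definition is used, the sketch assumes the intrinsic IC of Seco et al. [12], $IC(c) = 1 - \log(hypo(c) + 1) / \log(N)$, computed on a single-inheritance tree. The value $TC_X = 0.4$ is the one the evaluation reports for part-of.

```python
import math

# Hypothetical toy taxonomy (child -> parent is-a edges), standing in for the
# WordNet noun hierarchy. Each concept has one parent, so this is a tree.
IS_A = {
    "car": "motor_vehicle",
    "truck": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "entity",
}


def all_concepts(is_a):
    """Every concept that appears in the taxonomy, as child or parent."""
    return set(is_a) | set(is_a.values())


def hyponym_count(concept, is_a):
    """Number of concepts strictly below `concept` (all its descendants)."""
    children = [c for c, parent in is_a.items() if parent == concept]
    return sum(1 + hyponym_count(child, is_a) for child in children)


def intrinsic_ic(concept, is_a):
    """Assumed IC: intrinsic IC of Seco et al., 1 - log(hypo(c) + 1) / log(N).

    Leaves get IC = 1 and the root gets IC = 0.
    """
    n = len(all_concepts(is_a))
    return 1.0 - math.log(hyponym_count(concept, is_a) + 1) / math.log(n)


def hierarchical_weight(x, y, is_a):
    """Weight of an is-a (or includes) S-R path between x and y:
    the Jiang & Conrath link strength |IC(x) - IC(y)|."""
    return abs(intrinsic_ic(x, is_a) - intrinsic_ic(y, is_a))


def non_hierarchical_weight(length, tc_x):
    """Weight of an S-R path made of `length` edges of a non-hierarchical
    relation X: TC_X * n / (n + 1). It increases with n and stays below TC_X."""
    if length < 0:
        raise ValueError("path length must be non-negative")
    return tc_x * length / (length + 1)


if __name__ == "__main__":
    # Summing J&C edge weights along an upward is-a chain telescopes to the
    # endpoint formula, because IC decreases monotonically towards the root.
    chain = ["car", "motor_vehicle", "vehicle"]
    edge_sum = sum(hierarchical_weight(a, b, IS_A) for a, b in zip(chain, chain[1:]))
    print(edge_sum, hierarchical_weight("car", "vehicle", IS_A))

    # part-of paths of increasing length, with TC_part-of = 0.4
    for n in range(1, 5):
        print(n, non_hierarchical_weight(n, 0.4))
```

The first demo line illustrates why a hierarchical S-R path can be weighted from its endpoints alone: along a path that only goes up (or only goes down) the hierarchy, IC is monotone, so the sum of the per-edge J&C weights collapses to $|IC(x) - IC(y)|$. The second shows the non-hierarchical weight approaching, but never reaching, $TC_X$.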
As a consequence, when $X$ is not a hierarchical relation, the weight of $path_X(x, y)$ is defined by:

$$W(path_X(x, y)) = TC_X \times \frac{|path_X(x, y)|}{|path_X(x, y)| + 1}$$

where $|path_X(x, y)|$ is the length of the path. Now, let us consider $path(x, y)$, a mixed-relation path (M-R path) between two concepts $x$ and $y$. It is composed of an ordered set of $n$ S-R sub-paths. We denote by $T(path(x, y))$ this unique ordered set of sub-paths. The weight of the M-R path $path(x, y)$ is then defined as the sum of the weights of all sub-paths in $T(path(x, y))$:

$$W(path(x, y)) = \sum_{p \in T(path(x, y))} W(p)$$

To compute the distance corresponding to the relatedness between two concepts, we consider only the semantically correct paths between these two concepts and select the best one. We chose to use the HSO rules to filter the semantically correct paths. Let $c_1$ and $c_2$ be two concepts. We denote by $\pi(c_1, c_2)$ the set of acyclic paths between $c_1$ and $c_2$, and by $HSO : \pi(c_1, c_2) \to \mathbb{B}$ the function such that $HSO(p)$ is true if and only if $p$ is a valid path w.r.t. the HSO rules. The semantic distance between $c_1$ and $c_2$ is then defined as:

$$dist(c_1, c_2) = \min_{\{p \in \pi(c_1, c_2) \,\mid\, HSO(p) = true\}} W(p)$$

4 Evaluation

We chose two benchmarks: 1) the well-known Miller & Charles test [9], composed of 30 pairs of words associated with a human similarity score, and 2) the WordSimilarity-353 test³ [3], composed of 353 pairs of words that are relationally connected (e.g. "computer-keyboard", "telephone-communication", etc.). In this evaluation, we consider the noun sub-part of WordNet 3.0. Because of the way WordNet defines its relations, we consider only the part-of relations as non-hierarchical relations. Moreover, we chose to use a single fixed maximal weight $TC_X$ for the part-of relations of WordNet.⁴ We compare our approach with 4 similarity measures (Rada [10], Resnik [11], Lin [7] and Jiang & Conrath [6]) and with the Hirst & St-Onge [5] relatedness measure.

Table 1. Comparison of correlation factors.

Measure                      Correlation (M&C)   Correlation (WS-353)
Rada                         0.638               0.249
Resnik                       0.804               0.375
Lin                          0.836               0.377
Jiang & Conrath              0.880               0.362
Hirst & St-Onge              0.847               0.380
Our measure, TC_X = 0.4      0.902               0.400

Table 1 shows that our measure obtains the best correlation of all the compared measures on both benchmarks. Moreover, to our knowledge, this is the first time that a semantic measure based on WordNet reaches a correlation of 0.4 on the WS-353 test.

Some pairs are not connected in WordNet and, thus, are not connected by our measure either (e.g. "telephone-communication"). This can be explained by the lack of relation types in WordNet. Moreover, the WS-353 test contains many pairs connected by common-sense links and therefore not connected in WordNet (e.g. "popcorn-movie"). For this reason, we believe that it will be very difficult to go beyond the 0.35-0.4 limit on the WS-353 test using only WordNet as an ontology.

³ http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/wordsim353.html
⁴ Since we cannot anticipate the correct value of $TC_{part\text{-}of}$, we evaluated different values and found that the best value in this context is $TC_X = 0.4$.

5 Conclusion & future work

In this paper, we have presented a new measure to evaluate the semantic relatedness between two concepts. Our measure takes advantage of two paradigms: semantically correct paths for semantic relatedness (based on Hirst & St-Onge [5]) and the information-theoretic approach (introduced by [11]) to refine the result. The evaluation underlines the lack of non-hierarchical relations in WordNet, as first mentioned in [5]. For instance, in WordNet, there is no relational path between concepts such as "journey-car" or "telephone-communication". We conclude that, to exploit the capabilities of a semantic relatedness measure, we need a real domain ontology. For this reason, our next aim is to test our measure within our natural language semantic interpretation algorithms for dialogue systems [8]. Our final objective is to propose and evaluate a semantic similarity measure for more complex knowledge representation languages [4]. We think it is possible to extend the weighting strategy to model specific relations between concepts, such as intersection or disjunctive classes.

References

[1] A. Budanitsky and G. Hirst, ‘Evaluating WordNet-based measures of semantic distance’, Computational Linguistics, 32(1), 13–47, (March 2006).
[2] K. Eliasson, ‘Case-Based Techniques Used for Dialogue Understanding and Planning in a Human-Robot Dialogue System’, in Proc. of IJCAI'07, pp. 1600–1605, (2007).
[3] L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin, ‘Placing search in context: the concept revisited’, in Proc. of the 10th International Conference on World Wide Web (WWW'01), pp. 406–414, New York, (2001). ACM Press.
[4] J. Hau, W. Lee, and J. Darlington, ‘A Semantic Similarity Measure for Semantic Web Services’, in Proc. Workshop on Web Service Semantics, (2005).
[5] G. Hirst and D. St-Onge, ‘Lexical chains as representation of context for the detection and correction of malapropisms’, in WordNet: An Electronic Lexical Database, ed. C. Fellbaum, chapter 13, pp. 305–332, MIT Press, (1998).
[6] J. Jiang and D. Conrath, ‘Semantic similarity based on corpus statistics and lexical taxonomy’, in Proc. of the International Conference on Research in Computational Linguistics, pp. 19–33, Taiwan, (1997).
[7] D. Lin, ‘An information-theoretic definition of similarity’, in Proc. 15th International Conf. on Machine Learning, pp. 296–304, Morgan Kaufmann, San Francisco, CA, (1998).
[8] L. Mazuel and N. Sabouret, ‘Generic command interpretation algorithms for conversational agents’, in Proc. Intelligent Agent Technology (IAT'06), pp. 146–153, IEEE Computer Society, (2006).
[9] G.A. Miller and W.G. Charles, ‘Contextual correlates of semantic similarity’, Language and Cognitive Processes, 6(1), 1–28, (1991).
[10] R. Rada, H. Mili, E. Bicknell, and M. Blettner, ‘Development and Application of a Metric on Semantic Nets’, IEEE Transactions on Systems, Man, and Cybernetics, 19(1), 17–30, (1989).
[11] P. Resnik, ‘Using information content to evaluate semantic similarity in a taxonomy’, in Proc. 14th International Joint Conference on Artificial Intelligence (IJCAI'95), pp. 448–453, (1995).
[12] N. Seco, T. Veale, and J. Hayes, ‘An Intrinsic Information Content Metric for Semantic Similarity in WordNet’, in Proc. ECAI'2004, the 16th European Conference on Artificial Intelligence, pp. 1089–1090, (2004).