Knowl Inf Syst
DOI 10.1007/s10115-014-0753-z
REGULAR PAPER
A comparative study between possibilistic and probabilistic approaches for monolingual word sense disambiguation

Bilel Elayeb · Ibrahim Bounhas · Oussama Ben Khiroun · Fabrice Evrard · Narjès Bellamine Ben Saoud
Received: 13 September 2013 / Revised: 20 January 2014 / Accepted: 3 May 2014 © Springer-Verlag London 2014
Abstract This paper proposes and assesses a new possibilistic approach for automatic monolingual word sense disambiguation (WSD). In fact, in spite of their advantages, traditional dictionaries suffer from a lack of accurate information useful for WSD. Moreover, there is a lack of high-coverage semantically labeled corpora on which learning methods could be trained. For these reasons, it has become important to use a semantic dictionary of contexts (SDC) supporting machine learning in a semantic WSD platform. Our approach combines traditional dictionaries and labeled corpora to build an SDC and identifies the sense of a word by using a possibilistic matching model. Besides, we present and evaluate a second, new probabilistic approach for automatic monolingual WSD. This approach uses and extends an existing probabilistic semantic distance to compute similarities between words by exploiting the semantic graph of a traditional dictionary and the SDC. To assess and compare these two approaches, we performed experiments on the standard ROMANSEVAL test collection and we compared our results to some existing French monolingual WSD systems. Experiments showed an encouraging improvement in terms of disambiguation rates of French words. These results reveal the contribution of possibility theory as a means to treat imprecision in information systems.

Keywords Word sense disambiguation · Possibility theory · Probability theory · Semantic dictionary of contexts · Semantic graph

B. Elayeb · O. Ben Khiroun · N. Bellamine Ben Saoud
RIADI Research Laboratory, ENSI, Manouba University, 2010 Manouba, Tunisia

B. Elayeb (B)
Emirates College of Technology, P.O. Box 41009, Abu Dhabi, UAE

I. Bounhas
LISI Research Laboratory, ISD, Manouba University, 2010 Manouba, Tunisia

F. Evrard
Informatics Research Institute of Toulouse (IRIT), 02 Rue Camichel, 31071 Toulouse, France
1 Introduction

The requirements of automatic (or computer-assisted) translation and the keyword-based nature of information retrieval (IR) systems have obliged researchers to develop tools for natural language understanding. One of the main characteristics of language is that a word, an expression, or a sentence may have many different senses. Some authors [82,104] distinguish several types of ambiguity, such as polysemy and homonymy, but we generally consider words that have the same spelling and different senses, whatever the degree of closeness of these senses. Such cases may bias the results of any natural language-based system. It is therefore necessary to identify, in a preliminary step, the exact sense of polysemous words using a technique called automatic word sense disambiguation (WSD), defined as the ability to identify the meaning of words in context in a computational manner [78]. This task is important in many fields such as optical character recognition (OCR), lexicography, speech recognition, natural language comprehension, accent restoration, content analysis, content categorization, IR, and computer-aided translation [30,61,115,116].

The problem of semantic disambiguation has been dealt with in many research works in various manners. However, it has always been regarded as a difficult task in the field of Natural Language Processing (NLP), as it requires enormous lexical resources such as labeled corpora, dictionaries, semantic networks, and/or ontologies [78,82]. Unfortunately, there is not yet a solution that sufficiently meets the needs of a reader faced with problems of ambiguity in IR or automatic translation tasks. Indeed, the main idea on which much research in this field is based is that the fine-grained relations between an occurrence of a word and its context are maximized by the most probable sense of this occurrence [78,120]. Existing approaches, results, and discussions help outline the problem of semantic disambiguation and plan the next tasks that can be carried out to improve this research. Within this framework, we seek to improve semantic treatment systems by proposing new models and methods.

In this paper, we propose to use possibilistic networks on the one hand and probabilistic semantic graphs on the other hand as a means to represent the sense for automatic disambiguation. In fact, many types of information may be represented thanks to graphs and their edges, such as relations of synonymy, antonymy, and hypernymy. Consequently, the study of the relations existing between the entries of a dictionary may be reduced to the study of a graph aiming to exploit networks of words. The majority of works in WSD are based on traditional dictionaries or other resources such as WordNet [15], which is not very different in terms of sense organization. The problem is that traditional dictionaries were conceived for human usage rather than automatic treatment. They lack precise information useful for disambiguation. Consequently, one of the main difficulties in WSD is the inadequacy of traditional dictionaries. The other difficulty is the lack of semantically labeled corpora useful for the learning step [6]. Even if such corpora are available, the existence of noise and the dispersion of the knowledge required for disambiguation make this task hard. For these reasons, it is necessary to define new types of structures that may be trained and then used to represent knowledge useful for WSD.
A semantic contextual graph is learnt and updated during the disambiguation process.
The learning mechanism should be able to acquire many types of semantic links between a polysemous word and the definitions of a traditional dictionary that contribute to its disambiguation. The semantic dictionary of contexts (SDC) is based on this idea, as it supports machine learning in a semantic platform for WSD. Thus, we combine knowledge extracted from traditional dictionaries with contextual dependencies learned from a corpus. In fact, WSD approaches need training and matching models, which compute the similarities (or the relevance) between senses and contexts. Existing models for WSD are based on poor, uncertain, and imprecise data and use probabilistic training and matching models (e.g., Loupy [74], Nguyen and Ock [82], Yuret and Yatbaz [117]). In contrast, possibility theory is naturally designed for this kind of application, because it makes it possible to express ignorance and to take account of imprecision and uncertainty at the same time. For example, recent work by Ayed et al. [9–12], who proposed a possibilistic approach for the automatic morphological disambiguation of Arabic texts, showed the contribution of possibilistic models compared to probabilistic ones. Our work in this paper is the first attempt to apply this theory to WSD, especially for the French language: we evaluate the relevance of a word sense given a polysemous sentence using two types of relevance, plausible relevance and necessary relevance.

On the other hand, the problem of disambiguation should be modeled in a dynamic perspective. The dynamic calculus of the sense in a semantic space consists in specifying constraints on each point of this space. It allows obtaining semantic relations between words. From these relations, we can compute the semantic distance between a polysemous word and its definitions mentioned in a traditional dictionary, given contextual information. Therefore, we propose, evaluate, and compare in this paper two approaches for automatic WSD. In the first, possibilistic, approach, the relevance of a word sense (resp. document in IR), given a polysemous sentence (resp. query in IR), is modeled by two measures. The possible relevance allows rejecting non-relevant senses. The necessary relevance permits reinforcing possibly relevant senses. In the second, probabilistic, approach, we compute the semantic distance between word senses using an existing probabilistic distance, as proposed in [57]. In this step, we consider the complete topology of a traditional dictionary seen as a graph over its entries. We show how the use of the SDC improves the results of the WSD process. To illustrate our work, we carried out experiments on the standard ROMANSEVAL1 test collection and we compared our two approaches to some existing WSD systems.

This paper is a fully revised version of a conference paper. In Ben Khiroun et al. [16], we briefly presented our possibilistic approach for WSD. In this new contribution, we mainly address the following new issues: (i) we explain the theoretical contribution of possibility theory compared to probability theory; (ii) we specify the structure of the SDC, modeling WSD knowledge by means of graphs, and we explain how nodes and edges are defined; (iii) we propose and assess a new probabilistic approach for WSD; (iv) we compare and discuss the results of these approaches, giving more accurate interpretations of cases of success and failure; and (v) we propose more perspectives for future investigations. The paper is organized as follows.
We present in Sect. 2 related works in the field of WSD and discuss the solutions aiming to resolve the problem of WSD. In Sect. 3, we briefly recall possibility theory. Section 4 details the graph-based structure of the SDC. The possibilistic and probabilistic approaches for WSD are presented in Sects. 5 and 6, respectively. In Sect. 7, we detail our experiments and present a comparative study between WSD approaches. Finally, Sect. 8 concludes this paper by evaluating our work and proposing some directions for future research.

1 http://www.lpl.univ-aix.fr/projects/romanseval.
2 Related works

The problem of semantic disambiguation has been the subject of several discussions, conferences, and research efforts. The Text Retrieval Evaluation Conference (TREC, 1990 [107]), the Message Understanding Conference (MUC, 1987–1998 [58]), Automatic Content Extraction (ACE, 2000 [38]), and TREC-QA (TREC-2004 [106]) are examples of conferences that were organized in this field and improved research on it. Interest in the problem of semantic disambiguation started in the eighties. Many lexical resources were developed, such as electronic dictionaries, glossaries, thesauri, and ontologies, and much of the work in semantic disambiguation exploited these resources. The work on definitions (i.e., dictionaries) was initiated by Lesk [72], who proposed to link word definitions if they have common words. Then, other studies tried to improve or complete this approach [13,59,82,100,114]. More recently, the corpus arose as a solution, using symbolic [90] or statistical approaches that are often based on co-occurrence [89]. The idea consists in analyzing words co-occurring with polysemous words in large corpora. These systems are accurate since they aim to model the sense of each word according to its context, starting from semantically labeled corpora. The adequacy between a given sense and the word to be treated is computed using a measure of similarity between the characteristics of the modeled senses and those of the context of the considered occurrence. The SensEval and SemEval campaigns (cf. Navigli [78], Erk and Strapparava [50]) presented comparative studies of the problems of disambiguation using corpora. The results proved that approaches based on learning from corpora reached higher success rates than the others. Indeed, the best success rates were 80 % for nouns, 70 % for verbs, and 75 % for adjectives.

In this literature review, we cite the most important methods that allowed clarifying the main issues in monolingual WSD. These approaches are classified according to the source of WSD knowledge and how it is structured, as this is one of the main parameters of these approaches [104]. Thus, we distinguish four categories of approaches: (i) approaches modeling knowledge or reasoning; (ii) dictionary-based approaches; (iii) corpus-based approaches; and (iv) hybrid approaches.

2.1 Approaches modeling knowledge or reasoning

This type of approach tries to model the human comprehension of language through connectionist or symbolic models that are well known in artificial intelligence [7]. These techniques led to several developments in the field, starting from "semantic networks" (e.g., Masterman [75]). However, this approach is time-consuming because the immense knowledge necessary to disambiguate words is coded manually. It uses specific knowledge bases that do not ensure a sufficient coverage of the language. For this reason, models based on this approach (such as the preferential semantics of Wilks [113], the word experts of Small and Rieger [93], or the microattribute sets of Bookman [20] and Waltz and Pollack [109]) are not very promising for real situations where we need to treat huge amounts of text [111,112]. For more details, see Audibert [7]. On the other hand, in order to solve the drawbacks related to supervised WSD systems, knowledge-based WSD seems a promising and influential choice. Indeed, without using any corpus data, knowledge-based systems take advantage of the information in a lexical knowledge base (LKB) to suitably achieve the WSD task.
In particular, graph-based approaches have proved their effectiveness in state-of-the-art WSD works [79,92]. These graph-based approaches using WordNet benefit from graph algorithms that facilitate finding and exploiting the structural properties of the graph underlying a specific
LKB. Indeed, Sinha and Mihalcea [92] proposed graph-centrality measures over custom-built graphs, where the senses of the words in the context are linked with edges weighted via diverse similarity scores. Navigli and Lapata [79] extracted a subgraph using a depth-first strategy over the whole graph of WordNet and then applied a variety of graph-centrality measures, with degree yielding the best results. Besides, using freely available datasets, Agirre and Soroa [3] outperformed other knowledge-based WSD systems when they assessed their graph-based algorithm using Personalized PageRank. Then, Agirre et al. [4] exploited the only domain-specific WSD assessment dataset available in [66] to investigate the application of their algorithm to domain-specific corpora. In fact, the results in [4] were better than those of the methods proposed in [3]. Recently, Faralli and Navigli [52] proposed a new minimally supervised framework for domain word sense disambiguation. They first took advantage of a bootstrapping method to iteratively obtain glossaries for several domains from the Web. Then, they exploited these acquired glosses as the sense inventory for fully unsupervised domain WSD. Their experiments were performed on new and gold standard datasets and proved that their wide-coverage framework achieves high results on dozens of domains at both a fine-grained and a coarse setting (approximately 69 and 80 % at the fine-grained and the coarse level, respectively). On the other hand, Pilehvar et al. [84] proposed a unified approach for measuring semantic similarity at multiple lexical levels, from word senses to texts. For all kinds of linguistic data, their approach takes advantage of a common probabilistic representation at the sense level. Besides, using semantic similarity at diverse lexical levels in three experiments (word sense coarsening, word similarity, and semantic textual similarity), the new semantic representation provided by the unified approach performs better than the state-of-the-art similarity measures that are often specifically designed for each level (sense, word, and text). Ponzetto and Navigli [85] proposed an approach based on extending WordNet with millions of semantic relations generated from Wikipedia pages in order to relieve the knowledge acquisition bottleneck. Indeed, WordNet senses are automatically connected with Wikipedia pages, and relevant semantic associative relations from Wikipedia are transferred to WordNet, hence generating a much richer lexical resource. Then, standard WSD datasets were assessed via two knowledge-based algorithms using the new extended WordNet. Experiments proved that taking into account a large number of semantic relations in knowledge-based systems gives better results compared to state-of-the-art supervised approaches on open-text WSD. Moreover, Ponzetto and Navigli [85] confirmed that knowledge-based systems perform better than more sophisticated supervised ones in a domain-specific WSD scenario, as concluded by Agirre et al. [4]. Later, Navigli and Ponzetto [80] proposed a graph-based approach to multilingual joint word sense disambiguation. First, the method simultaneously exploits the lexical knowledge from diverse languages, using empirical data for disambiguation from each of them. Second, a piece of sense evidence for the meaning of a target word in context is given by each language. Finally, the mixing of these different pieces allows them to constrain each other.
The results proved a significant improvement compared to existing high-quality graph-based approaches in both monolingual and multilingual tasks. On the other hand, Zhong and Ng [119] proposed a supervised English all-words WSD tool for free text. Indeed, the developed tool, called It Makes Sense (IMS), was built using a supervised learning method in order to give the user the possibility of choosing different tools for preprocessing, of testing various machine learning approaches or toolkits in the classification task, and of experimenting with different features in the feature extraction task.
By studying these works, we remark that a lot of research has recently been done to move systems away from WordNet and toward encouraging systems to pick their own resources and their own synonyms, and then to compare against a human-generated gold standard for WSD. This means exploiting other resources such as dictionaries and corpora.

2.2 Dictionary-based approaches

These approaches appeared when new resources were adopted for disambiguation. The computerized dictionary is a basic element of research in lexical disambiguation. The principal idea of this approach is that when several words co-occur, the most probable sense for each one of these words is the one which maximizes its similarity to the senses of the words co-occurring with it. The first works on dictionaries tried to build knowledge bases from their definitions [7]. Lesk [72] proposed to link each sense to the list of words appearing in its definition. Wilks et al. [114] tried to develop this approach by computing the frequencies of words in these definitions. Then, several probabilistic measures of similarity were used. A more sophisticated approach was proposed by Véronis and Ide [100], who generated neural networks from the definitions of the Collins English Dictionary (CED). Other researchers tried to use further information in disambiguation, such as the semantic codes of LDOCE [7]. Unfortunately, such information is not available in all dictionaries. On the other hand, Brun et al. [28] proposed a system of lexical semantic disambiguation based on the use of an electronic dictionary. This system was first designed for English WSD [27] and later adapted to French. Indeed, the system took advantage of the precise structure and the increased consistency of this new type of resource, used as a semantically annotated corpus, in order to extract semantic disambiguation rules. The results obtained for French provided precise insights into potential enhancements of the nature and content of lexical resources designed for the WSD task. This system seems particularly promising in many applications: (i) it can be integrated in an aid to understanding written foreign languages; and (ii) it can be integrated into a semantic indexing process and, more generally, into any application dedicated to the extraction and understanding of knowledge contained in electronic documents.

2.3 Corpus-based approaches

These approaches use corpora to train models for WSD. For example, Brown et al. [26] used the Semlink corpus [73]. In fact, corpora were first used in the eighties by Weiss [110], who proposed to learn disambiguation rules from labeled corpora. Then, methods of supervised machine learning were widely used [7]. In this period, corpus-based approaches aimed to avoid the limits of traditional dictionaries by building "virtual" dictionaries modeling distributional or contextual knowledge. Indeed, Véronis [102] argues that it is not possible to progress in WSD while dictionaries do not include in their definitions distributional criteria or surface indices (syntax, collocations, etc.). This is why, within his group, Reymond [87,88] proposed a "distributional" dictionary for automatic WSD. The idea is to organize words in lexical items having coherent distributional properties. This dictionary initially contained the detailed description of 20 common nouns, 20 verbs, and 20 adjectives.
This made it possible to manually label each of the 53,000 occurrences of these 60 terms in the corpus of the SyntSem project (a corpus of approximately 5.5 million words, composed of texts of varied kinds). This corpus is a starting resource to study the criteria of automatic semantic disambiguation, since it helps implement and evaluate WSD algorithms. Audibert [6–8] worked on this
dictionary to study different criteria of disambiguation (co-occurrence, domain information, synonyms of co-occurring words, and so on). Another line of research followed by Véronis [103], still with the idea of mitigating the insufficiencies of traditional dictionaries as regards sense discrimination, is the use of a co-occurrence graph. The algorithm searches for high-density zones in the co-occurrence graph and, contrary to traditional methods of textual analysis (like word vectors), allows isolating non-frequent usages. Jean Véronis applied the advice of Wittgenstein: "Don't look for the meaning, but for the use." Since then, co-occurrence graphs have been widely used for WSD (e.g., Tae-Gil et al. [97]). Their usage is also justified when machine-readable dictionaries are not available or not sufficiently structured. For example, Tlili-Guiassa and Merouani [98] proposed to contribute to Arabic WSD by building a "virtual" dictionary from three types of links: (i) co-occurrence relations; (ii) derivational links; and (iii) syntactic dependencies. A further solution consists in working on paradigmatic relations between words (synonymy, antonymy, etc.). As noticed by Edmonds and Hirst [46], a word can express a myriad of implications and connotations in addition to its sense in the dictionaries. A word has synonyms (we mean here the relation of partial synonymy), which differ from it in some nuances of sense. They sought to develop a computational model of lexical knowledge which takes into account the relation of "near synonymy" and which, in an automatic translation task, can choose the best word, the one that gives an exact nuance of sense in a given context. The goal is to be able to represent indirect, fuzzy, or context-dependent senses. Fuzzy synonymy has since been largely applied to WSD (e.g., Soto et al. [94]).

In recent years, the WSD community has confirmed that the WSD task should be incorporated in real NLP applications such as machine translation or multilingual IR. Indeed, some researchers have concluded that it will be hard to achieve a concrete improvement in this field if we continue to consider WSD as an isolated research task. In fact, it will be possible to resolve the granularity problem, which might be WSD task-dependent as well, if WSD is fully incorporated in multilingual applications. Moreover, this kind of corpus-based approach is language-independent and can be a suitable choice for languages that lack adequate sense inventories and sense-tagged corpora. The first "Cross-Lingual WSD" task was organized at SemEval-2010 [70], for which 16 submissions from 5 different research teams were received. A second edition was proposed at SemEval-2013 [71], for which new test data were annotated with the intention of gaining additional insights into the feasibility and the difficulty of cross-lingual WSD. They reported results for the 12 official submissions from 5 different research teams, in addition to the ParaSense system that was developed by the task organizers. Indeed, "Cross-Lingual WSD" is an unsupervised WSD task for English nouns using parallel corpora. Translations in five different target languages (French, Italian, Spanish, Dutch, and German) were used to generate the sense labels. The parallel corpus Europarl was used to build the sense inventory. For a given polysemous word, the different senses are grouped into clusters of all possible translations.
Then, in order to assign weights to the set of gold standard translations, native speakers chose, for the test data, the suitable translation cluster(s) for every test sentence and provided their top three translations from the predefined list of Europarl translations. There were two types of assessment in which systems could participate: (i) a multilingual assessment in which translations in all five target languages are evaluated; and (ii) a bilingual assessment in which translations in one target language are evaluated. The assessments performed are the "best result" assessment, in which only the first translation given by a system was considered, and the "top five" assessment, in which the first five translations given by a system were considered. All these evaluations were done using recall and precision metrics.
2.4 Hybrid approaches

A hybrid approach uses at the same time knowledge extracted from lexical resources (i.e., traditional dictionaries) and contextual/distributional information learned from corpora. These approaches reached encouraging success rates in terms of disambiguation precision. We cite for example the works of Dahlgren [35] and Yarowsky [115]. The latter used the ROGET thesaurus together with corpora. McRoy's system [76] used 13,000 senses organized in a conceptual manner. The final decision is based on a complex weighting system using several resources. As another example of an approach combining a thesaurus and a corpus, Yuret and Yatbaz [117] proposed to combine WordNet with large amounts of untagged text. We may also refer to Jimeno-Yepes et al. [63], who tried to learn classifiers from the MEDLINE corpus based on the MeSH thesaurus. On the other hand, Resnik [86] presented a new measure for semantic similarity in IS_A taxonomies, based on the notion of information content. Stevenson and Wilks [96] presented a system combining several information sources: morphosyntactic filtering based on labelling, collocations generated from the corpus, an overlap with the definitions of the LDOCE dictionary, subject categories, and selectional restrictions. Stevenson and Wilks [96] reported impressive results: a disambiguation precision of 95 % at the level of LDOCE homographs and of 90 % at the level of senses. Mihalcea and Moldovan [77] presented an approach based on the idea of semantic density. They used the taxonomy and the definitions of WordNet in conjunction with statistics derived directly from corpora retrieved from the Internet. Finally, ontologies are more and more used in recent WSD-based IR systems. For example, Barathi and Valli [14] combined WordNet and domain knowledge modeled as an ontology to enhance IR results.

2.5 Discussion

Building labeled corpora to train WSD algorithms is hard and time-consuming [1,2,82]. Even if we assume that such corpora exist, extracting knowledge useful for lexical disambiguation is a difficult task because the corpus is noisy, the relevant information required for disambiguation is distributed across the corpus, and polysemous words have few occurrences compared to its size [7]. Dictionaries, in contrast, represent rich networks of associations between words and a set of semantic categories potentially exploitable for NLP. They contain less noise, and it is easy to extract the senses of almost all polysemous words. The problem is that these dictionaries were made for human use and are not suitable for automatic treatment. They thus miss accurate information useful for WSD. In addition, the inconsistency of dictionaries is well known to lexicographers [65]. A dictionary is also tied to a period of history, thus containing some senses that are not necessarily used in other periods. Hybrid approaches try to avoid the limits of both approaches by combining them. However, we limit ourselves to dictionary-based approaches because other types of resources (such as thesauri and ontologies) are not available for all languages and do not necessarily cover a given language. In a hybrid approach using a dictionary, the main role of the latter is (i) to provide all the senses of words; and (ii) to provide a less noisy definition of each sense. Then, it is easy to extract and mine these definitions in order to extract context-free knowledge.
The role of the corpus is (i) to filter the senses of words, keeping only senses that are actually used; and (ii) to provide contextual or distributional knowledge useful for WSD. In fact, hybrid approaches have not been massively studied yet. However, it is probably this type of approach that may achieve the best results. Therefore, we propose in this paper a dictionary-
based hybrid approach for WSD. We exploit a labeled corpus to extract contextual knowledge, which models co-occurrence links and dependencies between words (contexts) and actually used senses. The senses are extracted from a high-coverage traditional dictionary. To the best of our knowledge, none of the current methods has treated in a sufficient manner the problem of the organization of the lexicon. To solve this problem, we define and use a semantic dictionary of contexts (SDC), which stores knowledge extracted from the corpus and the dictionary (cf. Sect. 4). Another side of WSD approaches consists in computing similarities between senses and contexts in order to identify the "best" sense. Existing approaches use probabilistic distances, while possibility theory provides an innovative framework for such an application but has not yet been applied to WSD. Indeed, existing methods based on the calculation of similarities do not seek to represent the semantic distances between senses and do not correctly manage the organization of the obtained senses. Several research works have tried to resolve the problem of polysemy at the level of the dictionary. Gaume [56] used a dictionary as an information source to discover relations between lexical items. His work is based on an algorithm that computes the semantic distance between the words of the dictionary by taking into account the complete topology of the dictionary, which gives it greater robustness. This algorithm makes it possible to disambiguate polysemous words in the definitions of the dictionary. He started by testing this approach on the disambiguation of the definitions of the dictionaries themselves. He proposed the PROX method detailed in Sect. 6. The model that we propose is supported by a semantic space where the various senses of a word are organized. According to the classification of Vidhu Bhala and Abirami [104], we use a structural semantic representation involving semantic relations between words and senses. Computing the sense of a sentence is a dynamic process during which the senses of the various words mutually influence each other and which leads simultaneously to the determination of the sense of each word and to a global sense for the sentence. Our probabilistic approach exploits the probabilistic distance of Gaume [56] in order to generalize it toward a dynamic method of sense calculation. As a second alternative, we use possibilistic networks to compute the distance between the context and a given sense. We also propose to compute a preliminary ambiguity rate for each sentence. We start by recalling the principles of possibility theory in the following section.
3 Possibility theory

Possibility theory, introduced by Zadeh [118] and developed by several authors (e.g., Dubois and Prade [44]), handles uncertainty in the interval [0, 1], called the possibility scale, in a qualitative or quantitative way [44]. Our approach is based on the quantitative setting. This section briefly reviews basic elements of possibility theory; for more details, see [39,42–44].

3.1 Possibility distribution

Possibility theory is based on possibility distributions. Given a universe of discourse Ω = {ω_1, ω_2, ..., ω_n}, a fundamental concept, denoted by π, corresponds to a function which associates to each element ω_i from the universe of discourse a value from a bounded and linearly ordered valuation set (L, <).

4 The graph-based structure of the SDC

The SDC is modeled as a graph G = <S, A>, whose nodes correspond to words and senses and whose set of edges A is built from several types of relations, including:
– Paradigmatic relations, especially synonymy; we build a graph in which two nodes are connected by an edge if the corresponding words maintain a synonymic relation [83], in fact, if they share common words in their dictionary definitions:
  ∀ m_i, m_j, if {Definition(m_i) ∩ Definition(m_j)} ≠ ∅ → <m_j, m_i> ∈ A
– Relations of semantic proximity; these are less specific relations, which can take into account at the same time the paradigmatic axis and the syntagmatic axis. As in [100], we build a graph of the lexicon of the traditional dictionary. This allows building context-independent links between words and senses [117]. Indeed, we build an edge between two words m_i and m_j if m_j appears in the definition of m_i. This may be formalized as follows:
  ∀ m_i ∈ Polysemy(ph), ∀ m_j ∈ Definition(m_i) → <m_j, m_i> ∈ A
These edges are weighted according to the formulae presented in the following sections and illustrated through examples.
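To make the construction of these edges concrete, the sketch below builds such a graph from a toy dictionary. It is only a minimal illustration under our own naming (build_sdc_graph, tokenize, the toy entries), not the authors' implementation: the real SDC distinguishes word and sense nodes, removes stop words, lemmatizes the definitions, and weights the edges, none of which is reproduced here.

```python
from collections import defaultdict

def tokenize(definition):
    # Naive tokenization; the authors lemmatize with TreeTagger instead.
    return set(definition.lower().replace(",", " ").split())

def build_sdc_graph(dictionary):
    """Build an undirected, reflexive graph over dictionary entries.

    dictionary: maps an entry (word or sense label) to its definition text.
    Returns a dict mapping each node to the set of its neighbours.
    """
    graph = defaultdict(set)
    entries = list(dictionary)
    for w in entries:
        graph[w].add(w)                      # reflexive edge (self-loop)
        for other in tokenize(dictionary[w]):
            if other in dictionary:          # semantic proximity: 'other' appears in the definition of w
                graph[w].add(other)
                graph[other].add(w)
    # Paradigmatic (synonymy-like) edges: definitions sharing at least one word
    for i, w1 in enumerate(entries):
        for w2 in entries[i + 1:]:
            if tokenize(dictionary[w1]) & tokenize(dictionary[w2]):
                graph[w1].add(w2)
                graph[w2].add(w1)
    return graph

# Toy entries in the spirit of the examples of Sect. 6.2 (illustrative only)
dictionary = {
    "avocat_1": "praticien du droit qui conseille sur des questions juridiques",
    "avocat_2": "fruit comestible a gros noyau que l'on conseille dans les cocktails",
    "droit": "ensemble de regles juridiques",
}
print(build_sdc_graph(dictionary))
```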
5 A possibilistic approach for automatic WSD

Word sense disambiguation (WSD) can be seen as a classification task with a training step and a testing step. In the training step, we need to learn dependencies between senses of words and contexts. This may be performed on labeled corpora (judgment-based training), leading to a semi-automatic approach. We may also weight these dependencies directly from a traditional dictionary (dictionary-based training), which may be considered an automatic approach [16]. In this case, we need to organize all the instances in such a way that improves classification rates. In this paper, we propose to sort the instances by computing an ambiguity rate (cf. Sect. 5.2). In the testing step, the distance between the context of an occurrence of a word and its senses is computed in order to select the best sense. Therefore, we first present formulae and examples for computing the degree of possibilistic relevance (DPR) and the ambiguity rate.
Fig. 1 Possibilistic network of the WSD approach (sense nodes S1, ..., Si, ..., SN linked to term nodes t1, t2, t3, t4, ..., tT)
5.1 The degree of possibilistic relevance (DPR)

Supposing that we have only one polysemous word in a sentence ph, let us denote by DPR(S_i|ph) the degree of possibilistic relevance of a word sense S_i given ph. Let us consider that ph is composed of T terms: ph = (t_1, t_2, ..., t_T). We evaluate the relevance of a word sense S_i given a sentence ph by the possibilistic matching model of IR used in [22,47–49]. In that setting, the goal is to compute a matching score between a query and a document. In the case of WSD, the relevance of a sense given a polysemous sentence is modeled by a double measurement. The possible relevance makes it possible to reject the irrelevant senses, while the necessary relevance makes it possible to reinforce the relevance of the word senses that have not been rejected by the possibility. In our case, the possibilistic network links the word senses S_j to the words of the polysemous sentence ph = (t_1, t_2, ..., t_T), as presented in Fig. 1. The relevance of each word sense S_j, given the polysemous sentence ph, is calculated as follows. According to the matching model proposed in [48,49], the possibility Π(S_j|ph) is proportional to:

Π(S_j|ph) = Π(t_1|S_j) * ... * Π(t_T|S_j) = nft_1j * ... * nft_Tj    (9)

where nft_ij = tf_ij / max_k(tf_kj) is the normalized frequency of the term t_i in the sense S_j, and tf_ij = (number of occurrences of the term t_i in S_j) / (number of terms in S_j). The necessity to restore a relevant sense S_j for the sentence ph, denoted N(S_j|ph), is calculated as follows:

N(S_j|ph) = 1 − Π(¬S_j|ph)    (10)

where

Π(¬S_j|ph) = (Π(ph|¬S_j) * Π(¬S_j)) / Π(ph)    (11)

In the same way, Π(¬S_j|ph) is proportional to:

Π(¬S_j|ph) = Π(t_1|¬S_j) * ... * Π(t_T|¬S_j)    (12)

This numerator can be expressed by:

Π(¬S_j|ph) = (1 − φ_1j) * ... * (1 − φ_Tj)    (13)

where

φ_ij = Log10(nCS/nS_i) * nft_ij    (14)

with nCS = number of senses of the word in the traditional dictionary and nS_i = number of senses of the word containing the term t_i. This count includes only senses that are in the SDC and does not cover all the senses that are in the traditional dictionary. We define the degree of possibilistic relevance (DPR) of each word sense S_j, given a polysemous sentence ph, by the following formula:

DPR(S_j|ph) = Π(S_j|ph) + N(S_j|ph)    (15)

The preferred senses are those that have a high score of DPR(S_j|ph).

5.2 The ambiguity rate of a polysemous sentence

A sentence is considered to have a high ambiguity rate if the senses corresponding to its ambiguous words have similar meanings and/or do not fit the sentence context. We compute the ambiguity rate of a polysemous sentence ph using the possibility and necessity values as follows: (i) we index all possible senses of the ambiguous word; (ii) we use the index of each sense as a query; (iii) we evaluate the relevance of the sentence given this query; and (iv) a sentence is considered very ambiguous if it is relevant for many senses or if it is not relevant for any of them. Therefore, the ambiguity rate is inversely proportional to the standard deviation value:

Ambiguity_rate(ph) = 1 − σ(ph)    (16)

where σ(ph) is the standard deviation of the DPR(S_j|ph) values corresponding to each sense of the ambiguous word contained in the polysemous sentence ph:

σ(ph) = ( (1/N) * Σ_j (DPR(S_j|ph) − S̄)² )^(1/2)    (17)

where S̄ is the average of the DPR(S_j|ph) values and N is the number of possible senses in the dictionary.

5.3 Illustrative example

Let us consider the French sentence: "Le noyau de l'implantation de l'avocat est le fruit des efforts juridiques." In order to simplify the calculus in this example, we consider the word "avocat" as the unique polysemous word, having the following two senses, S1: avocat_1 and S2: avocat_2:
avocat_1: Praticien et professionnel du droit dont la fonction traditionnelle est de conseiller ses clients sur des questions juridiques, qu'elles soient relatives à leur vie juridique quotidienne ou plus spécialisées,…
avocat_2: Fruit comestible de l'avocatier, à pulpe jaune, contenant un gros noyau, fortement conseiller dans plusieurs cocktails de fruits et des confitures…
"avocat_1" is indexed by the three terms {conseiller, juridique, droit} and "avocat_2" is indexed by {conseiller, fruit, noyau}. We begin by removing stop words from the initial French sentence. We thus consider the polysemous sentence ph = {avocat, juridique, fruit, noyau}.
We have Π(avocat_1|ph) = nf(avocat, avocat_1) * nf(juridique, avocat_1) * nf(fruit, avocat_1) * nf(noyau, avocat_1) = 0 * (1/3) * 0 * 0 = 0, where nf(avocat, avocat_1) is the normalized frequency of "avocat" in the first sense "avocat_1." Similarly, Π(avocat_2|ph) = nf(avocat, avocat_2) * nf(juridique, avocat_2) * nf(fruit, avocat_2) * nf(noyau, avocat_2) = 0 * 0 * (1/3) * (1/3) = 0.

We frequently have Π(avocat_1|ph) = 0 and Π(avocat_2|ph) = 0, except when all the words of the sentence exist in the index of the sense. On the other hand, we have non-null values of N(avocat_1|ph) and N(avocat_2|ph):

N(avocat_1|ph) = 1 − [(1 − φ(avocat, avocat_1)) * (1 − φ(juridique, avocat_1)) * (1 − φ(fruit, avocat_1)) * (1 − φ(noyau, avocat_1))]

We have nf(avocat, avocat_1) = 0, so φ(avocat, avocat_1) = 0; φ(juridique, avocat_1) = log10(2/1) * 1/3 = 0.1; φ(fruit, avocat_1) = log10(2/1) * 0 = 0; φ(noyau, avocat_1) = 0. So: N(avocat_1|ph) = 1 − [(1−0) * (1−0.1) * (1−0) * (1−0)] = 1 − [1 * 0.9 * 1 * 1] = 0.1, and DPR(avocat_1|ph) = 0.1.

N(avocat_2|ph) = 1 − [(1 − φ(avocat, avocat_2)) * (1 − φ(juridique, avocat_2)) * (1 − φ(fruit, avocat_2)) * (1 − φ(noyau, avocat_2))]

We have nf(avocat, avocat_2) = 0, so φ(avocat, avocat_2) = 0; φ(juridique, avocat_2) = 0; φ(fruit, avocat_2) = log10(2/1) * 1/3 = 0.1; φ(noyau, avocat_2) = 0.1. So: N(avocat_2|ph) = 1 − [(1−0) * (1−0) * (1−0.1) * (1−0.1)] = 1 − [1 * 1 * 0.9 * 0.9] = 0.19, and DPR(avocat_2|ph) = 0.19 > DPR(avocat_1|ph).

We remark that the polysemous sentence ph is more relevant for "avocat_2" than for "avocat_1" because it contains two terms of the second sense "avocat_2" (fruit, noyau) and only one term of the sense "avocat_1" (juridique). The average S̄ = (0.1 + 0.19)/2 = 0.145. The standard deviation σ(ph) = (1/2 * ((0.1 − 0.145)² + (0.19 − 0.145)²))^(1/2) = 0.045 and the Ambiguity_rate(ph) = 1 − σ(ph) = 0.955. Let us notice in this example that the polysemous sentence ph is very ambiguous because the two values 0.1 and 0.19 are very close.
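To make the computation above reproducible, the following sketch re-derives the possibility, necessity, DPR, and ambiguity rate of Eqs. (9)–(17) for the "avocat" example. It is an illustrative reimplementation with our own function names (possibility, necessity, dpr) and the simplified sense index given in the text, not the authors' code; the normalized frequency is taken as the relative frequency of the term in the sense index, as in the worked figures.

```python
import math

# Index of each sense of the polysemous word "avocat" (from the example above)
sense_index = {
    "avocat_1": ["conseiller", "juridique", "droit"],
    "avocat_2": ["conseiller", "fruit", "noyau"],
}
sentence = ["avocat", "juridique", "fruit", "noyau"]   # ph after stop-word removal

def nft(term, sense):
    # relative frequency of the term in the sense index (as in the worked example)
    terms = sense_index[sense]
    return terms.count(term) / len(terms)

def possibility(sense):
    p = 1.0
    for t in sentence:
        p *= nft(t, sense)            # Eq. (9)
    return p

def necessity(sense):
    n_cs = len(sense_index)           # number of senses of the polysemous word
    prod = 1.0
    for t in sentence:
        n_s = sum(1 for s in sense_index if t in sense_index[s])   # senses containing t
        phi = math.log10(n_cs / n_s) * nft(t, sense) if n_s else 0.0   # Eq. (14)
        prod *= (1.0 - phi)           # Eq. (13)
    return 1.0 - prod                 # Eq. (10)

def dpr(sense):
    return possibility(sense) + necessity(sense)   # Eq. (15)

scores = {s: dpr(s) for s in sense_index}
mean = sum(scores.values()) / len(scores)
sigma = math.sqrt(sum((v - mean) ** 2 for v in scores.values()) / len(scores))
print(scores)                         # approximately {'avocat_1': 0.10, 'avocat_2': 0.19}
print("ambiguity rate:", 1 - sigma)   # approximately 0.955
```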
6 A probabilistic approach for automatic WSD

This section presents a new probabilistic approach based on a generalized version of the PROX method initially proposed by Gaume et al. [57]. For further explanation of this approach, we first present the semantic calculus in Sect. 6.1 and an illustrative example in Sect. 6.2.

6.1 Semantic calculus

Modeling the problem as a semantic graph opens the door to mathematical and computational representations making it possible to measure the sense of a word with respect to its definitions mentioned in the traditional dictionary. With this intention, we transform the graph into a Markov matrix whose states are the nodes of the graph and whose edges are the possible transitions. We first generate from the graph of the SDC a transition matrix (cf. Sect. 6.1.1), which is then transformed into a Markov matrix (cf. Sect. 6.1.2). Then, we apply the proximity-based algorithm of disambiguation detailed in Sect. 6.1.3.
6.1.1 Building the adjacency matrix

We generate the adjacency matrix from the graph G = <S, A>. We denote by [G] the square n × n matrix such that, for all r, s ∈ S, [G]_r,s = |<s, r>| if (r, s) ∈ A and [G]_r,s = 0 if (r, s) ∉ A. We call [G] the transition matrix of G. Since G is not directed, [G] is a symmetric matrix. Besides, G is a reflexive graph, so ∀ r ∈ S, [G]_r,r = 1.

6.1.2 Building the Markov matrix

We generate the Markov matrix from the adjacency matrix. Let us denote by [Ĝ] the Markov matrix corresponding to the graph G = <S, A>, defined as follows:

∀ r, s ∈ S,  [Ĝ]_r,s = [G]_r,s / Σ_{x∈S} [G]_r,x    (18)
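As a concrete illustration of Eq. (18), the small sketch below row-normalizes an adjacency matrix into a Markov (row-stochastic) matrix. It is a generic sketch under assumed names (to_markov, the tiny three-node graph), not the authors' implementation.

```python
def to_markov(adjacency):
    """Row-normalize a symmetric, reflexive adjacency matrix (list of lists)
    so that each row sums to 1, as in Eq. (18)."""
    markov = []
    for row in adjacency:
        total = sum(row)
        markov.append([value / total for value in row])
    return markov

# Tiny reflexive graph: a word linked to its two senses
G = [
    [1, 1, 1],   # word    <-> itself, sense_1, sense_2
    [1, 1, 0],   # sense_1 <-> word, itself
    [1, 0, 1],   # sense_2 <-> word, itself
]
print(to_markov(G))   # [[1/3, 1/3, 1/3], [1/2, 1/2, 0], [1/2, 0, 1/2]]
```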
6.1.3 WSD using the proximity-based algorithm

We first present the basis of the proximity-based WSD approach. We then present a new version of this method, which is applied to WSD.

The proximity-based algorithm. This method was proposed by Gaume et al. [57]. It is a stochastic method used to study the structure of a dictionary graph using Markov chains. We recall that Markov chains have reached satisfactory results in WSD (see, for example, Loupy [74]). This method consists in transforming a graph into a Markov chain whose states are the nodes of the graph and whose edges are the possible transitions: a particle, leaving a node s0 at the moment t = 0, moves in one step to s1, one of the neighbors of s0 selected at random; the particle then moves again in one step to s2, one of the neighbors of s1 selected at random, and so on. If at the t-th step the particle is on the node st, it then moves in one step to the node st+1, which is selected at random among neighbors that all have the same probability. According to Gaume [55], a selected trajectory s1, s2, ..., st, ... is a random "stroll" on the graph, and the dynamics of these trajectories reveal the structural properties of the graph. Gaume et al. [57] defined PROX(G, i, s, r) as the probability that the particle, leaving the node r at the moment t = 0, reaches the node s at the moment t = i:

PROX(G, i, s, r) = [Ĝ^i]_r,s

where [Ĝ]^i is the matrix [Ĝ] multiplied i times by itself.

Dynamic word sense calculation. We propose a dynamic method of calculating the sense of a word in its context by exploiting the semantic graph and by calculating the semantic distance. Our approach is based, primarily, on the principle of PROX [57]. This method represents a measurement of similarity between nodes of the graph by calculating the semantic distance between words and their definitions, which makes it possible to consider an original and innovative exploitation of semantic graphs. We define the task as follows. Let us consider a lemma m_i, which is a polysemous word. We note:
– m_i is a node of the graph G.
– Definition(m_i) = {p_i^1, p_i^2, ..., p_i^a}.
– G^∞ = lim_{t→∞} [Ĝ]^t, so G^∞ is a vector of R^a.
– f^∞(r, s) = lim_{t→∞} PROX(G, t, s, r). The function f^∞ indicates the semantic proximity between polysemous words and their definitions in the SDC.
So we have the following property.

Property: Since the graph G is reflexive and strongly connected, then ∀ r, a ∈ S,

lim_{t→∞} PROX(G, t, s, r) = lim_{t→∞} PROX(G, t, s, a)

That means that the probability, for a sufficiently long time t, of reaching a node s does not depend on the node of departure (r or a). We say that α is strongly related to α_i if and only if ∀ j ∈ N\{i}, f^∞(α, α_i) > f^∞(α, α_j). In this case, the result of word sense disambiguation of α is α_i. The word β carrying information satisfies the following properties:
• β ∈ context(α, ph).
• ∀ δ ∈ context(α, ph)\{β}, f^∞(α, β) = max_δ(f^∞(α, δ)), where β is the definition of α.
In this case, β is the semantic definition of α in the semantic dictionary of contexts (SDC).

6.2 Illustrative example

Let us consider the example of the first French polysemous sentence, PH1: « François est un vrai avocat qui s'occupe des droits des étrangers sur Paris. », which is translated as « François is a real advocate who deals with the rights of foreigners in Paris. » The polysemous sentence PH1 is represented as follows:
– Context(PH1) = {vrai, avocat, occuper, droit, étranger}.
– Significant(PH1) = {occuper, vrai, étranger}.
– Polysemy(PH1) = {avocat, droit}.
– Definition(avocat) =
  avocat_1: Praticien et professionnel du droit dont la fonction traditionnelle est de conseiller ses clients sur des questions juridiques, qu'elles soient relatives à leur vie juridique quotidienne ou plus spécialisées,…
  avocat_2: Fruit comestible de l'avocatier, à pulpe jaune, contenant un gros noyau, fortement conseiller dans plusieurs cocktails de fruits et des confitures…
– Definition(droit) =
  droit_1: Faculté reconnue de jouir d'une chose, d'accomplir une action.
  droit_2: Taxe, impôt.
  droit_3: Lois et dispositions juridiques qui règlent les rapports entre les membres d'une société.
– Definition(occuper) = {Prendre possession d'un endroit.}.
– Definition(vrai) = {Qui présente un caractère de vérité}.
– Definition(étranger) = {Qui est d'une autre nation; qui est autre, en parlant d'une nation}.

The semantic graph corresponding to the example of PH1 is presented in Fig. 2. The semantic calculation of word sense is based on the following Markov matrix for the example of PH1:
Fig. 2 Semantic graph of example PH1 (nodes: Avocat (1), Avocat_1 (2), Avocat_2 (3), Vrai (4), Etranger (5), Occuper (6), Droit (7), Droit_1 (8), Droit_2 (9), Droit_3 (10); all edges have weight 1)
[Ĝ] (rows and columns indexed by the nodes (1)–(10)):

       (1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)   (10)
(1)    1/7   1/7   1/7   1/7   1/7   1/7   1/7   0     0     0
(2)    1/4   1/4   0     0     0     0     1/4   0     0     1/4
(3)    1/2   0     1/2   0     0     0     0     0     0     0
(4)    1/5   0     0     1/5   1/5   1/5   1/5   0     0     0
(5)    1/5   0     0     1/5   1/5   1/5   1/5   0     0     0
(6)    1/5   0     0     1/5   1/5   1/5   1/5   0     0     0
(7)    1/9   1/9   0     1/9   1/9   1/9   1/9   1/9   1/9   1/9
(8)    0     0     0     0     0     0     1/2   1/2   0     0
(9)    0     0     0     0     0     0     1/2   0     1/2   0
(10)   0     1/3   0     0     0     0     1/3   0     0     1/3

[Ĝ]^∞ (limit matrix):

       (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)
(1)    0.160   0.090   0.046   0.114   0.114   0.114   0.202   0.044   0.044   0.067
(2)    0.157   0.092   0.044   0.112   0.112   0.112   0.205   0.045   0.045   0.069
(3)    0.164   0.089   0.051   0.115   0.115   0.115   0.199   0.042   0.042   0.065
(4)    0.160   0.089   0.046   0.114   0.114   0.114   0.203   0.044   0.044   0.066
(5)    0.160   0.089   0.046   0.114   0.114   0.114   0.203   0.044   0.044   0.066
(6)    0.160   0.089   0.046   0.114   0.114   0.114   0.203   0.044   0.044   0.066
(7)    0.157   0.091   0.044   0.112   0.112   0.112   0.205   0.046   0.046   0.068
(8)    0.154   0.091   0.042   0.111   0.111   0.111   0.208   0.048   0.048   0.069
(9)    0.154   0.091   0.042   0.111   0.111   0.111   0.208   0.048   0.048   0.069
(10)   0.156   0.093   0.043   0.111   0.111   0.111   0.206   0.046   0.046   0.071
The observation of the dynamics of the particle starting from a node r toward a node s indicates the semantic relationship between the two nodes r and s. For example, Fig. 3 represents the curves of f^t(avocat, avocat_1) = [Ĝ^t]_avocat,avocat_1 and f^t(avocat, avocat_2) = [Ĝ^t]_avocat,avocat_2. These scores represent the probability of the particle starting from the node "avocat" reaching the nodes "avocat_1" and "avocat_2," respectively, at the moment t. According to these two curves, when the time t tends toward infinity, each curve tends toward its limit: f^∞(avocat, avocat_1) = 0.090; f^∞(avocat, avocat_2) = 0.046. According to Table 1, we notice in [Ĝ]^∞ that "droit" and then "avocat" are more important than all the other nodes in the semantic graph.
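To show how such limit values can be obtained in practice, the sketch below iterates the Markov matrix of the PH1 graph until it is close to convergence. It is an illustrative reconstruction (the adjacency matrix is transcribed from Fig. 2 and the matrix [Ĝ] above, and names such as prox_limit are ours), not the authors' code; the exact figures depend on the number of iterations performed.

```python
def to_markov(adjacency):
    # Row-normalize the adjacency matrix (Eq. 18)
    return [[v / sum(row) for v in row] for row in adjacency]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def prox_limit(adjacency, steps=50):
    """Approximate [Ĝ]^t for a large t; PROX(G, t, s, r) = power[r][s]."""
    m = to_markov(adjacency)
    power = m
    for _ in range(steps - 1):
        power = mat_mul(power, m)
    return power

# Adjacency of the PH1 graph, nodes (1) avocat ... (10) droit_3
A = [
    # 1  2  3  4  5  6  7  8  9 10
    [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],  # avocat
    [1, 1, 0, 0, 0, 0, 1, 0, 0, 1],  # avocat_1
    [1, 0, 1, 0, 0, 0, 0, 0, 0, 0],  # avocat_2
    [1, 0, 0, 1, 1, 1, 1, 0, 0, 0],  # vrai
    [1, 0, 0, 1, 1, 1, 1, 0, 0, 0],  # etranger
    [1, 0, 0, 1, 1, 1, 1, 0, 0, 0],  # occuper
    [1, 1, 0, 1, 1, 1, 1, 1, 1, 1],  # droit
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 0],  # droit_1
    [0, 0, 0, 0, 0, 0, 1, 0, 1, 0],  # droit_2
    [0, 1, 0, 0, 0, 0, 1, 0, 0, 1],  # droit_3
]
row_avocat = prox_limit(A)[0]
# f(avocat, avocat_1) and f(avocat, avocat_2): close to the 0.090 and 0.046 reported above
print(round(row_avocat[1], 3), round(row_avocat[2], 3))
```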
Fig. 3 Curves of convergence of « avocat_1 » and « avocat_2 »

Table 1 Result of the semantic proximity of [Ĝ]^∞ for the example of PH1

Avocat (1)  avocat_1 (2)  avocat_2 (3)  Vrai (4)  Etranger (5)  Occuper (6)  Droit (7)  droit_1 (8)  droit_2 (9)  droit_3 (10)
0.160       0.090         0.046         0.114     0.114         0.114        0.202      0.044        0.044        0.067
We distinguish that:
– "Avocat" is more related to "avocat_1" than to "avocat_2" because f^∞(1,2) = 0.090 > f^∞(1,3) = 0.046.
– "Droit" is more related to "droit_3" than to "droit_1" and "droit_2" because f^∞(7,10) = 0.068 > f^∞(7,8) = f^∞(7,9) = 0.046.
The learning space is:
– "Avocat" = "avocat_1" and "droit" = "droit_3" (it is the significant word: value 0.090 of "avocat_1" in [Ĝ]^∞).
– "Droit" = "droit_3" and "avocat" = "avocat_1" (it is the significant word: value 0.067 of "droit_3" in [Ĝ]^∞).
These two definitions of "avocat_1" and "droit_3" are added to the semantic dictionary of contexts. Moreover, this result is taken into account in the next steps of the analysis.

Let us consider the example of the second French polysemous sentence, PH2: « L'avocat pratique la loi », which is translated as « The advocate practices the law »:
– Context(avocat, PH2) = {pratiquer, loi}.
– Polysemy(PH2) = {loi}.
– Definition(pratiquer) = {Applications des principes d'un art, d'une science ou d'une technique.}.
– Definition(loi) =
  loi_1: l'ensemble des règles, droits et devoirs, édictée par une autorité, que toute personne doit suivre [Droit]
  loi_2: convention
  loi_3: énoncé de phénomènes dans un domaine particulier
– "Avocat" exists in the semantic dictionary of contexts.
– Enrichment of the context of "Avocat": Context(Avocat) = {pratiquer, loi, droit (droit_3)}.
– Semantic space: {avocat, avocat_1, avocat_2, droit_3, pratiquer, loi, loi_1, loi_2, loi_3}.
The semantic graph corresponding to the example of PH2 is presented in Fig. 4. We remark that |<loi_1, droit_3>| = 2 because loi_1 ∩ droit_3 = {loi, règle}.
Fig. 4 Semantic graph of the second sentence PH2 (nodes: Avocat (1), Avocat_1 (2), Avocat_2 (3), Droit_3 (4), Pratiquer (5), Loi (6), Loi_1 (7), Loi_2 (8), Loi_3 (9); the edge between Loi_1 and Droit_3 has weight 2, all other edges weight 1)

Table 2 Result of the semantic proximity of [Ĝ]^∞ for the second sentence PH2

Avocat (1)  Avocat_1 (2)  Avocat_2 (3)  Droit_3 (4)  Pratiquer (5)  Loi (6)  Loi_1 (7)  Loi_2 (8)  Loi_3 (9)
0.035       0.021         0.014         0.028        0.014          0.036    0.028      0.014      0.014
So, the first row of [Ĝ]^∞ is given in Table 2. According to Table 2, we remark that "Loi" and then "Avocat" collect more votes than all the other nodes in the semantic graph. We distinguish that:
– "Avocat" is more related to "avocat_1" than to "avocat_2".
– "Loi" is more related to "loi_1" than to "loi_2" and "loi_3".
We also remark that "loi", "avocat", "droit_3", "loi_1," and "avocat_1" are strongly related. The update of the semantic dictionary of contexts is:
– Add to the entry "Avocat" the concept "loi_1". So, "avocat" = "avocat_1"; "droit" = "droit_3" and "loi" = "loi_1".
– Add a new entry "Loi" = "loi_1" and "Avocat" = "avocat_1".
These two reported examples were carried out to evaluate the importance of the SDC in WSD. We measure the impact and the contribution of the learning step on the performance of the dynamic word sense calculation algorithm. We evaluate the relevance of our calculation by comparing the results obtained by using the SDC with those obtained without this dictionary.
7 Experimental results

This section introduces the test collection (cf. Sect. 7.1) and the experimental scenarios used in our experiments (cf. Sect. 7.2). To improve our assessment, we performed two types of evaluation in the training step: the judgment-based training and the dictionary-based training, which have been detailed and discussed in [16]. Finally, we analyze, interpret, and compare the overall performance of the various executed tests of the possibilistic and the probabilistic WSD approaches in Sect. 7.3.

7.1 ROMANSEVAL test collection

We used in our experiments the ROMANSEVAL standard test collection, which provides necessary tools for WSD including (1) a set of documents (issued from the Official Journal
of the European Commission); and (2) a list of test sentences including ambiguous words. The set of documents consists of parallel texts in nine languages taken from the Official Journal of the European Commission (Series C, 1993). The texts (numbering several thousand) consist of written questions on a wide range of topics and the corresponding answers from the European Commission. The total size of the corpus is approximately 10.2 million words (about 1.1 million words per language), which were collected and prepared within the MULTEXT-MLCC projects [91]. These texts were prepared in order to obtain a standard test collection. The corpus was split into words labeled with, in particular, part-of-speech labels distinguishing nouns (N), adjectives (A), and verbs (V). Then, the 600 most frequent words (200 N, 200 A, and 200 V) were extracted, together with their contexts of occurrence. These words were annotated in parallel by six students in linguistics, in accordance with the senses of the French dictionary "Le Petit Larousse." Each word occurrence may have one or several sense labels, or none. After this first step, the 60 most polysemous words (20 N, 20 A, and 20 V) were retained, and their occurrences were labeled in 3,624 contexts.

7.2 Experimental scenarios

In order to perform the training task, we built the semantic dictionaries of contexts in XML format using the cross-validation method [67]. To fill the XML SDC, we generally apply the cross-validation technique for all the tests, except for the dictionary-based training detailed in [16]. In this method, 90 % of the sentences of the tagged ROMANSEVAL collection, chosen at random, are used for training the semantic dictionary of contexts (SDC), and the remaining 10 % are used for testing. This procedure is repeated ten times, covering all 60 words in each run, and the accuracy rate (mean Kappa) is averaged over the ten train/test combinations. We organized the SDC into three files by category (adjectives, nouns, and verbs). We note that the same SDC files are used in both the possibilistic and the probabilistic WSD approaches. Sentences are lemmatized using the TreeTagger tool (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/) for French. We performed three stages of tests as explained below; two subset selection methods for building the SDC are described in [16]. To assess our system, we compute the accuracy rate for each word by using the "Kappa" metric [33,51]. Indeed, the Kappa measure is based on the difference between how much agreement is actually present ("observed" agreement) and how much agreement would be expected by chance alone ("expected" agreement), as follows [105]:

\[ \mathrm{Kappa} = \frac{P_{\mathrm{observed}} - P_{\mathrm{expected}}}{1 - P_{\mathrm{expected}}} \qquad (19) \]
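For illustration only, the following Python sketch mirrors this protocol: it repeats a random 90 %/10 % split ten times and averages a Kappa computed as in Eq. (19); the call that trains the SDC and disambiguates the test sentences is a hypothetical placeholder, not the paper's implementation.

```python
import random

def kappa(p_observed, p_expected):
    """Cohen's Kappa, exactly as in Eq. (19)."""
    return (p_observed - p_expected) / (1.0 - p_expected)

def mean_kappa(sentences, train_and_disambiguate, runs=10, train_ratio=0.9):
    """Average the Kappa over repeated random 90 %/10 % train/test splits.

    `train_and_disambiguate(train, test)` is a placeholder standing for the WSD
    system: it must return the observed and expected agreement on `test`.
    """
    scores = []
    for _ in range(runs):
        shuffled = list(sentences)
        random.shuffle(shuffled)
        cut = int(train_ratio * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        p_obs, p_exp = train_and_disambiguate(train, test)
        scores.append(kappa(p_obs, p_exp))
    return sum(scores) / len(scores)
```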
The Kappa measure takes into account the agreement occurring by chance and is therefore considered a refined value. According to Landis and Koch [69], Kappa values between 0 and 0.2 are considered slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1 almost perfect agreement. Using this metric, the evaluation task should be carried out carefully, because human assessment of senses is itself difficult. Like Véronis [102], we may be surprised that cognitive relevance has never been required: an experiment he undertook showed that humans have poor accuracy when associating a dictionary sense with an occurrence of a word in a statement.
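The Landis and Koch scale lends itself to a direct encoding; the small helper below simply restates the thresholds quoted above:

```python
def agreement_level(k):
    """Qualitative reading of a Kappa value on the Landis and Koch [69] scale."""
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"

print(agreement_level(0.47))  # -> "moderate"
```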
Fig. 5 Mean Kappa comparison for nouns (bar chart of mean Kappa values for the XEROX, POSS, and PROBA systems)

Fig. 6 Mean Kappa comparison for verbs (bar chart of mean Kappa values for the XEROX, POSS, and PROBA systems)
Edmonds and Hirst [46] were among the few authors who remarked that an occurrence of a word can have several possible senses without it being possible to discriminate them. This phenomenon, called indetermination, is however largely related to the expressivity of a language. In order to reinforce our WSD assessment and take advantage of other evaluation metrics, we also used recall, precision, and F-measure, as used in [91] and widely recommended in IR evaluation. In our case, these metrics are computed as follows:

\[ \mathrm{Recall} = \frac{\text{Correct senses retrieved}}{\text{Total senses in reference}} \qquad (20) \]

\[ \mathrm{Precision} = \frac{\text{Correct senses retrieved}}{\text{Total senses proposed}} \qquad (21) \]

\[ \text{F-Measure} = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \qquad (22) \]
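The following short functions are a direct transcription of Eqs. (20)–(22); the counting of correct, reference, and proposed senses is left abstract:

```python
def recall(correct_retrieved, total_in_reference):
    return correct_retrieved / total_in_reference            # Eq. (20)

def precision(correct_retrieved, total_proposed):
    return correct_retrieved / total_proposed                # Eq. (21)

def f_measure(r, p):
    return 2 * r * p / (r + p)                               # Eq. (22)

# Example: the POSS all-POS recall/precision pair reported in Table 5
print(round(f_measure(0.543, 0.570), 3))  # -> 0.556
```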
7.3 Results and comparative study

The Kappa metric is the most commonly used statistic for measuring this kind of agreement. A Kappa of 1 indicates perfect agreement, whereas a Kappa of 0 indicates agreement equivalent to chance [105]. An advanced comparison is detailed in Figs. 5, 6, and 7 based on the Kappa metric.
Fig. 7 Mean Kappa comparison for adjectives (bar chart of mean Kappa values for the XEROX, POSS, and PROBA systems)
The results in these figures compare our two WSD approaches with those of the Xerox system, a state-of-the-art system assessed on the same ROMANSEVAL test collection for French monolingual WSD [91]. For some words, the Kappa measure is under 0.2 (cf. Fig. 5), which is considered a "slight agreement" according to Krippendorff's scale [68]. Other words across the parts of speech (e.g., "concentration" (concentration), "couvrir" (cover), and "vaste" (wide)) have a null Kappa value. We give a general overview of the results in Figs. 8 and 9, where we compare the performance of the possibilistic and probabilistic WSD methods with five other monolingual WSD systems participating in the French exercise [91]. In fact, to have an objective comparative study of our approaches, we restrict ourselves to these five monolingual WSD systems dating from 2000 [91], because, to the best of our knowledge, they are the most recent monolingual WSD systems that address the French language and were assessed on the same ROMANSEVAL test collection. For example, we cannot compare with Brun et al. [28], because they used the "Le Monde 94" test collection to assess their results. The systems studied in [91] were developed, respectively, by EPFL (Ecole Polytechnique Fédérale de Lausanne), IRISA (Institut de Recherche en Informatique et Systèmes Aléatoires, Rennes), LIA-BERTIN (Laboratoire d'Informatique, Université d'Avignon, and BERTIN, Paris), and XEROX (Xerox Research Centre Europe, Grenoble). A comparative study between these systems is available in [91]. Figure 8 shows the values of the Kappa metric for these five systems and our POSS and PROBA approaches. For adjectives, the POSS approach outperforms the five other systems and the PROBA approach. For nouns, the PROBA approach ranks first, ahead of the POSS approach, which in turn outperforms the five other systems. For verbs, the POSS approach performs better than EPFL, IRISA, PROBA, and Xerox, but worse than LIA1 and LIA2. Focusing on the global results over all parts of speech (cf. Fig. 9), our possibilistic WSD approach (POSS) stands out from the five other systems and the PROBA approach in terms of Kappa value (EPFL: 0.29; IRISA: 0.27; LIA1: 0.28; LIA2: 0.27; Xerox: 0.29; PROBA: 0.41). In fact, the agreement between our system and the other judges is not a matter of chance, given a moderate Kappa value (POSS: 0.47). We should notice here that the disagreement among the human judges who prepared the sense tagging of the ROMANSEVAL benchmark is substantial according to Véronis [101]: Kappa ranges between 0.92 (noun "detention") and 0.007 (adjective "correct"). In other terms, for some words there is no more agreement than chance.
Fig. 8 Mean Kappa results by part of speech (bar charts of mean Kappa for EPFL, IRISA, LIA1, LIA2, XEROX, POSS, and PROBA, shown separately for adjectives, nouns, and verbs)

Fig. 9 Mean Kappa results for all parts of speech (POS) (bar chart of mean Kappa for EPFL, IRISA, LIA1, LIA2, XEROX, POSS, and PROBA)

If human annotators do not agree much more than chance on many words, it seems that systems producing random sense tags for these words should be considered satisfactory. This phenomenon is well known in state-of-the-art WSD studies, which remark that it is mostly due to the fact that human annotators also tend to disagree [104]. To reinforce our comparative study between the new possibilistic WSD approach and both the Xerox and the probabilistic approaches in terms of mean Kappa, we use the Wilcoxon matched-pairs signed-ranks test as proposed by Demsar [36]. It is a nonparametric alternative to the paired t test that enables us to compare two approaches (POSS versus Xerox and POSS versus PROBA) over each part of speech (adjectives, nouns, and verbs) and over all POS. The reported p values are computed by comparing the mean Kappa of the possibilistic approach with that of each of the Xerox and probabilistic approaches. The comparison results given in Table 3 show that the possibilistic approach is always significantly better than the Xerox approach for every POS (p value < 0.05). In particular, POSS is strongly better than Xerox over all POS, the p value of 0.000007 being far smaller than 0.05. In addition, POSS seems better than PROBA for adjectives and verbs, but not for nouns (p value = 0.184992 > 0.05), which are the most polysemous POS in the French language [101]. However, the comparison of POSS with PROBA over all POS is nearly significant (p value = 0.052847), a result influenced by the nonsignificant outcome for nouns.
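For reproducibility, this significance test is available in standard statistical libraries; the sketch below uses SciPy's wilcoxon on hypothetical per-word mean Kappa vectors (placeholder values, not the paper's data):

```python
from scipy.stats import wilcoxon

# Hypothetical per-word mean Kappa scores of two approaches over the same word list.
poss_kappa  = [0.52, 0.47, 0.61, 0.33, 0.58, 0.49]
xerox_kappa = [0.31, 0.28, 0.40, 0.25, 0.35, 0.30]

statistic, p_value = wilcoxon(poss_kappa, xerox_kappa)
print(p_value < 0.05)  # True would indicate a significant difference at the 5 % level
```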
Table 3 The p value results for the Wilcoxon matched-pairs signed-ranks test

Mean Kappa    POSS versus Xerox    POSS versus PROBA
Adjectives    0.004550             0.033340
Nouns         0.011220             0.184992
Verbs         0.016852             0.019569
All POS       0.000007             0.052847
Table 4 Results using recall, precision, and F1 metrics for A, N, and V

         Adjectives (A)             Nouns (N)                  Verbs (V)
         Recall  Precision  F1      Recall  Precision  F1      Recall  Precision  F1
EPFL     0.54    0.56       0.549   0.51    0.52       0.514   0.40    0.39       0.394
IRISA    0.69    0.61       0.647   0.55    0.48       0.512   0.29    0.28       0.284
LIA1     0.00    0.00       –       0.75    0.64       0.690   0.88    0.71       0.785
LIA2     0.00    0.00       –       0.76    0.63       0.688   0.89    0.72       0.796
Xerox    0.56    0.48       0.516   0.45    0.43       0.439   0.31    0.29       0.299
POSS     0.54    0.57       0.554   0.63    0.66       0.644   0.44    0.47       0.454
PROBA    0.49    0.53       0.509   0.59    0.63       0.609   0.40    0.44       0.419

Table 5 Results using recall, precision, and F1 metrics for all POS

         POSS                       PROBA
         Recall  Precision  F1      Recall  Precision  F1
All POS  0.543   0.570      0.556   0.507   0.546      0.526
In fact, the quantitative assessment of our WSD approaches using the Kappa metric still needs to be reinforced with other evaluation metrics such as recall, precision, and F-measure, which are generally recommended in the IR evaluation process. Indeed, the combination of recall and precision was initially used as the main performance metric in the SensEval/SemEval exercises. Table 4 gives a comparative study, using the recall, precision, and F1 metrics for adjectives (A), nouns (N), and verbs (V), between our two approaches and the five state-of-the-art monolingual WSD systems reported in [91]. Besides, Table 5 gives the results of the POSS and PROBA approaches for all POS. The results in Table 4 show that IRISA obtains the best scores for adjectives, just ahead of the possibilistic approach; however, the LIA1 and LIA2 systems were not effective for adjectives, with null scores. Conversely, LIA1 and LIA2 have the best scores for nouns and verbs, just ahead of the possibilistic approach, which in turn seems better than the four remaining systems. Moreover, using the recall, precision, and F1 metrics, the possibilistic approach performs better than the probabilistic one over all POS (cf. Table 5). In fact, the Kappa metric alone cannot reflect the real performance of the WSD approaches, because its role is accuracy normalization, correcting the result for the agreement expected by chance with the ideal classifier [34]. These metrics complement one another in achieving an objective evaluation of WSD approaches; that is why we propose to use all of them in the assessment of our WSD approaches. On the other hand, recent work in the field of WSD has been interested in a new technique called "Cross-Lingual WSD" (CLWSD) since SemEval-2010 [70] and SemEval-2013 [71].
In fact, the decision to revise the more traditional monolingual WSD task into a cross-lingual WSD task was justified by several arguments detailed in [71]:

1. There is no need for sense inventories such as WordNet [53] or EuroWordNet [108], nor for manually created sense-tagged corpora, when translations are used as sense labels. Consequently, the data acquisition bottleneck of WSD is relieved thanks to the use of multilingual unlabeled parallel corpora.
2. Advanced sense distinctions are significant only in cases where the senses get lexicalized as different translations of the word. Indeed, a cross-lingual method thus also contributes to solving the sense granularity problem.
3. Using translations directly, rather than more abstract sense labels, avoids the need to map sense labels to equivalent translations. Therefore, real multilingual applications such as IR [32] or machine translation [29] can easily take advantage of the integration of such a dedicated WSD component.

For the assessment of system results, participants in the two last SemEval competitions (2010 and 2013) used as metrics the precision (Prec), the recall (Rec), the mode precision (MP), and the mode recall (MR). In SemEval-2010 [70], the best system results for French were obtained by the team "T3-COLEUR," with Prec = 21.96, Rec = 21.73, MP = 16.15, and MR = 15.93. If we look at the Out-of-five system results, the team "T3-COLEUR" remains the best one, with Prec = 49.44, Rec = 48.96, MP = 42.13, and MR = 41.77. More recently, in SemEval-2013 [71], the best precision score averaged over all twenty test words for French is 30.11, obtained by the team "cllN" (WSD2 system), whereas the best mode precision (MP) is obtained by the team "cll" (WSD2 system) with 26.62. Besides, the best Out-of-five precision score is also obtained by "cllN" with 59.80, but the best Out-of-five mode precision is obtained by "cll" with 57.57. In fact, these results cannot be used directly in the comparative study proposed in this paper, for several reasons: (i) "monolingual WSD" cannot be directly compared to "cross-lingual WSD," although we plan to extend our possibilistic and probabilistic monolingual WSD approaches to a cross-lingual framework; and (ii) we used in our evaluation the standard ROMANSEVAL test collection, which differs from those used in the cross-lingual WSD assessments (Europarl, JRC-ACQUIS (http://wt.jrc.it/lt/Acquis/), BNC (http://www.natcorp.ox.ac.uk/), ANC (http://americannationalcorpus.org/), etc.). Consequently, a real comparison of effectiveness cannot be performed using different standards and datasets. Moreover, several approaches have previously proved the validity of a cross-lingual approach to WSD [5,25,31,37,54,70,71,81,95,99]. The cross-lingual WSD task advances this research area by building dedicated standard datasets in which the ambiguous words are annotated with senses from a multilingual sense inventory generated from a parallel corpus. These standard datasets permit an exhaustive comparative study between diverse methods on the CLWSD task. Finally, we plan to extend our monolingual possibilistic and probabilistic WSD approaches to cross-lingual ones, using in our assessment the same resources as used in SemEval-2013 in order to compare our effectiveness with recent existing CLWSD systems such as [70,71]. The developed tool will be useful in a cross-language information retrieval (CLIR) system, where the disambiguation task remains the major challenge in any query translation and query expansion process.
Despite the theoretical and experimental studies proposed in this paper, a deeper qualitative study seems relevant in order to find answers to some open questions
related to the field of WSD, such as: (i) why do WSD systems still suffer from certain difficult polysemous words? (ii) are these difficult words related to a particular POS, and why do they remain difficult for WSD systems? (iii) what is the impact of sense mapping? (iv) what is the impact of the evaluation metrics on providing an objective assessment of WSD systems? (v) how can we choose suitable metrics? (vi) how can we choose a suitable WSD benchmark, and what criteria must we take into account? (vii) to what extent can the "Cross-Lingual WSD" technique replace the "Monolingual WSD" one, given the challenging problems related to the translation task? We mainly invite WSD researchers to contribute to answering these questions.
8 Conclusion and future work

This paper has tried to make some contributions in the field of monolingual WSD, which is considered one of the most difficult tasks in the domain of semantic processing [78]. Indeed, we proposed, evaluated, and compared possibilistic versus probabilistic approaches for automatic WSD applied to French texts. These approaches combine traditional dictionaries and labeled corpora to build a semantic dictionary of contexts (SDC). First, we used a possibilistic network in order to quantify the relevance of a word sense given a polysemous sentence. This relevance is modeled by a double measurement: the possible relevance makes it possible to reject the irrelevant word senses given a polysemous sentence, whereas the necessary relevance makes it possible to reinforce the senses not eliminated by the possibility. Second, we exploited an existing probabilistic distance to propose a new probabilistic approach, which calculates a semantic distance between the words of the dictionary by taking into account the complete topology of the dictionary, seen as a semantic graph over its entries. In order to assess and compare our two approaches with similar monolingual WSD systems, we performed experiments on the standard ROMANSEVAL test collection. We summarized and discussed the results of the various tests in both detailed and global analyses. Experiments using the Kappa metric showed an encouraging improvement in the disambiguation rates of French words. The possibilistic WSD performed better than Xerox for adjectives, nouns, verbs, and all POS, but the probabilistic WSD seems better than both the possibilistic approach and Xerox for nouns. This may explain why some researchers tried to develop specific approaches and labeled corpora (e.g., Brown et al. [26]) for WSD. Over all POS, the possibilistic approach seems better than all the other approaches. These results reveal the contribution of possibility theory, as it provided good accuracy rates in this first experiment. Experiments using the recall, precision, and F1 metrics confirmed that the possibilistic approach globally performs better than the Xerox and PROBA approaches for adjectives, nouns, verbs, and all POS. It also seems better than the other state-of-the-art monolingual WSD systems, because some of them obtained good results for one POS while performing very poorly for others. Besides, the possibilistic approach is finer than the probabilistic one and the other state-of-the-art monolingual WSD systems in these experiments with ROMANSEVAL. This is explained by the fact that the possibility and necessity measures increase the relevance of correct senses and penalize the scores of the remaining ones. In fact, the penalization and the increase of the scores are proportional to the capacity of the polysemous words to discriminate between the various senses of the collection [49]. However, the possibilistic matching model does not take into account the relations between the words of the context in the possibilistic network used (cf. Sect. 5.1). Several solutions are to be studied, such as (i) improving this model in order to consider these types of relations; or (ii) combining this model with a probabilistic
distance such as PROX. We already made a first experiment on combining possibilistic and probabilistic approaches in [48]. On the other hand, the semantic dictionary of contexts was used as a second input for the WSD approaches, together with the traditional dictionary. We showed that the use of the SDC makes it possible to improve the results of word sense calculation in a given context and to obtain good performance compared with existing monolingual WSD systems. It also allows us to analyze WSD results, because it represents contextual information explicitly; thus, we are able to "debug" decisions made by WSD systems. In order to decrease the size of the semantic space and consequently improve the response time of our WSD platform, we intend to propose a function that determines the semantic space of a polysemous word (i.e., the size of the contextual window) dynamically according to the SDC. Another prospect is the integration of this WSD platform into intelligent possibilistic Web information retrieval systems (IRS) such as SARIPOD [47], SPORSER [48], and SPORT [19]. As recent research has shown the role of WSD in IR (e.g., Soto et al. [94], Barathi and Valli [14]), we have recently evaluated the impact of the query disambiguation process on the performance of these IRS [17]. Indeed, in that work we take advantage of our monolingual WSD platform in order to improve query expansion by automatic query disambiguation in intelligent IR. On the other hand, the short-term goal of our work is to improve the performance of a cross-lingual IRS by introducing a step of query and document disambiguation in a cross-lingual context. Thus, this work will be widened toward other languages such as Arabic and English. For English, we can use the freely available dataset in [66]. Nevertheless, the task is more complex for Arabic WSD, because we need to structure raw Arabic dictionaries and to build or find a relevant standard test collection. Some existing works reveal promising solutions (e.g., Khemakhem et al. [64] for structuring Arabic dictionaries and [121] for Arabic WSD). Moreover, our tools and data structures are reusable components that may be integrated into other fields such as information extraction, machine translation, content analysis, word processing, lexicography, and semantic Web applications.

Acknowledgments We are grateful to the anonymous reviewers whose relevant comments and convincing remarks helped us improve the paper.
References

1. Agirre E, Edmonds P (eds) (2006) Word sense disambiguation: algorithms and applications (Text, Speech and Language Technology). Springer, Dordrecht
2. Agirre E, Martinez D (2000) Exploring automatic word sense disambiguation with decision lists and the Web. In: Buitelaar P, Hasida K (eds) Proceedings of the COLING 2000 workshop on semantic annotation and intelligent content. International Committee on Computational Linguistics, Luxembourg, pp 11–19
3. Agirre E, Soroa A (2009) Personalizing PageRank for word sense disambiguation. In: Lascarides A, Gardent C, Nivre J (eds) Proceedings of the 12th conference of the European chapter of the Association for Computational Linguistics. The Association for Computer Linguistics, Athens, Greece, pp 33–41
4. Agirre E, Lopez de Lacalle O, Soroa A (2009) Knowledge-based WSD and specific domains: performing better than generic supervised WSD. In: Boutilier C (ed) Proceedings of the 21st international joint conference on artificial intelligence. Pasadena, California, USA, pp 1501–1506
5. Apidianaki M (2009) Data-driven semantic analysis for multilingual WSD and lexical selection in translation. In: Lascarides A, Gardent C, Nivre J (eds) Proceedings of the 12th conference of the European chapter of the Association for Computational Linguistics. The Association for Computer Linguistics, Athens, Greece, pp 77–85
6. Audibert L (2002) Etude des critères de désambiguïsation sémantique automatique : présentation et premiers résultats sur les cooccurrences. In: TALN-RECITAL-2002, Sixième Rencontre des Etudiants Chercheurs en Informatique pour le Traitement Automatique des Langues. Association pour le Traitement Automatique des Langues, Nancy, France, pp 415–424
7. Audibert L (2003a) Outils d'exploration de corpus et désambiguïsation lexicale automatique. Ph.D. thesis, Université d'Aix-Marseille I - Université de Provence, France
8. Audibert L (2003b) Etudes des critères de désambiguïsation sémantique automatique : résultat sur les cooccurrences. In: TALN-2003, Actes de la conférence Traitement Automatique des Langues. Association pour le Traitement Automatique des Langues, Batz-sur-Mer, France, pp 35–44
9. Ayed R, Bounhas I, Elayeb B, Evrard F, Bellamine Ben Saoud N (2012a) Arabic morphological analysis and disambiguation using a possibilistic classifier. In: Huang D, Ma J, Jo K-H et al (eds) Intelligent computing theories and applications—8th international conference. Springer, Berlin, Heidelberg, LNAI 7390, Huangshan, China, pp 274–279
10. Ayed R, Bounhas I, Elayeb B, Evrard F, Bellamine Ben Saoud N (2012b) A possibilistic approach for the automatic morphological disambiguation of Arabic texts. In: Hochin T, Lee RY (eds) Proceedings of the 13th ACIS international conference on software engineering, artificial intelligence, networking and parallel/distributed computing. IEEE Computer Society, Kyoto, Japan, pp 187–194
11. Ayed R, Bounhas I, Elayeb B, Bellamine Ben Saoud N, Evrard F (2014a) Evaluation d'une approche possibiliste pour la désambiguïsation des textes arabes. In: TALN-2014, Actes de la conférence Traitement Automatique des Langues. Association pour le Traitement Automatique des Langues, 1–4 July 2014, Marseille, France
12. Ayed R, Bounhas I, Elayeb B, Bellamine Ben Saoud N, Evrard F (2014b) Improving Arabic texts morphological disambiguation using a possibilistic classifier. In: Proceedings of the 19th international conference on application of natural language to information systems. Springer, Berlin, Germany, 18–20 June 2014, Montpellier, France
13. Baldwin T, Su NK, Bond F, Fujita S, Martinez D, Tanaka T (2008) MRD-based word sense disambiguation: further extending Lesk. In: IJCNLP 2008, Proceedings of the 3rd international joint conference on natural language processing. The Association for Computer Linguistics, Hyderabad, India, pp 775–780
14. Barathi M, Valli S (2010) Ontology based query expansion using word sense disambiguation. Int J Comput Sci Inf Secur 7(2):22–27
15. Barque L, Chaumartin FR (2008) La polysémie régulière dans WordNet. In: TALN-2008, Actes de la conférence Traitement Automatique des Langues. Association pour le Traitement Automatique des Langues, Avignon, France. http://www.atala.org/taln_archives/TALN/TALN-2008/taln-2008-long-021.pdf
16. Ben Khiroun O, Elayeb B, Bounhas I, Evrard F, Bellamine Ben Saoud N (2012) A possibilistic approach for automatic word sense disambiguation. In: Proceedings of the 24th conference on computational linguistics and speech processing. Association for Computational Linguistics and Chinese Language Processing, Chung-Li, Taiwan, pp 261–275
17. Ben Khiroun O, Elayeb B, Bounhas I, Evrard F, Bellamine Ben Saoud N (2014) Improving query expansion by automatic query disambiguation in intelligent information retrieval. In: Filipe J, Fred ALN (eds) Proceedings of the 6th international conference on agents and artificial intelligence. SciTePress, Angers, Loire Valley, France, pp 153–160
18. Benferhat S, Dubois D, Garcia L, Prade H (1999) Possibilistic logic bases and possibilistic graphs. In: Laskey KB, Prade H (eds) Proceedings of the fifteenth conference on uncertainty in artificial intelligence. Morgan Kaufmann, Stockholm, Sweden, pp 57–64
19. Ben Romdhane W, Elayeb B, Bounhas I, Evrard F, Bellamine Ben Saoud N (2013) A possibilistic query translation approach for cross-language information retrieval. In: Huang D, Jo K-H, Zhou Y-Q et al (eds) Intelligent computing theories and technology—9th international conference. Springer, Berlin, LNCS 7996, Nanning, China, pp 73–82
20. Bookman LA (1987) A microfeature-based scheme for modelling semantics. In: McDermott JP (ed) Proceedings of the 10th international joint conference on artificial intelligence. Morgan Kaufmann, Milan, Italy, pp 611–614
21. Borgelt C, Gebhardt J, Kruse R (2000) Possibilistic graphical models. Computational intelligence in data mining, CISM courses and lectures 408:51–68
22. Boughanem M, Brini A, Dubois D (2009) Possibilistic networks for information retrieval. Int J Approx Reason 50(7):957–968
23. Bounhas M, Mellouli K, Prade H, Serrurier M (2013) Possibilistic classifiers for numerical data. Soft Comput 17(5):733–751
24. Bounhas M, Ghasemi MH, Prade H, Serrurier M, Mellouli K (2014) Naïve possibilistic classifiers for imprecise or uncertain numerical data. Fuzzy Set Syst 239:137–156
25. Brown PF, Pietra SAD, Pietra VJD, Mercer RL (1991) Word-sense disambiguation using statistical methods. In: Proceedings of the 29th annual meeting of the Association for Computational Linguistics. The Association for Computational Linguistics, Berkeley, California, USA, pp 264–270
Author's personal copy Monolingual word sense disambiguation 26. Brown SW, Dligach D, Palmer M (2011) Verbnet class assignment as a WSD task. In: Proceedings of the 9th international conference on computational semantics. The Association for Computational Linguistics, Stroudsburg, PA, USA, pp 85–94 27. Brun C (2000) A client/server architecture for word sense disambiguation. In: Proceedings of the 18th international conference on Computational Linguistics. Morgan Kaufmann, Saarbrücken, Germany, pp 132–138 28. Brun C, Jacquemin B, Segond F (2001) Exploitation de dictionnaires électroniques pour la désambiguïsation sémantique lexicale. Traitement Automatique de la Langue 42(3):667–691 29. Carpuat M, Wu D (2007) Improving statistical machine translation using word sense disambiguation. In: EMNLP-CoNLL 2007, Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning. Prague, Czech Republic, The Association for Computational Linguistics, pp 61–72 30. Chan YS, Ng HT, Chiang D (2007) Word sense disambiguation improves statistical machine translation. In: Carroll JA, Bosch AVD, Zaenen A (eds) Proceedings of the 45th annual meeting of the Association for Computational Linguistics. Czech Republic, The Association for Computational Linguistics, Prague, pp 33–40 31. Chan YS, Ng HT (2005) Scaling up word sense disambiguation via parallel texts. In: Veloso MM, Kambhampati S (eds) Proceedings of The 20th national conference on artificial intelligence and the seventeenth innovative applications of artificial intelligence conference. AAAI Press/The MIT Press, Pittsburgh, Pennsylvania, USA, pp 1037–1042 32. Clough P, Stevenson M (2004) Cross-language information retrieval using EuroWordNet and word sense disambiguation. In: McDonald S, Tait J (eds) Advances in information retrieval, 26th European conference on IR research. Springer, LNCS 2997, Sunderland, UK, pp 327–337 33. Cohen J (1968) Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit. Psychol Bull 70(4):213–220 34. Cohn T (2003) Performance metrics for word sense disambiguation. In: Proceedings of the Australasian language technology workshop. The Australasian Language Technology Association, Melbourne, Australia, pp 86–93 35. Dahlgren K (ed) (1988) Naive semantics for natural language understanding. Kluwer, Boston 36. Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30 37. Diab M (2004) Word sense disambiguation within a multilingual framework. Ph.D. thesis, University of Maryland, USA 38. Doddington GR, Mitchell A, Przybocki MA, Ramshaw LA, Strassel S, Weischedel RM (2004) The Automatic Content Extraction (ACE) program—tasks, data, and evaluation. Proceedings of the 4th international conference on language resources and evaluation. European Language Resources Association, Lisbonne, Portugale, pp 837–840 39. Dubois D, Prade H (eds) (1988) Possibility theory: an approach to computerized processing. Plenum Press, New York 40. Dubois D, Prade H (2006) Représentations formelles de l’incertain et de l’imprécis. In: Bouyssou D, Dubois D, Pirlot M, Prade H (eds) Concepts et méthodes pour l’aide à la décision—outils de modélisation. Lavoisier, Paris, pp 111–171 41. Dubois D, Prade H (2009) Formal representations of uncertainty. In: Bouyssou D, Dubois D, Pirlot M, Prade H (eds) Decision-making process: concepts and methods. Wiley-ISTE, Hoboken 42. 
Dubois D, Prade H (eds) (1987) Théorie des possibilités: application à la représentation des connaissances en, Informatique edn. MASSON, Paris 43. Dubois D, Prade H (eds) (1988) Possibility theory. Plenum Press, New York 44. Dubois D, Prade H (1998) Possibility theory: qualitative and quantitative aspects. In: Gabbay DM, Smets P (eds) Quantified representation of uncertainty and imprecision. Klower, The Netherlands, Handbook of Defeasible Reasoning and Uncertainty Management Systems, pp 169–226 45. Dubois D, Prade H (2000) An overview of ordinal and numerical approaches to causal diagnostic problem solving. In: Gabbay DM, Kruse R (eds) Abductive reasoning and learning. Handbooks of defeasible reasoning and uncertainty management systems, Klower, The Netherlands, pp 231–280 46. Edmonds P, Hirst G (2002) Near-synonymy and lexical choice. Comput Linguist 28(2):105–144 47. Elayeb B (2009) SARIPOD: Système multi-agent de recherche intelligente possibiliste des documents web. Ph.D. thesis, The National Polytechnic Institute of Toulouse, France—The National School of Computer Science, Manouba University, Tunisia 48. Elayeb B, Bounhas I, Ben Khiroun O, Evrard F, Bellamine Ben Saoud N (2011) Towards a possibilistic information retrieval system using semantic query expansion. Int J Intell Inf Technol 7(4):1–25 49. Elayeb B, Evrard F, Zaghdoud M, Ben Ahmed M (2009) Towards an intelligent possibilistic web information retrieval using multiagent system. Int Technol Smart Educ 6(1):40–59
Author's personal copy B. Elayeb et al. 50. Erk K, Strapparava C (eds) (2010) Proceedings of the 5th international workshop on semantic evaluation. The Association for Computational Linguistics, Uppsala, Sweden 51. Eugenio BD (2000) On the usage of Kappa to evaluate agreement on coding tasks. In: LREC 2000— Proceedings of the second international conference on language resources and evaluation. European Language Resources Association, Athens, Greece, pp 441–444 52. Faralli S, Navigli R (2012) A new minimally-supervised framework for domain word sense disambiguation. In: EMNLP-CoNLL 2012—Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning. The Association for Computational Linguistics, Jeju Island, Korea, pp 1411–1422 53. Fellbaum C (ed) (1998) WordNet: an electronic lexical database. MIT Press, Cambridge 54. Gale WA, Church KW (1993) A program for aligning sentences in bilingual corpora. Comput Linguist 19(1):75–102 55. Gaume B (2004) Balades aléatoires dans les petits mondes lexicaux. Inf Int Intell 4(2):39–96 56. Gaume B (2006) Cartographier la forme du sens dans les petits mondes Lexicaux. In: Viprey J-M (ed) Proceedings of the 8th international conference on textual data and statistical analysis. Presses Universitaires de Franche-Comté, Besançon, France, pp 541–465 57. Gaume B, Hathout N, Muller P (2004) Word sense disambiguation using a dictionary for sens similarity measure. COLING 2004—Proceedings of the 20th International conference on computational linguistics. The Association for Computational Linguistics, Geneva, Switzerland, pp 1194–1200 58. Grishman R, Sundheim B (1996) Message understanding conference 6—a brief history. COLING 1996— Proceedinds of the 16th international conference on computational linguistics. The Association for Computational Linguistics, Copenhagen, Denmark pp 466–471 59. Guthrie J, Guthrie L, Wilks Y, Aidinejad H (1991) Subject-dependent cooccurrence and word sense disambiguation. In: Proceedings of the 29th annual meeting of the Association for Computational Linguistics. The Association for Computational Linguistics, Berkeley, California, USA pp 146–152 60. Haouari B, Ben Amor N, Elouedi Z, Mellouli K (2009) Naïve possibilistic network classifiers. Fuzzy Sets Syst 160(22):3224–3238 61. Ide N, Véronis J (1998) Word sense disambiguation: the state of the art. Comput Linguist 24:1–40 62. Jaynes ETh (ed) (2003) Probability theory: the logic of science. Cambridge University Press, Cambridge 63. Jimeno-Yepes AJ, McInnes BT, Aronson AR (2011) Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation. BMC Bioinform 12:223 64. Khemakhem A, Gargouri B and Ben Hamadou A (2013) Collaborative enrichment of electronic dictionaries standardized-LMF. In: Métais E, Meziane F, Saraee M et al (eds) Natural language processing and information systems—18th international conference on applications of natural language to information systems. Springer, LNCS 7934, Salford, UK, pp 328–336 65. Kilgarriff A (1994) The myth of completness and some problems with consistency (the role of frequency in deciding what goes in the dictionary). In: Martin W, Meijs W, Moerland M et al (eds) Proceedings of The 6th EURALEX international congress on lexicography. European Association for Lexicography, Amsterdam, The Netherlands pp 101–106 66. Koeling R, McCarthy D, Carroll J (2005) Domain-specific sense distributions and predominant sense acquisition. 
In: Proceedings of the HLT/EMNLP 2005—human language technology conference and conference on empirical methods in natural language processing. The Association for Computational Linguistics, Vancouver, British Columbia, Canada, pp 419–426 67. Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the fourteenth international joint conference on artificial intelligence. Morgan Kaufmann, Montréal Québec, Canada, pp 1137–1143 68. Krippendorff K (ed) (1980) Content analysis: an introduction to its methodology. Sage Publications, Beverly Hills 69. Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 70. Lefever E, Hoste V (2010) SemEval-2010 Task 3: cross-lingual word sense disambiguation. In: Proceedings of the 5th international workshop on semantic evaluation. The Association for Computational Linguistics, Uppsala, Sweden, pp 15–20 71. Lefever E, Hoste V (2013) SemEval-2013 Task 10: cross-lingual word sense disambiguation. In: Second joint conference on lexical and computational semantics, Volume 2: Proceedings of the 7th international workshop on semantic evaluation. The Association for Computational Linguistics, Atlanta, Georgia, USA, pp 158–166 72. Lesk M (1986) Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In: Proceedings of the 5th annual international conference on Systems documentation. ACM, Toronto, Ontario, Canada, pp 24–26
Author's personal copy Monolingual word sense disambiguation 73. Loper E, Yi S-T, Palmer M (2007) Combining lexical resources: mapping between PropBank and VerbNet. In: Proceedings of the 7th international workshop on computational semantics. Springer, Tlburg, The Netherlands. http://verbs.colorado.edu/~kipper/Papers/semlink_iwcs7.pdf 74. Loupy C (2000) Evaluation de l’apport de connaissances linguistiques en désambiguïsation sémantique et recherche documentaire. Ph.D. thesis, Université d’Avignon, Avignon, France 75. Masterman M (1961) Semantic message detection for machine translation, using an interlangua. In: Proceedings of the international conference on machine translation of languages and applied language analysis. National Physical Laboratory, Teddington, UK, pp 438–474 76. McRoy S (1992) Using multiple knowledge sources for word sense discrimination. Comput Linguist 18:1–30 77. Mihalcea R, Moldovan D (1998) Word sense disambiguation based on semantic density. In: Proceedings of Coling-ACL’98 workshop—usage of wordnet in natural language processing systems. Montreal, Quebec, Canada, pp 16–22 78. Navigli R (2009) Word sense disambiguation: a survey. ACM Comput Surv 41(2):1–69 79. Navigli R, Lapata M (2007) Graph connectivity measures for unsupervised word sense disambiguation. In: Veloso MM (ed) Proceedings of the 20th international joint conference on artificial intelligence. Hyderabad, India, pp 1683–1688 80. Navigli R, Ponzetto SP (2012) Joining forces pays off: multilingual joint word sense disambiguation. EMNLP-CoNLL 2012—Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning. The Association for Computational Linguistics, Jeju Island, Korea, pp 1399–1410 81. Ng HT, Wang B, Chan YS (2003) Exploiting parallel texts for word sense disambiguation: an empirical study. In: Proceedings of the 41st annual meeting of the association for computational linguistics. The Association for Computational Linguistics, Sapporo, Japan, pp 455–462 82. Nguyen K-H, Ock Ch-Y (2013) Word sense disambiguation as a traveling salesman problem. Artif Intell Rev 40(4):405–427 83. Ploux S, Victorri B (1998) Construction d’espaces sémantiques à l’aide de dictionnaires de synonymes. Traitement automatique des langues 39(1):161–182 84. Pilehvar MT, Jurgens D, Navigli R (2013) Align, disambiguate and walk: a unified approach for measuring semantic similarity. In: Proceedings of the 51st annual meeting of the Association for Computational Linguistics. The Association for Computational Linguistics, Sofia, Bulgaria, pp 1341–1351 85. Ponzetto SP, Navigli R (2010) Knowledge-rich word sense disambiguation rivaling supervised systems. In: Proceedings of the 48th annual meeting of the Association for Computational Linguistics. The Association for Computational Linguistics, Uppsala, Sweden, pp 1522–1531 86. Resnik P (1995) Disambiguating noun groupings with repect to WordNet senses. In: Yarowsky D, Church K (eds) Proceedings of the third workshop on very large Corpora. The Association for Computational Linguistics, Cambridge, pp 54–68 87. Reymond D (2001) Dictionnaires distributionnels et étiquetage lexical de corpus. In: TALN-RECITAL2001. Cinquième Rencontre des Etudiants Chercheurs en Informatique pour le Traitement Automatique des Langues. Association pour le Traitement Automatique des Langues, Tours, France pp 473–482 88. 
Reymond D (2002) Méthodologie pour la création d’un dictionnaire distributionnel dans une perspective d’étiquetage lexical semi-automatique. In: TALN-RECITAL-2002. Sixième Rencontre des Etudiants Chercheurs en Informatique pour le Traitement Automatique des Langues. Association pour le Traitement Automatique des Langues, Nancy, France, pp 405–414 89. Rijsbergen CJ, Keith V (eds) (2004) The geometry of information retrieval. Cambridge University Press, Cambridge 90. Roche E, Schabes Y (eds) (1997) Finite-state language processing. MIT Press, Cambridge 91. Segond F (2000) Framework and results for French. Comput Humanit 34(1):49–60 92. Sinha R, Mihalcea R (2007) Unsupervised graph-based word sense disambiguation using measures of word semantic similarity. In: Proceedings of the first IEEE international conference on semantic computing. IEEE Computer Society, Irvine, CA, USA pp 363–369 93. Small S, Rieger C (1982) Parsing and comprehencing with word experts (a theory and its realization). In: Wendy L, Martin R (eds) Strategies for natural language processing. Lawrence Erlbaum Associates, Hillsdale, NJ, USA, pp 89–147 94. Soto A, Olivas JA, Prieto ME (2008) Fuzzy approach of synonymy and polysemy for information retrieval. In: Bello R et al (eds) Granular computing: at the junction of rough sets and fuzzy sets studies in fuzziness and soft computing, vol 224, pp 179–198 95. Specia L, Nunes MGV, Stevenson M (2007), Learning expressive models for word sense disambiguation. In: Proceedings of the 45th Annual meeting of the association of computational linguistics. Prague, Czech Republic, The Association of, Computational Linguistics, pp 41–48
Author's personal copy B. Elayeb et al. 96. Stevenson M, Wilks Y (2001) The interaction of knowledge sources in word sense disambiguation. Comput Linguist 27(3):321–349 97. Tae-Gil N, Park S-B, Lee S-J (2010) Unsupervised word sense disambiguation in biomedical texts with co-occurrence network and graph kernel. In: Proceedings of the ACM fourth international workshop on Data and text mining in biomedical informatics. ACM, Toronto, Canada, pp 61–64 98. Tlili-Guiassa Y, Merouani HF (2007) Désambiguïsation sémantique d’un texte Arabe. In: Proceedings of Séminaire national sur le langage naturel et l’intelligence artificielle. Chlef, Algérie, pp 41–60 99. Tufis D, Ion R, Ide N (2004) Fine-grained word sense disambiguation based on parallel Corpora, Word Alignment, Word Clustering and Aligned WordNets. In: COLING 2004—Proceedings of the 20 th international conference on computational linguistics. The Association for Computational Linguistics, Geneva, Switzerland, pp 1312–1318 100. Véronis J, Ide N (1990) Word sense disambiguation with very large neural networks extracted from machine readable dictionaries. In: COLING 1990—Proceedings of the 13th international conference on computational linguistics. The Association for Computational Linguistics, Helsinki, Finland, pp 389–394 101. Véronis J (1998) A study of polysemy judgements and inter-annotator agreement. In: Programme and advanced papers of the Senseval workshop. Herstmonceux Castle, England, pp 2–4 102. Véronis J (2003a) Sense tagging: does it make sense? In: Rayson P, Wilson A, McEnery T et al (eds.) Proceedings of the Corpus Linguistics 2001 conference. Peter Lang Frankfurt, Lancaster, UK. http:// sites.univ-provence.fr/veronis/pdf/2001-lancaster-sense.pdf 103. Véronis J (2003) Hyperlex: cartographie lexicale pour la recherche d’informations. In: TALN-2003. Actes de la conférence Traitement Automatique des Langues. Association pour le Traitement Automatique des Langues, Batz-sur-Mer, France, pp 265–274 104. Vidhu Bhala R V and Abirami S (2012) Trends in word sense disambiguation. Artif Intell Rev, pp 1–13. doi:10.1007/s10462-012-9331-5 105. Viera AJ, Garrett JM (2005) Understanding interobserver agreement: the kappa statistic. Family Med 37(5):360–363 106. Voorhees EM (2004) Overview of the TREC-2004 question answering track. In: Proceedings of 13th text REtrieval conference. NIST Special Publication, pp 500–261, Gaithersburg, USA. http://trec.nist. gov/pubs/trec13/papers/QA.OVERVIEW.pdf 107. Voorhees EM, Buckland LP (eds) (2006) Proceedings of the fifteenth text retrieval conference. NIST Special Publication 500-272, Gaithersburg, USA 108. Vossen P (ed) (1998) EuroWordNet: a multilingual database with lexical semantic networks. Kluwer, Norwell 109. Waltz D, Pollack J (1985) Massively parallel parsing: a strongly interactive model of natural language interpretation. Cogn Sci 9(1):51–74 110. Weiss SF (1973) Learning to disambiguate. Inf Storage Retr 9(1):33–41 111. Wilks YA, Stevenson M (1997a) Combining independent knowledge source for word sense disambiguation. In: Proceedings of conference of recent advances in natural language processing. Tzigov Chark, Bulgaria, pp 1–7 112. Wilks YA, Stevenson M (1997b) The grammar of sense: using part-of-speech tags as first step in semantic disambiguation. J Nat Lang Eng 4(3):135–143 113. Wilks Y (1975) Preference semantics. In: Keenan EL (ed) Formal Semantics of Natural Language. Cambridge University Press, Cambridge, pp 329–348 114. 
Wilks YA, Fass DC, Guo CM, MacDonald JE, Plate T, Slator BA (eds) (1990) Providing machine tractable dictionary tools. MIT Press, Cambridge 115. Yarowsky D (2000) Hierarchical decision list for word sense disambiguation. Comput Humanit 34(1– 2):179–186 116. Yarowsky D, Cucerzan S, Florian R, Schafer C, Wicentowski R (2001) The johns hopkins SENSEVAL2 system descriptions. In: Preiss J, Yarowsky D (eds) Proceedings of The 2nd international workshop on evaluating word sense disambiguation systems. The Association for Computational Linguistics, Toulouse, France, pp 163–166 117. Yuret D, Yatbaz MA (2010) The noisy channel model for unsupervised word sense disambiguation. Comput Linguist 36(1):111–127 118. Zadeh LA (1978) Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst 1(1):3–28 119. Zhong Z, Ng HT (2010) It makes sense: a wide-coverage word sense disambiguation system for free text. In: Proceedings of the 48th annual meeting of the association for computational linguistics. The Association for Computational Linguistics, Uppsala, Sweden, pp 78–83
120. Zhou X, Han H (2005) Survey of word sense disambiguation approaches. In: Russell I, Markov Z (eds) Proceedings of the eighteenth international florida artificial intelligence research society conference. AAAI Press, Clearwater Beach, pp 307–313
121. Zouaghi A, Merhbene L, Zrigui M (2012) Combination of information retrieval methods with LESK algorithm for Arabic word sense disambiguation. Artif Intell Rev 38(4):257–269
Bilel Elayeb has been an Assistant Professor at the Emirates College of Technology in Abu Dhabi, UAE, since January 2013. He has been a researcher in the RIADI Laboratory at the National School of Computer Science (ENSI) of Manouba University (Tunisia) since 2002. He obtained his PhD in Computer Science jointly from the National Polytechnic Institute of Toulouse (France) and ENSI in 2009, and a Master degree in Computer Science and an Engineering Diploma from ENSI in 2004 and 2002, respectively. His research focuses on Mono- and Cross-Language Information Retrieval, Query Disambiguation, Query Translation, Query Expansion, Word Sense Disambiguation, Computational Linguistics, Arabic NLP, and Information Reliability, including possibility theory. He has supervised many Master and PhD theses in Artificial Intelligence, Information Retrieval and CLIR, Computational Linguistics, and NLP.
Ibrahim Bounhas is currently an Assistant Professor at the Higher Institute of Documentation (ISD) at Manouba University in Tunisia. He obtained his PhD in Computer Science from the Faculty of Sciences of Tunis (2012) and his Master degree from the National School of Computer Sciences (ENSI) at Manouba University in Tunisia (2006). His current research interests are Ontology Engineering, Document Analysis, Arabic Text Processing and Retrieval, Word Sense Disambiguation, Computational Linguistics, Information Retrieval and CLIR, and Information Reliability, including possibility theory. He has supervised many Master and PhD theses in Arabic IR, CLIR, and NLP. He was a member of the RIADI research laboratory in 2005–2006 and has been a member of the LISI Laboratory since 2007.
Oussama Ben Khiroun obtained an Engineering Diploma in Computer Science (2009) and a Master degree in Computer Science (2010) from the National School of Computer Sciences (ENSI) at Manouba University in Tunisia. He is currently an Assistant Professor at the Higher Institute of Multimedia Arts (ISAMM) at Manouba University in Tunisia, and a member of the RIADI research laboratory as a PhD student. His research focuses on Mono- and Cross-Language Information Retrieval, especially Query Disambiguation and Expansion, and Word Sense Disambiguation, including possibility theory.
Fabrice Evrard has been an Assistant Professor at ENSEEIHT, Toulouse, France, since 1983. His research focuses on multiagent systems, dictionary modeling and analysis, Information Retrieval, Computational Linguistics, NLP, and Hierarchical Small-World Networks. He supervised a Master degree in Artificial Intelligence at the National Polytechnic Institute of Toulouse (INPT). He led the "Groupe Raisonnement, Action et Actes de Langage (GRAAL)" team, which is part of the LILaC research group at the Informatics Research Institute of Toulouse (IRIT), France. He has supervised many Master and PhD theses in Artificial Intelligence, Information Retrieval, and NLP.
Narjès Bellamine Ben Saoud is an Associate Professor at ISI (Institut Supérieur d'Informatique) at the University of Tunis El-Manar (Tunisia) and a researcher in the RIADI Laboratory at ENSI (Ecole Nationale des Sciences de l'Informatique), University of Manouba, Tunisia. Dr. Bellamine Ben Saoud received her Engineering Diploma and Master degree from ENSEEIHT Toulouse (1993), her PhD from the University of Toulouse 1 (1996) (France), and her University Habilitation from ENSI (2007). Her main research interests include computer-supported collaborative work, complexity theory, agent-based modeling and simulation of cooperative sociotechnical systems, culture-aware collaborative environments, Information Retrieval, and cloud computing. She has supervised many Master and PhD theses in Artificial Intelligence and Information Retrieval.