Data-driven WSD. Henry Anaya-Sánchez, Aurora Pons-Porrata and Rafael Berlanga-Llavori. Word Sense Disambiguation Based on Word Sense Clustering ...
Outline Introduction A Knowledge-driven Framework for WSD A new WSD Method Experimental results Conclusions
Word Sense Disambiguation Based on Word Sense Clustering
Henry Anaya-Sánchez, Aurora Pons-Porrata and Rafael Berlanga-Llavori
Outline
1. Introduction
2. A Knowledge-driven Framework for WSD
3. A new WSD Method
4. Experimental results
5. Conclusions
Word Sense Disambiguation
Word Sense Disambiguation (WSD) is the general task of deciding the appropriate sense for a particular use of a polysemous word given its textual context.
Approaches to WSD
Word Sense Induction
Data-driven WSD
Knowledge-driven WSD
Behaviour of most knowledge-driven WSD methods
Match a textual context against the knowledge source, select the best match, and retrieve from it the suitable senses for the context constituents.
Our proposal
Introduction of a knowledge-driven framework and a first prototype algorithm for the disambiguation of nouns.
Idea: ... clustering of sense representations as a natural way to represent the cohesion among the words of a textual unit ...
Clustering in the WSD area
Two main usages:
1. Clustering textual contexts to represent different senses in Data-driven WSD and Word Sense Induction.
2. Clustering fine-grained senses into coarse-grained ones for reducing the average polysemy of words.
Knowledge-driven Framework Algorithm
Input: The finite set of nouns N and the textual context T.
Output: The disambiguated noun senses.

Let S be the set of all senses of nouns in N;
repeat
    G = group(S)
    G' = filter(G, T, matching-function)
    S = ∪_{g ∈ G'} {s | s ∈ g}
until stopping-criterion
return S
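The loop above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the three components are passed in as functions, and the names `disambiguate`, `filter_clusters`, `should_stop`, the `max_iter` guard, and the toy components used in the usage lines are all assumptions made for illustration. Passing the iteration index to `should_stop` is also an assumption.

```python
def disambiguate(senses, context, group, filter_clusters, should_stop, max_iter=100):
    """Repeat: group the senses, keep the clusters that match the context."""
    S = set(senses)
    for i in range(max_iter):  # max_iter is a safety guard, not part of the slides
        G = group(S)                           # G = group(S)
        G_kept = filter_clusters(G, context)   # G' = filter(G, T, matching-function)
        S = set().union(*G_kept)               # S = union of the kept clusters
        if should_stop(S, i):                  # until stopping-criterion
            break
    return S


# Toy components (hypothetical): a sense is a (noun, label) pair.
def toy_group(S):
    # One singleton cluster per sense.
    return [frozenset([s]) for s in S]


def toy_filter(G, context):
    # Keep the clusters that contain a sense whose label occurs in the context.
    return [g for g in G if any(label in context for _, label in g)]


def toy_stop(S, i):
    # Stop once no noun has more than one remaining sense.
    nouns = [noun for noun, _ in S]
    return len(nouns) == len(set(nouns))


senses = {
    ("competition", "contest"),
    ("competition", "business"),
    ("athlete", "sportsperson"),
}
result = disambiguate(senses, {"contest", "sportsperson"},
                      toy_group, toy_filter, toy_stop)
```

Keeping the components as parameters mirrors the framework idea: different choices of grouping, filtering and stopping functions give different disambiguation algorithms.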
Definition of components
Sense representation: Topic Signatures as representation for WordNet nominal senses.
Clustering Algorithm: Extended Star Clustering Algorithm, with cosine measure as similarity function.
Matching Function:

\[
\text{matching-function}(g, T) = \left( |nouns(g)|,\ \frac{\sum_i \min(\bar{g}_i, T_i)}{\min\left(\sum_i \bar{g}_i, \sum_i T_i\right)},\ -\sum_{s \in g} number(s) \right) \qquad (1)
\]
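As an illustration of equation (1), the matching function can be computed as a Python tuple, which compares lexicographically. This is a sketch under stated assumptions: the slides do not define the symbols, so here ḡ is taken to be the centroid (average) of the signatures in cluster g, T is a term-weight vector for the context, and number(s) is the sense number of s. The `Sense` class, the dictionary representation and the example weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sense:
    noun: str        # noun this sense belongs to
    number: int      # sense number (assumed: position in the sense inventory)
    signature: dict  # hypothetical sparse topic signature: term -> weight


def centroid(cluster):
    """Average the sparse signatures of the senses in a cluster (assumed g-bar)."""
    total = {}
    for s in cluster:
        for term, weight in s.signature.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(cluster) for term, weight in total.items()}


def matching_function(cluster, context):
    """Return the 3-tuple of equation (1) for a cluster and a context vector."""
    g_bar = centroid(cluster)
    overlap = sum(min(w, context.get(t, 0.0)) for t, w in g_bar.items())
    denom = min(sum(g_bar.values()), sum(context.values()))
    ratio = overlap / denom if denom > 0 else 0.0
    covered_nouns = len({s.noun for s in cluster})
    return (covered_nouns, ratio, -sum(s.number for s in cluster))


cluster = [
    Sense("competition", 2, {"race": 1.0, "sport": 1.0}),
    Sense("athlete", 1, {"sport": 1.0, "train": 1.0}),
]
context = {"sport": 1.0, "skill": 1.0}
score = matching_function(cluster, context)
single = matching_function(cluster[1:], context)
```

With tuples, a cluster covering more nouns ranks above one covering fewer, ties are broken by the overlap ratio, and remaining ties by the lower sum of sense numbers.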
Definition of components (Cont.)
Filtering Function: Selects clusters, in order, to build a cover for N.
Stopping Criterion:

\[
\beta_0(i) =
\begin{cases}
percentile(90, sim(S)) & \text{if } i = 0, \\
\min_{q \in \{0,5,10\}} \{\beta = percentile(90 + q, sim(S)) \mid \beta > \beta_0(i - 1)\} & \text{otherwise.}
\end{cases}
\qquad (2)
\]

percentile(p, sim(S)) represents the p-th percentile value of

\[
sim(S) = \{\cos(s_i, s_j) \mid s_i, s_j \in S,\ i \neq j\} \cup \{1\}
\]
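Equation (2) can be sketched in Python as below. Several details are assumptions, because the slides do not fix them: the nearest-rank definition of the percentile, the sparse dictionary representation of the signatures, treating sim(S) as a set of distinct values, and returning `None` when no candidate exceeds the previous threshold. The function names are illustrative.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse vectors (dict: term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sim_values(signatures):
    """sim(S): distinct pairwise cosine values, plus the value 1."""
    values = {cosine(a, b) for a, b in combinations(signatures, 2)}
    return sorted(values | {1.0})


def percentile(p, values):
    """Nearest-rank percentile for an integer p in 1..100 (an assumed definition)."""
    ordered = sorted(values)
    k = max(1, (p * len(ordered) + 99) // 100)  # ceil(p * n / 100) in integers
    return ordered[k - 1]


def beta0(i, sims, previous=None):
    """Threshold of equation (2); `previous` is beta0(i - 1) when i > 0."""
    if i == 0:
        return percentile(90, sims)
    candidates = [percentile(90 + q, sims) for q in (0, 5, 10)]
    larger = [b for b in candidates if b > previous]
    return min(larger) if larger else None  # None: no larger candidate (assumption)


sims = [k / 10 for k in range(1, 11)]   # toy similarity values 0.1 .. 1.0
b_first = beta0(0, sims)
b_second = beta0(1, sims, previous=b_first)
b_third = beta0(2, sims, previous=b_second)
```

Because 1 always belongs to sim(S), the 100th percentile is 1, so the sequence of thresholds cannot grow past 1.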
Example
Disambiguation of all nouns in the sentence “The competition gave evidence of the athlete’s skills”.
competition:
#1 refers to a business relation in which two parties compete to gain customers.
#2 refers to an occasion on which a winner is selected from among two or more contestants (hypernym of athletic contest, race, trial, etc.).
#3 refers to the act of competing as for profit or a prize.
#4 refers to the contestant you hope to defeat.
evidence:
#1 refers to your basis for belief or disbelief; knowledge on which to base belief.
#2 refers to an indication that makes something evident.
#3 refers to (law) all the means by which any alleged matter of fact whose truth is investigated at judicial trial is established or disproved.
athlete:
#1 refers to a person trained to compete in sports.
skill:
#1 refers to an ability that has been acquired by training.
#2 refers to an ability to produce solutions in some problem domain.
Figure: Disambiguation of nouns in “The competition gave evidence of the athlete’s skills”.
Table: WSD performance in SemCor categories.

Category                                Polysemous nouns   All nouns
A. Press: reportage                     0.606              0.683
C. Press: reviews                       0.504              0.602
L. Mystery & detective fiction          0.498              0.589
F. Popular lore                         0.482              0.604
P. Romance & love story                 0.480              0.581
H. Miscellaneous                        0.479              0.590
M. Science fiction                      0.479              0.587
B. Press: editorial                     0.476              0.599
K. General fiction                      0.476              0.580
E. Skill & Hobbies                      0.473              0.586
G. Belles lettres, biography, essays    0.462              0.563
R. Humor                                0.461              0.576
N. Adventure & western fiction          0.452              0.552
J. Learned                              0.444              0.571
D. Religion                             0.388              0.494
Brown 1                                 0.475              0.588
Brown 2                                 0.467              0.576
Whole SemCor                            0.472              0.582
Table: Results using different topic signatures.

Polysemous nouns
Signature                Recall   Precision   F       Coverage
Based only on WordNet    0.501    0.501       0.501   100 %
Web-based                0.433    0.461       0.447   93.8 %

All nouns
Signature                Recall   Precision   F       Coverage
Based only on WordNet    0.603    0.603       0.603   100 %
Web-based                0.536    0.565       0.550   94.9 %
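The F column in the table above is consistent with the balanced F-measure, the harmonic mean of precision and recall. The slides do not say which F variant was used, so the following is a quick check under that assumption; the function name is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Web-based signatures, values taken from the table.
web_polysemous = round(f_measure(0.461, 0.433), 3)
web_all = round(f_measure(0.565, 0.536), 3)
```

Both rounded values agree with the reported F scores for the Web-based signatures (0.447 and 0.550). With full coverage, precision and recall coincide, so F equals them as well.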
Table: Overall performance.

WSD method             Recall   Full coverage
Conceptual density     0.220    no
Lesk                   0.274    no
UNED method            0.313    no
Specification marks    0.391    yes
Our method             0.472    yes
Conclusions and Future Work
1. A general framework for the disambiguation of nouns and a first prototype have been introduced.
2. Its novelty consists of the use of clustering.
3. Different disambiguation algorithms can be obtained from the framework.
4. Results are encouraging.
Further work: to instantiate and extend the framework.
Thanks!