Cranberry Expressions in English and in German

Beata Trawiński, Manfred Sailer, Jan-Philipp Soehn, Lothar Lemnitzer, Frank Richter
University of Tübingen & University of Göttingen

Towards a Shared Task for Multiword Expressions (MWE 2008) Workshop at the LREC 2008 Conference Marrakech, Morocco June 1, 2008

Beata Trawiński (University of Tübingen)

Cranberry Expressions

MWE 2008

Overview

Our Data Packages
Cranberry Expressions
The Source: CoDII
Possible Applications for our Data Sets
Summary

Our Data Packages

English CE Trawinski
  CWen.txt: 77 cranberry words in English
  CCen.txt: 77 corresponding cranberry expressions
  README.txt

German CE Trawinski
  CWde.txt: 444 cranberry words in German
  CCde.txt: 444 corresponding cranberry expressions
  README.txt

Available from http://multiword.sourceforge.net

Cranberry Expressions: Definition

Cranberry Word (CW): an item that only occurs within a specific expression (CE)
  Aronoff (1976): in analogy to Cranberry Morph
  Alternatively: (phraseologically) bound words, unique words, unique lexemes, hapax legomena; German: Unikalia

Cranberry Expression (CE): an expression which contains at least one Cranberry Word
  Moon (1998): Cranberry Collocations
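
The CW definition can be read as a distributional test: every occurrence of the word in a corpus must fall inside the expression. The following is a minimal sketch of that test, not the procedure used to compile the lists. The function name `is_cranberry_word` and the three-sentence toy corpus are assumptions made for illustration, and the matching is purely on surface strings, so inflected or internally modified variants of an expression are not recognized.

```python
import re

def is_cranberry_word(word, expression, sentences):
    """Return True if `word` occurs at least once and only inside `expression`."""
    word_re = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    expr_re = re.compile(r"\b" + re.escape(expression) + r"\b", re.IGNORECASE)
    total = 0   # all occurrences of the word
    bound = 0   # occurrences that lie inside the expression
    for sentence in sentences:
        expr_spans = [m.span() for m in expr_re.finditer(sentence)]
        for m in word_re.finditer(sentence):
            total += 1
            if any(start <= m.start() and m.end() <= end
                   for start, end in expr_spans):
                bound += 1
    return total > 0 and total == bound

# Toy corpus (invented for illustration only).
corpus = [
    "They invited all their kith and kin.",
    "He was happy as a sandboy.",
    "The happy children played outside.",
]

kith_is_cw = is_cranberry_word("kith", "kith and kin", corpus)
sandboy_is_cw = is_cranberry_word("sandboy", "happy as a sandboy", corpus)
happy_is_cw = is_cranberry_word("happy", "happy as a sandboy", corpus)
```

On this toy corpus, `kith` and `sandboy` pass the test, while `happy` fails because it also occurs freely in the third sentence. On a corpus this small the result says little; a realistic check would need a large corpus and lemmatization.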

Cranberry Expressions: Examples

English CEs:
  happy as a sandboy
  kith and kin
  to play footsie with somebody

German CEs:
  in Anbetracht von ‘in view of’
  jemandem Kattun geben ‘to reprimand somebody’
  noch und nöcher ‘a lot’

Cranberry Expressions: Idioms versus Collocations

CEs versus idioms (such as spill the beans):
  lexical fixedness
  but no literal meaning

CEs versus collocations (such as take a shower):
  word co-occurrence of markedly high frequency
  but hard distributional restrictions rather than preferences

Idiom-like CEs (die Spendierhosen anhaben, lit. ‘wear the offering-trousers’, ‘be generous’):
  idiomatic interpretation of the non-CW components
  possible syntactic variations and internal modifications
  non-decomposable meaning

Collocation-like CEs (happy as a sandboy):
  literal interpretation of the non-CW components
  structurally parallel to collocations
  sometimes the CW is interchangeable

Cranberry Expressions: Syntactic Categories

Syntactic categories of CEs:
  VP: make headway
  PP: on tenterhooks
  AP: happy as a sandboy
  NP: the whole caboodle

Syntactic categories of CWs:
  V: wend one’s way
  A: spick and span
  N: run the gamut

Cranberry Expressions: Frequency Classes

FC 12: Anhieb (auf Anhieb ‘right away’)
FC 21: Kattun (jemandem Kattun geben ‘to reprimand somebody’)

where FC n indicates that the most frequent German word (der ‘the’) is 2^n times more frequent than the word in question

URL: wortschatz.uni-leipzig.de
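
Reading the frequency class as the exponent n for which the most frequent word is roughly 2^n times more frequent than the word in question, n can be computed from two raw corpus counts. The sketch below is only an illustration of that arithmetic. The function name `frequency_class`, the rounding to the nearest integer, and the counts are assumptions; the counts are invented so that they reproduce classes 12 and 21 and are not figures from wortschatz.uni-leipzig.de.

```python
import math

def frequency_class(word_count, top_count):
    """Return n such that top_count is approximately 2**n * word_count.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    if word_count <= 0 or top_count < word_count:
        raise ValueError("need 0 < word_count <= top_count")
    return round(math.log2(top_count / word_count))

# Invented counts, chosen only to illustrate the two classes on the slide.
top_count = 6_291_456          # hypothetical count of the most frequent word
fc_common = frequency_class(1_536, top_count)   # ratio 4096 = 2**12
fc_rare = frequency_class(3, top_count)         # ratio 2097152 = 2**21
```

With these counts `fc_common` is 12 and `fc_rare` is 21. A word in FC 21 is therefore 2^9 = 512 times rarer than a word in FC 12, which shows how wide a frequency range the CWs cover.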

The Collection of Distributionally Idiosyncratic Items (CoDII)

CoDII: The Database

Integration into the open-source XML database eXist, URL: exist.sourceforge.net

Querying CoDII with respect to
  particular lemmas
  syntactic properties
  licensing environments
  classifications

CoDII: Some Details on the Collected CEs

German CWs: Common Nouns (80 %) > (predicative) Adjectives (7 %) > Proper Names (5 %) > Verbs (3 %)

German CEs: VPs (83 %) > PPs (20 %)

English CWs: Common Nouns (67 %) > (attributive) Adjectives (21 %) > (predicative) Adjectives (7 %) > Verbs (4 %)

English CEs: NPs (41 %) > VPs (31 %)

Possible Applications for our Data Sets

Possible Applications:
  The exhaustive retrieval of CEs from arbitrary text is possible.
  This can be used to explore distributional differences between collocations and idioms, based on the distinction between idiom-like and collocation-like CEs.

Other Applications:
  Further linguistic investigations into MWEs
  Development of other lexical resources (such as wordnets)
  Training of methods for extracting semantically related lexical items
  Acquisition and/or tuning of rules and methods for the automatic detection, annotation, and extraction of idioms
  Automatic retrieval of otherwise similar MWEs without CWs
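
Because a CW occurs only inside its CE, finding the CW in a text is enough to locate a candidate CE, even when the rest of the expression is inflected or modified. The sketch below illustrates this lookup under several assumptions: the mapping `CW_TO_CE` is a small hand-written subset built from the examples in this presentation (the file format of the distributed lists is not described here, so nothing is read from disk), `find_ce_candidates` is a hypothetical helper name, and the sentence splitting and tokenization are deliberately naive.

```python
import re

# Illustrative subset only; the full lists are distributed as CWen.txt / CWde.txt.
CW_TO_CE = {
    "sandboy": "happy as a sandboy",
    "kith": "kith and kin",
    "tenterhooks": "on tenterhooks",
    "caboodle": "the whole caboodle",
    "headway": "make headway",
}

def find_ce_candidates(text, cw_to_ce):
    """Return (cranberry word, listed expression, sentence) for every CW hit."""
    hits = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for token in re.findall(r"[A-Za-z]+", sentence.lower()):
            if token in cw_to_ce:
                hits.append((token, cw_to_ce[token], sentence))
    return hits

sample = ("We kept them on tenterhooks for days. "
          "The team made headway at last. "
          "Nothing else happened.")
hits = find_ce_candidates(sample, CW_TO_CE)
```

For the sample text, the helper reports `tenterhooks` in the first sentence and `headway` in the second. The second hit is found although the text contains "made headway" rather than the listed form "make headway", which is the practical advantage of searching for the CW instead of the full expression.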

Summary

We presented two data sets: a list of 444 CWs in German and a list of 77 CWs in English, each accompanied by a corresponding list of the CEs in which the CWs occur.
We showed that CEs are interesting for research into MWEs because of their middle position between idioms and collocations, the wide variety of syntactic categories they comprise, and the different frequency classes they cover.
We introduced CoDII, the source of our data sets.
We argued that our resource may be useful for theoretical linguistic investigations into MWEs and for a number of computational linguistic tasks.

References

Aronoff, M. (1976). Word Formation in Generative Grammar. Cambridge, MA and London, England: MIT Press.
Dobrovol’skij, D. (1988). Phraseologie als Objekt der Universallinguistik. Leipzig: Verlag Enzyklopädie.
Dobrovol’skij, D. (1989). Formal gebundene phraseologische Konstituenten: Klassifikationsgrundlagen und theoretische Analyse. In W. Fleischer, R. Große, and G. Lerchner (Eds.), Beiträge zur Erforschung der deutschen Sprache, Volume 9, pp. 57–78. Leipzig: Bibliographisches Institut.
Dobrovol’skij, D. and E. Piirainen (1994a). PGF: Auf dem Präsentierteller oder auf dem Abstellgleis? Zeitschrift für Germanistik 4, 65–77.
Dobrovol’skij, D. and E. Piirainen (1994b). Sprachliche Unikalia im Deutschen: Zum Phänomen phraseologisch gebundener Formative. Folia Linguistica 27 (3–4), 449–473.
Moon, R. (1998). Fixed Expressions and Idioms in English: A Corpus-Based Approach. Oxford Studies in Lexicography and Lexicology. Oxford: Clarendon Press.
