Localizing Dependencies and Supertagging
Srinivas Bangalore, Data-Oriented Parsing (Chp. 15)
Dafydd Jones 17 May 2005
Supertags
● “Simple” tags reflect a word's syntactic category
– nouns: N, verbs: V, adjectives: JJ
– car/N jump/V
● Supertags are:
– associated with at least one lexical item
– larger than Context-Free rules
VP → V NP
Examples - “purchase”
The domain over which a word specifies its syntactic constraints:
● Noun Phrase
● Nominal Modifier
● Nominal Predicative
Properties: Lexicalization
A lexicalized grammar, e.g. one that contains supertags, consists of:
● a finite set of elementary structures (strings, trees, directed acyclic graphs, etc.), each structure anchored on a lexical item.
● lexical items, each associated with at least one of the elementary structures of the grammar.
● a finite set of operations combining these structures.
Properties: EDL
Extended Domain of Locality
● Every supertag must contain all and only the arguments of the anchor in the same structure.
● For each lexical item, the grammar must contain a supertag for each syntactic environment the lexical item can appear in.
Properties: FRD
● Factoring Recursion away from Domain of Dependencies
– Recursive constructs are represented as auxiliary supertags
– Initial supertags define the domains for agreement and subcategorization
– Auxiliary trees, by adjunction to initial supertags, allow for the long-distance behaviour of these dependencies
Combining Supertags
● Substitution
– inserts elementary trees at the substitution nodes of other elementary trees.
– root label must match label of substitution node
● Adjunction
– an auxiliary tree is inserted into an elementary tree at a node that matches both root and foot node of the auxiliary tree
– the node adjoined to splits
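The two operations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, its `kind` values, and the helper names `walk`, `first`, `substitute`, `adjoin` and `words` are all hypothetical, and the toy trees are hand-built rather than taken from any grammar.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Node:
    label: str
    kind: str = "internal"  # "internal", "subst", "foot" or "word" (assumed encoding)
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield nodes depth-first, left to right."""
    yield node
    for child in node.children:
        yield from walk(child)


def first(node: Node, pred: Callable[[Node], bool]) -> Node:
    """Return the first node (depth-first) that satisfies pred."""
    for n in walk(node):
        if pred(n):
            return n
    raise ValueError("no matching node")


def substitute(tree: Node, initial: Node) -> None:
    """Fill the first substitution node whose label matches initial's root label."""
    site = first(tree, lambda n: n.kind == "subst" and n.label == initial.label)
    site.kind = "internal"
    site.children = copy.deepcopy(initial.children)


def adjoin(tree: Node, aux: Node) -> None:
    """Adjoin aux at the first internal node matching its root and foot label."""
    foot = first(aux, lambda n: n.kind == "foot")
    if foot.label != aux.label:
        raise ValueError("root and foot labels of an auxiliary tree must match")
    site = first(tree, lambda n: n.kind == "internal" and n.label == aux.label)
    lower = site.children                       # the node "splits": keep its lower half
    site.children = copy.deepcopy(aux.children)  # upper half becomes the aux root
    new_foot = first(site, lambda n: n.kind == "foot")
    new_foot.kind = "internal"
    new_foot.children = lower                   # lower half hangs off the foot node


def words(tree: Node) -> List[str]:
    return [n.label for n in walk(tree) if n.kind == "word"]


# Toy elementary trees (hypothetical shapes).
includes = Node("S", children=[
    Node("NP", "subst"),
    Node("VP", children=[Node("V", children=[Node("includes", "word")]),
                         Node("NP", "subst")]),
])
price = Node("NP", children=[Node("N", children=[Node("price", "word")])])
purchase = Node("N", children=[Node("N", children=[Node("purchase", "word")]),
                               Node("N", "foot")])
companies = Node("NP", children=[Node("N", children=[Node("companies", "word")])])

adjoin(price, purchase)          # "purchase" modifies "price"
substitute(includes, price)      # subject NP
substitute(includes, companies)  # object NP
print(words(includes))
```

The sketch copies elementary trees before inserting them so the same supertag can be reused for several words; always picking the first matching node is a simplification, since a real derivation would name the target node explicitly.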
Example tree
Combining Supertags: Examples
Parse tree for the sentence: “The purchase price includes two ancillary companies”
Extracting Supertags
● Where do they come from?
● Supertags can be extracted from an annotated corpus, e.g. Penn Treebank
● Use a head-word percolation table to build the trunk
● Mechanism to decide between adjuncts and complements
● Resulting in up to 99.96% coverage of the corpus
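As a rough illustration of the head-percolation step, the sketch below follows head children from the root of a bracketed tree down to a lexical anchor, which yields the trunk of that anchor's supertag. The `HEAD_TABLE` entries are a hypothetical, heavily abbreviated table made up for this example, the tuple encoding of trees is an assumption, and the adjunct/complement decision is not modelled at all.

```python
# Hypothetical head table: parent label -> (scan direction, priority list).
HEAD_TABLE = {
    "S": ("right", ["VP", "S"]),
    "VP": ("left", ["VBZ", "VBD", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NP"]),
}


def head_child(label, children):
    """Pick the head child of a node using the (assumed) head table."""
    direction, priorities = HEAD_TABLE.get(label, ("left", []))
    ordered = children if direction == "left" else list(reversed(children))
    for wanted in priorities:
        for child in ordered:
            if child[0] == wanted:
                return child
    return ordered[0]  # fall back to the first child in scan order


def trunk(tree):
    """Return the labels from the root down to the head word."""
    label, rest = tree
    if isinstance(rest, str):  # preterminal: (POS, word)
        return [label, rest]
    return [label] + trunk(head_child(label, rest))


# Assumed encoding: (label, [children]) for phrases, (POS, word) for leaves.
sentence = ("S", [
    ("NP", [("DT", "The"), ("NN", "price")]),
    ("VP", [("VBZ", "includes"), ("NP", [("NNS", "batteries")])]),
])

print(trunk(sentence))        # trunk anchored on the verb
print(trunk(sentence[1][0]))  # trunk anchored on the subject noun
```

Children that are not on the trunk would then be classified as complements (kept as substitution nodes) or adjuncts (factored out as auxiliary trees); that decision is left out of this sketch.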
Supertag Disambiguation
● Even when a word has a unique standard POS, e.g. Verb, it will most likely have multiple supertags
● After parsing, each lexical item should be associated with one supertag.
● The parser could do this, but it is expensive
● Supertag disambiguation before parsing greatly speeds up the work of the parser
● Use information about local dependencies and statistical information to disambiguate
Using Structural Information
● Span constraint:
– calculate the minimum number of lexical items the supertag covers
– if the input contains fewer items than the span, then eliminate supertag
● Left/Right Span constraint:
– similarly, if the span to the left/right of anchor is larger than input, then eliminate
● Neighbouring lexical items:
– if the terminals specified in the supertag do not appear in the input, then eliminate
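The three filters above can be sketched as a single predicate. The `Supertag` record, its field names, and the candidate supertags for "includes" are all assumptions made for illustration; a real supertagger would derive the spans from the elementary trees themselves.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Supertag:
    name: str                     # hypothetical label
    left_span: int                # minimum lexical items left of the anchor
    right_span: int               # minimum lexical items right of the anchor
    required_words: FrozenSet[str] = frozenset()  # terminals fixed in the supertag


def survives(tag: Supertag, position: int, tokens: List[str]) -> bool:
    n = len(tokens)
    # Span constraint: the supertag covers more items than the input has.
    if tag.left_span + 1 + tag.right_span > n:
        return False
    # Left/right span constraint (this check also implies the one above).
    if tag.left_span > position or tag.right_span > n - 1 - position:
        return False
    # Neighbouring lexical items: every fixed terminal must occur in the input.
    return tag.required_words <= set(tokens)


def filter_supertags(candidates, position, tokens):
    return [t for t in candidates if survives(t, position, tokens)]


candidates = [
    Supertag("declarative_transitive", left_span=1, right_span=1),
    Supertag("imperative_transitive", left_span=0, right_span=1),
    Supertag("passive_with_by", left_span=1, right_span=2,
             required_words=frozenset({"by"})),
]

fragment = ["includes", "batteries"]
print([t.name for t in filter_supertags(candidates, 0, fragment)])

full = ["the", "price", "includes", "sales", "tax"]
print([t.name for t in filter_supertags(candidates, 2, full)])
```

The filter only removes supertags that cannot possibly fit; choosing among the survivors is left to the statistical model.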
Structural Filtering: Example
● Leads to a reduction of almost 50% of supertag ambiguity
Span constraint: “Includes batteries!”
Left Span: “The price includes sales tax”
Lexical items: Active/Passive - “..included by..”
Trigram Model
● Adapted from state-of-the-art method for POS tagging, which can achieve around 97% accuracy
● Based on sequences of n tags
– Unigrams: most likely tag for lexical item
● n=3 usually taken: Trigrams

$$\hat{T} = \arg\max_{T} \prod_{i=1}^{N} \Pr(T_i \mid T_{i-2}, T_{i-1}) \cdot \Pr(W_i \mid T_i)$$

Contextual probability * Word emit probability
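To make the product of contextual and word emit probabilities concrete, here is a minimal sketch that scores supertag sequences in log space and picks the best one by brute force. The probability tables, the supertag names `NP`, `NMOD` and `TRANS`, the `FLOOR` value for unseen events and the start symbol are all made-up assumptions; in practice the argmax is found with dynamic programming (Viterbi) over smoothed estimates rather than by enumeration.

```python
import itertools
import math

START = "<s>"   # assumed padding symbol for the first two history slots
FLOOR = 1e-6    # assumed floor for unseen events; real models smooth properly


def log_score(words, tags, contextual, emit):
    """Sum of log Pr(T_i | T_{i-2}, T_{i-1}) + log Pr(W_i | T_i)."""
    total = 0.0
    history = (START, START)
    for word, tag in zip(words, tags):
        total += math.log(contextual.get((history[0], history[1], tag), FLOOR))
        total += math.log(emit.get((word, tag), FLOOR))
        history = (history[1], tag)
    return total


def best_sequence(words, candidates, contextual, emit):
    """Brute-force argmax over all candidate supertag sequences."""
    options = [candidates[w] for w in words]
    return max(itertools.product(*options),
               key=lambda tags: log_score(words, tags, contextual, emit))


# Toy tables (invented numbers).
contextual = {
    (START, START, "NP"): 0.6,
    (START, START, "NMOD"): 0.4,
    (START, "NP", "TRANS"): 0.5,
    (START, "NMOD", "TRANS"): 0.05,
}
emit = {
    ("price", "NP"): 0.01,
    ("price", "NMOD"): 0.01,
    ("includes", "TRANS"): 0.02,
}
candidates = {"price": ["NP", "NMOD"], "includes": ["TRANS"]}

print(best_sequence(["price", "includes"], candidates, contextual, emit))
```

Enumeration is exponential in sentence length, so this is only suitable for checking the formula on short inputs.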
Trigrams: Results
● Trained on 1,000,000 words of Wall Street Journal corpus
– sections 00 to 24, except section 20 of WSJ
● Tested on 47,000 words
– section 20 of WSJ
● 300 supertags (tree-frames) used
● Results: 92.2% accuracy
● Can be improved slightly by assigning sets of supertags (average ambiguity 1.3 supertags)
Parsing from Supertags
● Use directly encoded requirements to establish dependency links
● Fill substitution nodes with complements
● Attach foot node to a modified supertag
Lightweight Dependency Analyzer
● Heuristic-based, linear time, deterministic algorithm
● Algorithm:
    Pass 1: For each modifier supertag s
        Compute dependencies for s
        Mark complements (unavailable for pass 2)
    Pass 2: For each non-modifier supertag s
        Compute dependencies for s

    Compute dependencies for s (anchor w):
        For each frontier node d in s
            Connect nearest word to left/right of w depending on direction of d, such that label(d) matches an internal node, ignoring marked supertags
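The two passes can be sketched as follows. This is a simplified reading of the algorithm, not the original implementation: the `Frontier` and `Supertag` records, the choice of which labels count as internal nodes, and the toy supertags for "prices of batteries rose" are all assumptions, and the naive scan makes no attempt to reach the linear-time bound.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Frontier:
    label: str
    direction: str          # "left" or "right" of the anchor
    is_foot: bool = False   # foot node of a modifier, i.e. the word it modifies


@dataclass(frozen=True)
class Supertag:
    internal: FrozenSet[str]            # labels of internal nodes (assumed given)
    frontier: Tuple[Frontier, ...] = ()
    is_modifier: bool = False


Link = Tuple[int, int, str]  # (anchor position, linked position, label)


def compute_dependencies(i: int, tags: List[Supertag], marked: Set[int]) -> List[Link]:
    links = []
    for d in tags[i].frontier:
        step = -1 if d.direction == "left" else 1
        k = i + step
        while 0 <= k < len(tags):
            # Skip marked words; link to the nearest word with a matching internal node.
            if k not in marked and d.label in tags[k].internal:
                links.append((i, k, d.label))
                # Complements of modifiers become unavailable for pass 2.
                if tags[i].is_modifier and not d.is_foot:
                    marked.add(k)
                break
            k += step
    return links


def lda(tags: List[Supertag]) -> List[Link]:
    marked: Set[int] = set()
    links: List[Link] = []
    for i, tag in enumerate(tags):      # pass 1: modifier supertags
        if tag.is_modifier:
            links.extend(compute_dependencies(i, tags, marked))
    for i, tag in enumerate(tags):      # pass 2: non-modifier supertags
        if not tag.is_modifier:
            links.extend(compute_dependencies(i, tags, marked))
    return links


noun = Supertag(internal=frozenset({"NP", "N"}))
tags = [
    noun,                                                      # 0: prices
    Supertag(internal=frozenset({"PP", "P"}),                  # 1: of
             frontier=(Frontier("NP", "left", is_foot=True),
                       Frontier("NP", "right")),
             is_modifier=True),
    noun,                                                      # 2: batteries
    Supertag(internal=frozenset({"S", "VP", "V"}),             # 3: rose
             frontier=(Frontier("NP", "left"),)),
]

print(lda(tags))
```

In the toy sentence, marking "batteries" as the complement of "of" in pass 1 is what lets "rose" skip over it in pass 2 and link to "prices" as its subject.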
LDA: Results
● Trained on 200,000 words
● Tested on 2000 sentences (from section 20 WSJ)
● Recall: 82.3%
● Precision: 93.8%
● Produces partial linkages because of need to satisfy local constraints
● Robust against incomplete sentences (fragments)
Applications
● Information Retrieval
– Generate patterns to use in post-filtering of IR results, to improve precision
– Make use of contextual syntactic information in query, obtained using supertagger
● Manually select set of relevant sample sentences, and a word of interest
● Associate supertags with words in training sentences, and generalize context to tag name
● Experiment shows increase in precision from 33.3% to 79.3%
Conclusions
● By localizing dependencies, we assign richer descriptions to words
● Simple disambiguation leads to an “almost parse”
● Reduces work required by the parser
● Leads to robust analysis of irregular input
● Fast, “shallow” method that leads to useful linguistic analysis
References
● Bangalore, S. 2003. Localizing Dependencies and Supertagging. Chapter 15 in Data Oriented Parsing, eds. Rens Bod, Remko Scha and Khalil Sima'an. CSLI Publications.
● Bangalore, S. and Joshi, A. 1999. Supertagging: An approach to almost parsing. Computational Linguistics 25(2).
● Chandrasekar, R. and Srinivas, B. 1998. Glean: Using syntactic information in document filtering. Information Processing and Management 34(5).
● Supertagging without Tears. http://www.cis.upenn.edu/~mickeyc/stag/supertags.html