Statistical NLP, Spring 2010
Lecture 12: Parsing I
Dan Klein – UC Berkeley
Parse Trees
The move followed a round of similar increases by other lenders, reflecting a continuing decline in that market
Phrase Structure Parsing
Phrase structure parsing organizes syntax into constituents or brackets.
In general, this involves nested trees.
Linguists can, and do, argue about details.
Lots of ambiguity.
Not the only kind of syntax…
[Figure: nested parse tree for "new art critics write reviews with computers"]
Constituency Tests
How do we know what nodes go in the tree?
Classic constituency tests:
  Substitution by proform
  Question answers
  Semantic grounds: coherence, reference, idioms
  Dislocation
  Conjunction
Cross-linguistic arguments, too.
Conflicting Tests
Constituency isn't always clear.
Units of transfer:
  think about ~ penser à
  talk about ~ hablar de
Phonological reduction:
  I will go → I'll go
  I want to go → I wanna go
  à le centre → au centre
Intonation units: La vélocité des ondes sismiques ("the velocity of seismic waves")
Coordination:
  He went to and came from the store.
Classical NLP: Parsing
Write symbolic or logical rules:

Grammar (CFG):
  ROOT → S
  S → NP VP
  NP → DT NN
  NP → NN NNS
  NP → NP PP
  VP → VBP NP
  VP → VBP NP PP
  PP → IN NP

Lexicon:
  NN → interest
  NNS → raises
  VBP → interest
  VBZ → raises
  …
Use deduction systems to prove parses from words:
  Minimal grammar on the "Fed raises" sentence: 36 parses
  Simple 10-rule grammar: 592 parses
  Real-size grammar: many millions of parses
This approach scaled very badly and didn't yield broad-coverage tools.
Ambiguities: PP Attachment
Attachments
  I cleaned the dishes from dinner.
  I cleaned the dishes with detergent.
  I cleaned the dishes in my pajamas.
  I cleaned the dishes in the sink.
Syntactic Ambiguities I
Prepositional phrases:
  They cooked the beans in the pot on the stove with handles.
Particle vs. preposition:
  The puppy tore up the staircase.
Complement structures:
  The tourists objected to the guide that they couldn't hear.
  She knows you like the back of her hand.
Gerund vs. participial adjective:
  Visiting relatives can be boring.
  Changing schedules frequently confused passengers.
Syntactic Ambiguities II
Modifier scope within NPs:
  impractical design requirements
  plastic cup holder
Multiple gap constructions:
  The chicken is ready to eat.
  The contractors are rich enough to sue.
Coordination scope:
  Small rats and mice can squeeze into holes or cracks in the wall.
Probabilistic Context-Free Grammars
A context-free grammar is a tuple <N, T, S, R>:
N : the set of non-terminals
  Phrasal categories: S, NP, VP, ADJP, etc.
  Parts-of-speech (pre-terminals): NN, JJ, DT, VB
T : the set of terminals (the words)
S : the start symbol
  Often written as ROOT or TOP
  Not usually the sentence non-terminal S
R : the set of rules
  Of the form X → Y1 Y2 … Yk, with X, Yi ∈ N
  Examples: S → NP VP, VP → VP CC VP
  Also called rewrites, productions, or local trees
A PCFG adds a top-down production probability per rule: P(Y1 Y2 … Yk | X).
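To make the tuple concrete, here is a minimal sketch of a PCFG as a plain Python dictionary. The rules echo the toy grammar above; the probabilities are invented for illustration.

```python
# A minimal PCFG sketch (assumed representation, toy probabilities).
# Each non-terminal maps to its right-hand sides; the probabilities
# for a given parent must sum to 1.
pcfg = {
    "ROOT": {("S",): 1.0},
    "S":    {("NP", "VP"): 1.0},
    "NP":   {("DT", "NN"): 0.6, ("NN", "NNS"): 0.3, ("NP", "PP"): 0.1},
    "VP":   {("VBP", "NP"): 0.7, ("VBP", "NP", "PP"): 0.3},
    "PP":   {("IN", "NP"): 1.0},
}

def check_normalized(grammar, tol=1e-9):
    """Verify that P(. | X) is a distribution for every non-terminal X."""
    for parent, rhss in grammar.items():
        total = sum(rhss.values())
        assert abs(total - 1.0) < tol, (parent, total)

check_normalized(pcfg)
```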
Treebank Sentences
[Figure: example sentences with their Penn Treebank parse trees]
Treebank Grammars
Need a PCFG for broad-coverage parsing.
Can take a grammar right off the trees (doesn't work well):

  ROOT → S          1
  S → NP VP .       1
  NP → PRP          1
  VP → VBD ADJP     1
  …

Better results by enriching the grammar (e.g., lexicalization).
Can also get reasonable parsers without lexicalization.
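A sketch of "taking the grammar right off the trees": count each local tree and normalize per parent symbol (the relative-frequency estimate). The nested-tuple tree format here is an assumption for illustration, not the Penn Treebank file format.

```python
from collections import Counter, defaultdict

def count_rules(tree, counts):
    """Count one (parent, children-labels) rule per internal node.
    A tree is (label, child1, child2, ...); leaves are plain strings."""
    label, children = tree[0], tree[1:]
    if all(isinstance(c, str) for c in children):
        counts[(label, tuple(children))] += 1  # pre-terminal rule
        return
    counts[(label, tuple(c[0] for c in children))] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)

def estimate_pcfg(trees):
    """Relative-frequency estimate: P(X -> a) = count(X -> a) / count(X)."""
    counts = Counter()
    for tree in trees:
        count_rules(tree, counts)
    parent_totals = defaultdict(int)
    for (parent, _), c in counts.items():
        parent_totals[parent] += c
    return {rule: c / parent_totals[rule[0]] for rule, c in counts.items()}

# Example: a single toy tree, so every rule gets probability 1.
tree = ("ROOT", ("S", ("NP", ("PRP", "He")),
                      ("VP", ("VBD", "was"), ("ADJP", ("JJ", "right"))),
                      (".", ".")))
print(estimate_pcfg([tree]))
```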
Treebank Grammar Scale
Treebank grammars can be enormous:
  As FSAs, the raw grammar has ~10K states, excluding the lexicon
  Better parsers usually make the grammars larger, not smaller
[Figure: a fragment of the raw grammar (NP expansions) drawn as an FSA]
Chomsky Normal Form
Chomsky normal form: all rules of the form X → Y Z or X → w.
In principle, this is no limitation on the space of (P)CFGs:
  N-ary rules introduce new non-terminals, e.g. VP → VBD NP PP PP is binarized with intermediate symbols [VP → VBD NP •] and [VP → VBD NP PP •]
  Unaries / empties are "promoted"
In practice it's kind of a pain:
  Reconstructing n-aries is easy
  Reconstructing unaries is trickier
  The straightforward transformations don't preserve tree scores
Makes parsing algorithms simpler!
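A sketch of the n-ary half of the transform: left-binarize each long rule by introducing dotted intermediate symbols. The naming mirrors the bracketed intermediates above, but any unique naming scheme works.

```python
def binarize(parent, children):
    """Left-binarize X -> Y1 ... Yk (k > 2) into binary rules using
    intermediate 'dotted' symbols like [X -> Y1 Y2 .]."""
    if len(children) <= 2:
        return [(parent, tuple(children))]
    rules = []
    prev = children[0]
    for i in range(1, len(children) - 1):
        inter = "[%s -> %s .]" % (parent, " ".join(children[: i + 1]))
        rules.append((inter, (prev, children[i])))
        prev = inter
    rules.append((parent, (prev, children[-1])))
    return rules

# VP -> VBD NP PP PP becomes three binary rules:
for rule in binarize("VP", ["VBD", "NP", "PP", "PP"]):
    print(rule)
```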
A Recursive Parser

bestScore(X,i,j,s)
  if (j == i+1)
    return tagScore(X, s[i])
  else
    return max over rules X→YZ and split points k of
      score(X→YZ) * bestScore(Y,i,k) * bestScore(Z,k,j)

Will this parser work? Why or why not? Memory requirements?
A Memoized Parser
One small change:

bestScore(X,i,j,s)
  if (scores[X][i][j] == null)
    if (j == i+1)
      score = tagScore(X, s[i])
    else
      score = max over X→YZ, k:
        score(X→YZ) * bestScore(Y,i,k) * bestScore(Z,k,j)
    scores[X][i][j] = score
  return scores[X][i][j]
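A compact Python rendering of the memoized recursion, as a sketch; the lexicon and rule formats (tag_scores, binary_rules) are assumptions for illustration.

```python
from functools import lru_cache

def make_parser(sent, tag_scores, binary_rules):
    """sent: list of n words.
    tag_scores: dict word -> {tag: score}        (assumed lexicon format)
    binary_rules: list of (X, Y, Z, p) for X -> Y Z with probability p.
    Returns a memoized best_score(X, i, j) over the fixed sentence."""
    @lru_cache(maxsize=None)
    def best_score(X, i, j):
        if j == i + 1:
            return tag_scores.get(sent[i], {}).get(X, 0.0)
        best = 0.0
        for (parent, Y, Z, p) in binary_rules:
            if parent != X:
                continue
            for k in range(i + 1, j):       # all split points
                best = max(best, p * best_score(Y, i, k) * best_score(Z, k, j))
        return best
    return best_score
```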
A Bottom-Up Parser (CKY)
Can also organize things bottom-up:

bestScore(s)
  for (i : [0,n-1])
    for (X : tags[s[i]])
      score[X][i][i+1] = tagScore(X, s[i])
  for (diff : [2,n])
    for (i : [0,n-diff])
      j = i + diff
      for (X→YZ : rule)
        for (k : [i+1, j-1])
          score[X][i][j] = max(score[X][i][j],
                               score(X→YZ) * score[Y][i][k] * score[Z][k][j])
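A runnable sketch of the same bottom-up loop. Backpointers for recovering the best tree are omitted, and the grammar/lexicon formats are assumed as in the memoized sketch above.

```python
def cky(sent, tag_scores, binary_rules):
    """Returns score[(X, i, j)] = best probability of X over span [i, j)."""
    n = len(sent)
    score = {}
    for i, word in enumerate(sent):                  # width-1 spans: tags
        for tag, p in tag_scores.get(word, {}).items():
            score[(tag, i, i + 1)] = p
    for diff in range(2, n + 1):                     # widen spans
        for i in range(0, n - diff + 1):
            j = i + diff
            for (X, Y, Z, p) in binary_rules:
                for k in range(i + 1, j):            # split points
                    cand = p * score.get((Y, i, k), 0.0) \
                             * score.get((Z, k, j), 0.0)
                    if cand > score.get((X, i, j), 0.0):
                        score[(X, i, j)] = cand
    return score

# Tiny toy example (invented lexicon and rules):
sent = ["Fed", "raises", "interest"]
tags = {"Fed": {"NNP": 1.0}, "raises": {"VBZ": 0.6, "NNS": 0.4},
        "interest": {"NN": 1.0}}
rules = [("NP", "NNP", "NNS", 0.5), ("VP", "VBZ", "NN", 1.0),
         ("S", "NNP", "VP", 1.0)]
print(cky(sent, tags, rules).get(("S", 0, 3)))  # best S over the sentence
```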
Unary Rules
Unary rules?

bestScore(X,i,j,s)
  if (j == i+1)
    return tagScore(X, s[i])
  else
    return max of:
      max over X→YZ, k: score(X→YZ) * bestScore(Y,i,k) * bestScore(Z,k,j)
      max over X→Y:     score(X→Y) * bestScore(Y,i,j)
CNF + Unary Closure
We need unaries to be non-cyclic.
Can address by pre-calculating the unary closure.
Rather than having zero or more unaries, always have exactly one:
  Alternate unary and binary layers
  Reconstruct unary chains afterwards
[Figure: a tree rewritten so that unary and binary layers alternate]
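A sketch of precomputing the unary closure: treat unary rules as edges and relax toward the best (max-product) chain between any two symbols. With rule probabilities below 1, longer chains only hurt, so simple chains of at most |symbols| − 1 steps suffice.

```python
def unary_closure(symbols, unary_rules):
    """unary_rules: list of (X, Y, p) for X -> Y with probability p.
    Returns closure[(X, Z)] = best score of any chain of unaries
    rewriting X as Z; the empty chain counts, with score 1."""
    closure = {(s, s): 1.0 for s in symbols}
    for (x, y, p) in unary_rules:
        closure[(x, y)] = max(p, closure.get((x, y), 0.0))
    # Bellman-Ford-style relaxation over the max-product semiring.
    for _ in range(len(symbols)):
        for (x, y, p) in unary_rules:
            for z in symbols:
                via = p * closure.get((y, z), 0.0)
                if via > closure.get((x, z), 0.0):
                    closure[(x, z)] = via
    return closure
```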
Alternating Layers

bestScoreB(X,i,j,s)
  return max over X→YZ, k:
    score(X→YZ) * bestScoreU(Y,i,k) * bestScoreU(Z,k,j)

bestScoreU(X,i,j,s)
  if (j == i+1)
    return tagScore(X, s[i])
  else
    return max over X→Y:
      score(X→Y) * bestScoreB(Y,i,j)
Memory
How much memory does this require?
Have to store the score cache.
Cache size: |symbols| * n² doubles.
For the plain treebank grammar: |symbols| ~ 20K, n = 40, doubles are 8 bytes → ~256MB.
Big, but workable.
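The arithmetic behind that estimate, spelled out:

```python
# Cache size = |symbols| * n^2 doubles, at 8 bytes per double.
symbols, n, bytes_per_double = 20_000, 40, 8
cache_bytes = symbols * n * n * bytes_per_double
print(cache_bytes)        # 256,000,000 bytes
print(cache_bytes / 1e6)  # ~256 MB
```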
Pruning: Beams
score[X][i][j] can get too large (when?).
Can keep beams (truncated maps score[i][j]) which only store the best few scores for the span [i,j].
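One simple way to realize a beam over the dict-style chart sketched earlier: keep only the top few symbols per span.

```python
import heapq

def prune_span(span_scores, beam_size=5):
    """span_scores: {symbol: score} for one span [i, j).
    Return a truncated map keeping only the beam_size best entries."""
    best = heapq.nlargest(beam_size, span_scores.items(), key=lambda kv: kv[1])
    return dict(best)

# prune_span({"NP": 0.3, "VP": 0.1, "S": 0.05, "PP": 0.01}, beam_size=2)
# -> {"NP": 0.3, "VP": 0.1}
```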
Pruning: Coarse-to-Fine
Use a smaller grammar to rule out most X[i,j].
Much more on this later…
Time: Theory
How much time will it take to parse? For each diff (