Chapter 11: Tree-Based Models

Statistical Machine Translation

Tree-Based Models

• Traditional statistical models operate on sequences of words
• Many translation problems can be best explained by pointing to syntax
  – reordering, e.g., verb movement in German–English translation
  – long-distance agreement (e.g., subject–verb) in the output
⇒ Translation models based on tree representations of language
  – significant ongoing research
  – state of the art for some language pairs


Phrase Structure Grammar

• Phrase structure
  – noun phrases: the big man, a house, ...
  – prepositional phrases: at 5 o'clock, in Edinburgh, ...
  – verb phrases: going out of business, eat chicken, ...
  – adjective phrases, ...
• Context-free grammars (CFG)
  – non-terminal symbols: phrase structure labels, part-of-speech tags
  – terminal symbols: words
  – production rules: nt → [nt,t]+
    example: np → det nn
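The production rules of such a CFG map naturally onto a dictionary from non-terminal symbols to their possible right-hand sides. A minimal Python sketch (the toy rule set is illustrative only, not from the slides):

```python
# A toy CFG: non-terminals expand to sequences of non-terminals and/or
# terminal symbols (words). The rule set is illustrative only.
cfg = {
    "np": [["det", "nn"]],       # np -> det nn (the example above)
    "det": [["the"], ["a"]],     # det -> the | a
    "nn": [["house"], ["man"]],  # nn -> house | man
}

def generate(symbol):
    """Expand a symbol into a word sequence, always taking the first production."""
    if symbol not in cfg:        # terminal symbol: a word
        return [symbol]
    words = []
    for child in cfg[symbol][0]:
        words.extend(generate(child))
    return words

print(" ".join(generate("np")))  # -> "the house"
```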


Phrase Structure Grammar

[Figure: phrase structure grammar tree for the English sentence "I shall be passing on to you some comments" (as produced by Collins' parser), with part-of-speech tags PRP, MD, VB, VBG, RP, TO, PRP, DT, NNS under the constituents S, VP-A, PP, NP-A]

Synchronous Phrase Structure Grammar

• English rule: np → det jj nn
• French rule: np → det nn jj
• Synchronous rule (indices indicate alignment):
  np → det1 nn2 jj3 | det1 jj3 nn2


Synchronous Grammar Rules

• Non-terminal rules
  np → det1 nn2 jj3 | det1 jj3 nn2
• Terminal rules
  n → maison | house
  np → la maison bleue | the blue house
• Mixed rules
  np → la maison jj1 | the jj1 house
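A sketch of how such synchronous rules might be represented in code, with co-indexed non-terminals linking the source and target right-hand sides (class and field names are my own, not from the slides):

```python
from dataclasses import dataclass

@dataclass
class NT:
    label: str  # non-terminal label, e.g. "jj"
    index: int  # co-index linking source and target occurrences

@dataclass
class SyncRule:
    lhs: str   # left-hand-side non-terminal
    src: list  # source right-hand side: words (str) and NT objects
    tgt: list  # target right-hand side: words (str) and NT objects

# The mixed rule from above: np -> la maison jj1 | the jj1 house
rule = SyncRule(lhs="np",
                src=["la", "maison", NT("jj", 1)],
                tgt=["the", NT("jj", 1), "house"])
print(rule)
```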


Tree-Based Translation Model

• Translation by parsing
  – the synchronous grammar has to parse the entire input sentence
  – the output tree is generated at the same time
  – the process is broken up into a number of rule applications
• Translation probability
  score(tree, e, f) = ∏_i rule_i
• Many ways to assign probabilities to rules
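In practice the product is computed in log space to avoid numerical underflow. A tiny sketch with made-up rule probabilities:

```python
import math

# Log-probabilities of the rules used in one derivation (made-up values).
rule_logprobs = [math.log(0.5), math.log(0.3), math.log(0.8)]

# score(tree, e, f) = prod_i rule_i, computed as a sum of log-probabilities
score = sum(rule_logprobs)
print(math.exp(score))  # -> 0.12
```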


Aligned Tree Pair

[Figure: phrase structure grammar trees with word alignment for a German–English sentence pair. English "I shall be passing on to you some comments" (tags PRP, MD, VB, VBG, RP, TO, PRP, DT, NNS under S, VP-A, PP, NP-A) aligned to German "Ich werde Ihnen die entsprechenden Anmerkungen aushändigen" (tags PPER, VAFIN, PPER, ART, ADJ, NN, VVFIN under S, VP, NP)]

Reordering Rule

• Subtree alignment
  [Figure: German vp spanning pper ... np ... vvfin(aushändigen) aligned to English vp spanning vbg(passing) rp(on) pp ... np ...]
• Synchronous grammar rule
  vp → pper1 np2 aushändigen | passing on pp1 np2
• Note:
  – one word (aushändigen) mapped to two words (passing on): ok
  – but: a fully non-terminal rule is not possible (one-to-one mapping constraint for non-terminals)

Another Rule

• Subtree alignment
  [Figure: German pro(Ihnen) aligned to English pp(to prp(you))]
• Synchronous grammar rule (stripping out English internal structure)
  pro/pp → Ihnen | to you
• Rule with internal structure
  pro/pp → Ihnen | pp(to prp(you))

Another Rule

• Translation of German werde to English shall be
  [Figure: German vp(vafin(werde) vp ...) aligned to English vp(md(shall) vp(vb(be) vp ...))]
• The translation rule needs to include the mapping of the embedded vp
⇒ Complex rule
  vp → vafin(werde) vp1 | md(shall) vp(vb(be) vp1)

Internal Structure

• Stripping out internal structure
  vp → werde vp1 | shall be vp1
  ⇒ synchronous context-free grammar
• Maintaining internal structure
  vp → vafin(werde) vp1 | md(shall) vp(vb(be) vp1)
  ⇒ synchronous tree substitution grammar

Learning Synchronous Grammars

• Extracting rules from a word-aligned parallel corpus
• First: hierarchical phrase-based model
  – only one non-terminal symbol x
  – no linguistic syntax, just a formally syntactic model
• Then: synchronous phrase structure model
  – non-terminals for words and phrases: np, vp, pp, adj, ...
  – the corpus must also be parsed with a syntactic parser


Extracting Phrase Translation Rules

[Figure: word alignment matrix for "Ich werde Ihnen die entsprechenden Anmerkungen aushändigen" / "I shall be passing on to you some comments", highlighting the phrase pair shall be = werde]

Extracting Phrase Translation Rules

[Figure: same alignment matrix, highlighting the phrase pair some comments = die entsprechenden Anmerkungen]

Extracting Phrase Translation Rules

[Figure: same alignment matrix, highlighting the phrase pair werde Ihnen die entsprechenden Anmerkungen aushändigen = shall be passing on to you some comments]

Extracting Hierarchical Phrase Translation Rules

[Figure: same alignment matrix; subtracting the subphrase Ihnen die entsprechenden Anmerkungen = to you some comments from the larger phrase pair yields the hierarchical rule werde X aushändigen = shall be passing on X]

Formal Definition

• Recall: consistent phrase pairs

  (ē, f̄) consistent with A ⇔
    ∀ ei ∈ ē : (ei, fj) ∈ A → fj ∈ f̄
    and ∀ fj ∈ f̄ : (ei, fj) ∈ A → ei ∈ ē
    and ∃ ei ∈ ē, fj ∈ f̄ : (ei, fj) ∈ A

• Let P be the set of all extracted phrase pairs (ē, f̄)
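The consistency definition translates almost line by line into code. A sketch, assuming the alignment A is given as a set of (i, j) position pairs and the phrases as position ranges:

```python
def consistent(e_range, f_range, alignment):
    """Phrase-pair consistency: no alignment link may leave the phrase
    pair, and at least one link must lie inside it."""
    linked = False
    for (i, j) in alignment:
        inside_e = i in e_range
        inside_f = j in f_range
        if inside_e != inside_f:  # link crosses the phrase boundary
            return False
        if inside_e and inside_f:
            linked = True
    return linked

# "some comments" (English positions 7-8) with "die entsprechenden
# Anmerkungen" (German positions 3-5), alignment {(7,3), (8,5)}:
print(consistent(range(7, 9), range(3, 6), {(7, 3), (8, 5)}))  # -> True
```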


Formal Definition

• Extend recursively:
  if (ē, f̄) ∈ P and (ēsub, f̄sub) ∈ P
  and ē = ēpre + ēsub + ēpost
  and f̄ = f̄pre + f̄sub + f̄post
  and ē ≠ ēsub and f̄ ≠ f̄sub
  then add (ēpre + x + ēpost, f̄pre + x + f̄post) to P
  (note: any of ēpre, ēpost, f̄pre, f̄post may be empty)
• The set of hierarchical phrase pairs is the closure under this extension mechanism
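The extension step can be sketched as follows. This simplified version matches the sub-phrase by surface words rather than by corpus positions and performs a single subtraction, whereas the full procedure iterates to closure:

```python
def subtract(phrase_pair, sub_pair):
    """Replace one occurrence of sub_pair inside phrase_pair with the
    gap symbol X on both sides; returns None if it does not apply."""
    (e, f), (e_sub, f_sub) = phrase_pair, sub_pair
    if e == e_sub or f == f_sub:  # sub-phrase must be proper
        return None
    for i in range(len(e) - len(e_sub) + 1):
        for j in range(len(f) - len(f_sub) + 1):
            if e[i:i+len(e_sub)] == e_sub and f[j:j+len(f_sub)] == f_sub:
                return (e[:i] + ("X",) + e[i+len(e_sub):],
                        f[:j] + ("X",) + f[j+len(f_sub):])
    return None

full = (("shall", "be", "passing", "on", "to", "you", "some", "comments"),
        ("werde", "Ihnen", "die", "entsprechenden", "Anmerkungen", "aushändigen"))
sub = (("to", "you", "some", "comments"),
       ("Ihnen", "die", "entsprechenden", "Anmerkungen"))
print(subtract(full, sub))
# -> ("shall be passing on X", "werde X aushändigen") as word tuples
```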


Comments

• Removal of multiple sub-phrases leads to rules with multiple non-terminals, such as:
  y → x1 x2 | x2 of x1
• Typical restrictions to limit complexity [Chiang, 2005]
  – at most 2 non-terminal symbols
  – at least 1 but at most 5 words per language
  – span at most 15 words (counting gaps)


Learning Syntactic Translation Rules

[Figure: aligned German–English tree pair; the highlighted subtree alignment yields the rule pro(Ihnen) = pp(to prp(you))]

Constraints on Syntactic Rules

• Same word alignment constraints as in hierarchical models
• Hierarchical: a rule can cover any span ⇔ syntactic rules must cover constituents in the tree
• Hierarchical: gaps may cover any span ⇔ gaps must cover constituents in the tree
• Many fewer rules are extracted (all things being equal)


Impossible Rules

[Figure: aligned tree pair; the highlighted English span is not a constituent → no rule extracted]

Rules with Context

[Figure: aligned tree pair; the phrase pair werde = shall be can only be extracted together with its syntactic context]

Rule with this phrase pair requires syntactic context:
  vp → vafin(werde) vp1 | md(shall) vp(vb(be) vp1)

Too Many Rules Extractable

• A huge number of rules can be extracted
  (every alignable node may or may not be part of a rule → exponential number of rules)
• Need to limit which rules to extract
• Option 1: similar restrictions as for the hierarchical model
  (maximum span size, maximum number of terminals and non-terminals, etc.)
• Option 2: only extract minimal rules ("GHKM" rules)


Minimal Rules

[Figure: aligned tree pair: English "I shall be passing on to you some comments" (S, VP, PP, NP; PRP MD VB VBG RP TO PRP DT NNS) with German "Ich werde Ihnen die entsprechenden Anmerkungen aushändigen"]

Extract: the set of smallest rules required to explain the sentence pair

Lexical Rule

[Figure: same aligned tree pair, highlighting Ich = I]

Extracted rule: prp → Ich | I

Lexical Rule

[Figure: same aligned tree pair, highlighting Ihnen = you]

Extracted rule: prp → Ihnen | you

Lexical Rule

[Figure: same aligned tree pair, highlighting die = some]

Extracted rule: dt → die | some

Lexical Rule

[Figure: same aligned tree pair, highlighting Anmerkungen = comments]

Extracted rule: nns → Anmerkungen | comments

Insertion Rule

[Figure: same aligned tree pair; the English word to has no German counterpart]

Extracted rule: pp → x | to prp

Non-Lexical Rule

[Figure: same aligned tree pair, highlighting the np spanning some comments]

Extracted rule: np → x1 x2 | dt1 nns2

Lexical Rule with Syntactic Context

[Figure: same aligned tree pair, highlighting the vp around aushändigen = passing on]

Extracted rule: vp → x1 x2 aushändigen | passing on pp1 np2

Lexical Rule with Syntactic Context

[Figure: same aligned tree pair, highlighting werde = shall be]

Extracted rule: vp → werde x | shall be vp (ignoring internal structure)

Non-Lexical Rule

[Figure: same aligned tree pair, highlighting the top-level s node]

Extracted rule: s → x1 x2 | prp1 vp2

Done (note: one rule per alignable constituent)

Unaligned Source Words

[Figure: same aligned tree pair; one German word (entsprechenden) is unaligned]

Attach to neighboring words or higher nodes → additional rules

Too Few Phrasal Rules?

• Lexical rules will be 1-to-1 mappings (unless the word alignment requires otherwise)
• But: phrasal rules are very beneficial in phrase-based models
• Solutions
  – combine rules that contain a maximum number of symbols (as in hierarchical models, recall "Option 1")
  – compose minimal rules to cover a maximum number of non-leaf nodes


Composed Rules

• Current rules
  – np → x1 x2 | dt1 nns2
  – dt → die | some
  – nns → entsprechenden Anmerkungen | comments
• Composed rule
  np → die entsprechenden Anmerkungen | some comments
  (1 non-leaf node: np)

Composed Rules

• Minimal rule (3 non-leaf nodes: vp, pp, np):
  vp → x1 x2 aushändigen | passing on pp1 np2
• Composed rule (3 non-leaf nodes: vp, pp, np):
  vp → Ihnen x1 aushändigen | passing on pp(to prp(you)) np1

Relaxing Tree Constraints

• Impossible rule
  x → werde | md(shall) vb(be)
• Create a new non-terminal label: md+vb
⇒ New rule
  md+vb → werde | md(shall) vb(be)

Zollmann–Venugopal Relaxation

• If a span consists of two constituents, join them: x+y
• If a span consists of three constituents, join them: x+y+z
• If a span covers constituents with the same parent x and includes
  – every child but the first child y: label as x\y
  – every child but the last child y: label as x/y
• For all other cases, label as fail
⇒ More rules can be extracted, but the number of non-terminals blows up

Special Problem: Flat Structures

• Flat structures severely limit rule extraction
  [Figure: flat np with children dt(the) nnp(Israeli) nnp(Prime) nnp(Minister) nnp(Sharon)]
• Can only extract rules for individual words or the entire phrase

Relaxation by Tree Binarization

[Figure: right-binarized tree: np(dt(the) np(nnp(Israeli) np(nnp(Prime) np(nnp(Minister) nnp(Sharon)))))]

More rules can be extracted
Left-binarization or right-binarization?

Scoring Translation Rules

• Extract all rules from the corpus
• Score based on counts
  – joint rule probability: p(lhs, rhs_f, rhs_e)
  – rule application probability: p(rhs_f, rhs_e | lhs)
  – direct translation probability: p(rhs_e | rhs_f, lhs)
  – noisy-channel translation probability: p(rhs_f | rhs_e, lhs)
  – lexical translation probability: ∏_{ei ∈ rhs_e} p(ei | rhs_f, a)
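These scores are estimated as relative frequencies over the extracted rule counts. A sketch for the rule application probability p(rhs_f, rhs_e | lhs), with made-up counts:

```python
from collections import Counter

# Counts of extracted rules, keyed by (lhs, rhs_f, rhs_e) (made-up data).
rule_counts = Counter({
    ("np", "das Haus", "the house"): 8,
    ("np", "das Haus", "the building"): 2,
    ("np", "ein Haus", "a house"): 5,
})

# Marginal count of each left-hand side.
lhs_counts = Counter()
for (lhs, _, _), c in rule_counts.items():
    lhs_counts[lhs] += c

def p_rule_given_lhs(lhs, rhs_f, rhs_e):
    """Rule application probability p(rhs_f, rhs_e | lhs)."""
    return rule_counts[(lhs, rhs_f, rhs_e)] / lhs_counts[lhs]

print(p_rule_given_lhs("np", "das Haus", "the house"))  # -> 8/15 ≈ 0.533
```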


Syntactic Decoding

Inspired by monolingual syntactic chart parsing: during decoding of the source sentence, a chart with translations for the O(n²) spans has to be filled

[Figure: German input "Sie will eine Tasse Kaffee trinken" with tags PPER VAFIN ART NN NN VVINF and constituents NP, VP, S]

Syntax Decoding

[Figure: decoding chart over "Sie will eine Tasse Kaffee trinken" (PPER VAFIN ART NN NN VVINF; NP, VP, S)]

German input sentence with tree

Syntax Decoding

[Figure: decoding chart; lexical entry ➊ PRO: she added over Sie]

Purely lexical rule: filling a span with a translation (a constituent in the chart)

Syntax Decoding

[Figure: decoding chart; lexical entry ➋ NN: coffee added over Kaffee]

Purely lexical rule: filling a span with a translation (a constituent in the chart)

Syntax Decoding

[Figure: decoding chart; lexical entry ➌ VB: drink added over trinken]

Purely lexical rule: filling a span with a translation (a constituent in the chart)

Syntax Decoding

[Figure: decoding chart; complex entry ➍ NP: a cup of coffee, built from the words DET(a) NN(cup) IN(of) and the underlying NN: coffee hypothesis]

Complex rule: matching underlying constituent spans, and covering words

Syntax Decoding

[Figure: decoding chart; complex entry ➎ VP: wants to drink a cup of coffee, built with VBZ(wants) TO(to) over the VB and NP hypotheses]

Complex rule with reordering

Syntax Decoding

[Figure: decoding chart; final entry ➏ S: PRO VP, completing the translation "she wants to drink a cup of coffee"]

Bottom-Up Decoding

• For each span, a stack of (partial) translations is maintained
• Bottom-up: a higher stack is filled once the underlying stacks are complete


Naive Algorithm

Input: foreign sentence f = f1, ..., f_lf with syntax tree
Output: English translation e

1: for all spans [start, end] (bottom up) do
2:   for all sequences s of hypotheses and words in span [start, end] do
3:     for all rules r do
4:       if rule r applies to chart sequence s then
5:         create new hypothesis c
6:         add hypothesis c to chart
7:       end if
8:     end for
9:   end for
10: end for
11: return English translation e from best hypothesis in span [0, lf]


Chart Organization

[Figure: chart cells over "Sie will eine Tasse Kaffee trinken" (PPER VAFIN ART NN NN VVINF; NP, VP, S)]

• Chart consists of cells that cover contiguous spans over the input sentence
• Each cell contains a set of hypotheses¹
• Hypothesis = translation of a span with a target-side constituent label

¹ In the book, they are called chart entries.
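One plausible way to organize such a chart in code, with cells keyed by span and hypotheses grouped by their target-side constituent label (all names are my own, not from the slides):

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    label: str        # target-side constituent label, e.g. "NP"
    translation: str  # target words produced so far
    score: float      # model score

@dataclass
class Cell:
    by_label: dict = field(default_factory=dict)  # label -> [Hypothesis]

    def add(self, hyp):
        self.by_label.setdefault(hyp.label, []).append(hyp)

# chart[(start, end)] is the cell covering input words start..end-1
chart = {(4, 5): Cell()}
chart[(4, 5)].add(Hypothesis("NN", "coffee", -1.2))
print(chart[(4, 5)].by_label["NN"])
```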


Dynamic Programming

Applying a rule creates a new hypothesis

[Figure: chart over "eine Tasse Kaffee trinken"; applying the rule NP → NP+P Kaffee | NP+P coffee to the hypothesis NP+P: a cup of creates the new hypothesis NP: a cup of coffee]

Dynamic Programming

Another hypothesis

[Figure: same chart; applying the rule NP → eine Tasse NP | a cup of NP to the hypothesis NP: coffee also yields NP: a cup of coffee]

Both hypotheses are indistinguishable in future search → can be recombined

Recombinable States

Recombinable?

  NP: a cup of coffee
  NP: a cup of coffee
  NP: a mug of coffee

Yes, iff at most a 2-gram language model is used

Recombinability

Hypotheses have to match in
• span of input words covered
• output constituent label
• first n–1 output words (not properly scored, since they lack context)
• last n–1 output words (these still affect scoring of subsequently added words, just as in phrase-based decoding)

(n is the order of the n-gram language model)
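These conditions amount to a recombination key: hypotheses with equal keys can be merged, keeping only the better-scoring one. A sketch, assuming the output is available as a word list of at least n−1 words:

```python
def recombination_key(span, label, words, n):
    """Hypotheses with equal keys are indistinguishable in future search."""
    # first n-1 words (not yet fully scored) and last n-1 words (future
    # LM context); for n = 1 no boundary words matter
    boundary = tuple(words[:n-1]) + tuple(words[-(n-1):]) if n > 1 else ()
    return (span, label, boundary)

# With a 2-gram LM, "a cup of coffee" and "a mug of coffee" get the same
# key (same first and last word) and can therefore be recombined:
print(recombination_key((2, 5), "NP", ["a", "cup", "of", "coffee"], 2))
print(recombination_key((2, 5), "NP", ["a", "mug", "of", "coffee"], 2))
```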


Language Model Contexts

When merging hypotheses, internal language model contexts are absorbed

[Figure: S (the foreign minister of Germany met with Condoleezza Rice in Frankfurt) formed from NP (the foreign ... of Germany) and VP (met with ... in Frankfurt); the relevant history is "of Germany", the un-scored words are "met with ...", and the merge computes
  pLM(met | of Germany) · pLM(with | Germany met)]

Stack Pruning

• The number of hypotheses in each chart cell explodes
⇒ need to discard bad hypotheses, e.g., keep only the 100 best
• Different stacks for different output constituent labels?
• Cost estimates
  – translation model cost known
  – language model cost for internal words known → estimates for initial words
  – outside cost estimate? (how useful will an NP covering input words 3–5 be later on?)
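Histogram pruning of a cell is then a one-liner. A sketch over (score, translation) pairs, where higher scores are better:

```python
def prune(hypotheses, keep=100):
    """Histogram pruning: keep only the `keep` best-scoring hypotheses."""
    return sorted(hypotheses, key=lambda h: h[0], reverse=True)[:keep]

cell = [(-1.2, "a cup of coffee"), (-3.5, "a mug of coffee"),
        (-7.1, "a cup coffee")]
print(prune(cell, keep=2))  # drops the worst hypothesis
```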


Naive Algorithm: Blow-ups

• Many subspan sequences
  (for all sequences s of hypotheses and words in span [start, end])
• Many rules
  (for all rules r)
• Checking whether a rule applies is not trivial
  (rule r applies to chart sequence s)
⇒ Unworkable


Solution

• Prefix tree data structure for rules
• Dotted rules
• Cube pruning


Storing Rules

• First concern: do they apply to the span?
  → have to match available hypotheses and input words
• Example rule
  np → x1 des x2 | np1 of the nn2
• Check for applicability
  – is there an initial sub-span with a hypothesis with constituent label np?
  – is it followed by a sub-span over the word des?
  – is it followed by a final sub-span with a hypothesis with label nn?
• Sequence of relevant information
  np • des • nn • np1 of the nn2
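The sequence np • des • nn is exactly the path under which the rule's target sides can be stored in a prefix tree (trie). A sketch (class and function names are my own):

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # source-side symbol -> TrieNode
        self.translations = []  # target sides of rules ending here

def insert_rule(root, src_symbols, target):
    """Store a rule's target side under the path of its source symbols."""
    node = root
    for sym in src_symbols:  # e.g. ["NP", "des", "NN"]
        node = node.children.setdefault(sym, TrieNode())
    node.translations.append(target)

root = TrieNode()
insert_rule(root, ["NP", "des", "NN"], "NP: NP1 of the NN2")
insert_rule(root, ["NP", "des", "NN"], "NP: NP2 NP1")
insert_rule(root, ["das", "Haus"], "NP: the house")

# Walking the path NP -> des -> NN finds both stored target sides:
node = root.children["NP"].children["des"].children["NN"]
print(node.translations)
```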

Rule Applicability Check

Trying to cover a span of six words with the given rule

  NP • des • NN → NP: NP of the NN

[Figure: chart over "das Haus des Architekten Frank Gehry"]

Rule Applicability Check

First: check for hypotheses with output constituent label NP

  NP • des • NN → NP: NP of the NN

[Figure: same chart]

Rule Applicability Check

Found an NP hypothesis in the cell, matched the first symbol of the rule

  NP • des • NN → NP: NP of the NN

[Figure: same chart, with the NP hypothesis over "das Haus" highlighted]

Rule Applicability Check

Matched the word des, matched the second symbol of the rule

  NP • des • NN → NP: NP of the NN

[Figure: same chart; NP over "das Haus", followed by the word des]

Rule Applicability Check

Found an NN hypothesis in the cell, matched the last symbol of the rule

  NP • des • NN → NP: NP of the NN

[Figure: same chart; NN hypothesis over "Architekten Frank Gehry"]

Rule Applicability Check

Matched the entire rule → apply it to create an NP hypothesis over the whole span

  NP • des • NN → NP: NP of the NN

[Figure: same chart; new NP hypothesis over all six words]

Rule Applicability Check

Look up output words to create the new hypothesis (note: there may be many matching underlying NP and NN hypotheses)

  NP • des • NN → NP: NP of the NN

[Figure: NP: the house + NN: architect Frank Gehry → NP: the house of the architect Frank Gehry]

Checking Rules vs. Finding Rules

• What we showed so far:
  – given a rule
  – check if and how it can be applied
• But there are too many rules (millions) to check them all
• Instead:
  – given the underlying chart cells and input words
  – find which rules apply


Prefix Tree for Rules

[Figure: prefix tree over rule source sides; e.g., the path NP → des → NN stores the targets NP: NP1 of the NN2 and NP: NP2 NP1, and the path das → Haus stores NP: the house]

Highlighted rules:
  np → np1 det2 nn3 | np1 in2 nn3
  np → np1 | np1
  np → np1 des nn2 | np1 of the nn2
  np → np1 des nn2 | np2 np1
  np → det1 nn2 | det1 nn2
  np → das Haus | the house

Dotted Rules: Key Insight

• If we can apply a rule like p → A B C | x to a span
• Then we could also have applied a rule like q → A B | y to a sub-span with the same starting word
⇒ We can re-use the rule lookup by storing A B • (a dotted rule)
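Continuing the hypothetical trie sketch from above: a dotted rule is then simply a trie node plus the start position of the span matched so far, and extending the dot over an input word or a constituent label means following one child edge:

```python
def extend_dotted(dotted, symbol):
    """Advance the dot over `symbol` (an input word or the constituent
    label of a hypothesis); returns None if no rule continues this way."""
    node, start = dotted
    child = node.children.get(symbol)
    return (child, start) if child is not None else None

# Matching NP, then "des", then NN from position 0 (root as built above):
dotted = (root, 0)
for sym in ["NP", "des", "NN"]:
    dotted = extend_dotted(dotted, sym)
print(dotted[0].translations)  # rules that now apply to the whole span
```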


Finding Applicable Rules in Prefix Tree

The following steps walk the chart over "das Haus des Architekten Frank Gehry":

• Covering the first cell: look up das in the prefix tree → dotted rule das ❶
• Checking if the dotted rule has translations: DET: the and DET: that → apply the translation rules, adding both DET hypotheses to the cell
• Looking up the constituent label DET in the prefix tree → add dotted rule DET ❷ to the span's list
• Moving on to the next cell: look up Haus → dotted rule Haus ❸; translations NN: house and NP: house → apply; looking up the labels NN and NP → dotted rules NN ❹ and NP ❺
• More of the same for the remaining cells (des: IN: of, DET: the; Architekten: NN: architect; Frank: NNP; Gehry: NNP)
• Covering a longer span: we cannot consume multiple words at once; all rules are extensions of existing dotted rules (here: only extensions of the span over das are possible)
• Extensions of the span over das: with the word Haus or the labels NN and NP → dotted rules das Haus ❻, das NN ❼, DET Haus ❽, DET NN ❾
• Checking if the dotted rules have translations (NP: the house, NP: the NN, NP: DET house, NP: DET NN) → apply, adding NP: the house and NP: that house to the cell for das Haus
• Looking up the constituent label NP → add NP ❺ to the span's list of dotted rules

Even Larger Spans

Extend lists of dotted rules with cell constituent labels:
the span's dotted rule list (with the same start) plus a neighboring span's constituent labels of hypotheses (with the same end)

[Figure: chart over "das Haus des Architekten Frank Gehry"]

Reflections

• Complexity O(rn³) with sentence length n and size of the dotted rule list r
  – may introduce a maximum size for spans that do not start at the beginning
  – may limit the size of the dotted rule list (very arbitrary)
• Does the list of dotted rules explode?
• Yes, if there are many rules with neighboring target-side non-terminals
  – such rules apply in many places
  – rules with words are much more restricted


Difficult Rules

• Some rules may apply in too many ways
• Neighboring input non-terminals
  vp → gibt x1 x2 | gives np2 to np1
  – non-terminals may match many different pairs of spans
  – especially a problem for hierarchical models (no constituent label restrictions)
  – may be okay for syntax models
• Three neighboring input non-terminals
  vp → trifft x1 x2 x3 heute | meets np1 today pp2 pp3
  – will get out of hand even for syntax models

Where are we now?

• We know which rules apply
• We know where they apply (each non-terminal is tied to a span)
• But there are still many choices
  – many possible translations
  – each non-terminal may match multiple hypotheses
  → the number of choices is exponential in the number of non-terminals


Rules with One Non-Terminal

Found applicable rules pp → des x | ... np ...
  pp → of np | by np | in np | on to np
with matching np hypotheses: the architect ..., architect Frank ..., the famous ..., Frank Gehry

• The non-terminal will be filled by any of h underlying matching hypotheses
• Choice of t lexical translations
⇒ Complexity O(ht)

(note: we may not group rules by target constituent label, so a rule np → des x | the np would also be considered here)

Rules with Two Non-Terminals

Found applicable rule np → x1 des x2 | np1 ... np2
  np → np1 of np2 | np1 by np2 | np1 in np2 | np1 on to np2
with matching np1 hypotheses: a house, a building, the building, a new house
and matching np2 hypotheses: the architect, architect Frank ..., the famous ..., Frank Gehry

• The two non-terminals will each be filled by any of h underlying matching hypotheses
• Choice of t lexical translations
⇒ Complexity O(h²t): a three-dimensional "cube" of choices

(note: rules may also reorder differently)

Cube Pruning

[Figure: grid of choices; rows (NP hypotheses): a house 1.0, a building 1.3, the building 2.2, a new house 2.6; columns (rule translations): in the ... 1.5, by architect ... 1.7, by the ... 2.6, of the ... 3.2]

Arrange all the choices in a "cube" (here: a square; generally an orthotope, also called a hyperrectangle)

Create the First Hypothesis

[Figure: same grid; corner hypothesis created with score 2.1]

• Hypotheses created in cube: (0,0)

Add ("Pop") Hypothesis to Chart Cell

• Hypotheses created in cube: ∅
• Hypotheses in chart cell stack: (0,0)

Create Neighboring Hypotheses

[Figure: same grid; the neighbors of (0,0) scored 2.5 and 2.7]

• Hypotheses created in cube: (0,1), (1,0)
• Hypotheses in chart cell stack: (0,0)

Pop Best Hypothesis to Chart Cell

• Hypotheses created in cube: (0,1)
• Hypotheses in chart cell stack: (0,0), (1,0)

Create Neighboring Hypotheses

[Figure: same grid; new neighbors scored 3.1 and 2.4]

• Hypotheses created in cube: (0,1), (1,1), (2,0)
• Hypotheses in chart cell stack: (0,0), (1,0)

More of the Same

[Figure: same grid; scores now include 3.0 and 3.8]

• Hypotheses created in cube: (0,1), (1,2), (2,1), (2,0)
• Hypotheses in chart cell stack: (0,0), (1,0), (1,1)

Queue of Cubes

• Several groups of rules will apply to a given span
• Each of them will have a cube
• We can create a queue of cubes
⇒ Always pop off the most promising hypothesis, regardless of cube
• May have separate queues for different target constituent labels
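A sketch of cube pruning over a queue of cubes, using a single heap of (cost, cube, coordinates) entries. As a simplification, per-dimension costs are just added, whereas a real decoder would re-score each created hypothesis with the language model:

```python
import heapq

def cube_pruning(cubes, pops):
    """cubes: list of cubes, each a tuple of per-dimension cost lists
    (e.g. rule translation costs x hypothesis costs). Lower is better.
    Returns up to `pops` (cost, cube index, coordinates) triples."""
    heap, seen, popped = [], set(), []
    for c, cube in enumerate(cubes):      # start at each cube's corner
        coord = (0,) * len(cube)
        heapq.heappush(heap, (sum(d[0] for d in cube), c, coord))
        seen.add((c, coord))
    while heap and len(popped) < pops:
        cost, c, coord = heapq.heappop(heap)
        popped.append((cost, c, coord))
        for d in range(len(coord)):       # create unseen neighbors
            nxt = coord[:d] + (coord[d] + 1,) + coord[d+1:]
            if (c, nxt) in seen or any(
                    i >= len(cubes[c][k]) for k, i in enumerate(nxt)):
                continue
            seen.add((c, nxt))
            ncost = sum(cubes[c][k][i] for k, i in enumerate(nxt))
            heapq.heappush(heap, (ncost, c, nxt))
    return popped

# Two cubes competing over the same span (made-up costs):
cubes = [([1.5, 1.7, 2.6], [1.0, 1.3, 2.2]),
         ([0.9, 2.0], [1.1, 1.4])]
for cost, c, coord in cube_pruning(cubes, pops=5):
    print(f"cube {c}, cell {coord}, cost {cost:.1f}")
```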


Bottom-Up Chart Decoding Algorithm

1: for all spans (bottom up) do
2:   extend dotted rules
3:   for all dotted rules do
4:     find group of applicable rules
5:     create a cube for it
6:     create first hypothesis in cube
7:     place cube in queue
8:   end for
9:   for specified number of pops do
10:    pop off best hypothesis of any cube in queue
11:    add it to the chart cell
12:    create its neighbors
13:  end for
14:  extend dotted rules over constituent labels
15: end for


Two-Stage Decoding

• First stage: decoding without a language model (-LM decoding)
  – may be done exhaustively
  – eliminates dead ends
  – optionally prunes out low-scoring hypotheses
• Second stage: add the language model
  – limited to the packed chart obtained in the first stage
• Note: essentially, we do two-stage decoding for each span at a time


Coarse-to-Fine

• Decode with increasingly complex models
• Examples
  – reduced language model [Zhang and Gildea, 2008]
  – reduced set of non-terminals [DeNero et al., 2009]
  – language model on clustered word classes [Petrov et al., 2008]


Outside Cost Estimation

• Which spans should be emphasized more in search?
• An initial decoding stage can provide outside cost estimates

[Figure: NP hypothesis over "eine Tasse Kaffee" in the chart for "Sie will eine Tasse Kaffee trinken" (PPER VAFIN ART NN NN VVINF)]

• Use min/max language model costs to obtain an admissible heuristic (or at least something that will guide the search better)

Open Questions

• Where does the best translation fall out of the beam?
• How accurate are LM estimates?
• Are particular types of rules discarded too quickly?
• Are there systemic problems with cube pruning?

Summary

• Synchronous context-free grammars
• Extracting rules from a syntactically parsed parallel corpus
• Bottom-up decoding
• Chart organization: dynamic programming, stacks, pruning
• Prefix tree for rules
• Dotted rules
• Cube pruning
