Improvements in DP Beam Search for Phrase-based SMT

Richard Zens and Hermann Ney
Human Language Technology and Pattern Recognition, Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, Germany

IWSLT – Honolulu, HI – October 2008
Overview
1. Introduction & related work
2. Search for phrase-based MT
3. Experimental results
4. Summary & conclusions
Contributions
• clear & precise description of phrase-based search
• analysis of important aspects on a large data task:
  – rest score estimation
  – lexical vs. coverage hypotheses
  – beam search including cube pruning
Related Work
• based on:
  – [Zens & Och+ 02]: phrase-based model
  – [Och 02]: rest score estimation (for AT)
  – [Tillmann & Ney 03]: search for SWB models
• other related work:
  – Pharaoh [Koehn 03], Moses [Koehn & Hoang+ 07]
  – many others, e.g. [Tillmann 06], [Moore & Quirk 07], ...
System Architecture

[Diagram: Source Language Text → Preprocessing → Global Search → Postprocessing → Target Language Text; the global search consults the models: language models, phrase models, word models, reordering models, ...]

Global search for the best target sentence Ê given source sentence F:

  Ê = argmax_E { p(E|F) } = argmax_E { Σ_m λ_m h_m(E, F) }
Search for Phrase-based SMT

[Illustration: alignment matrix over source positions 0–4, showing phrase segmentation and reordering]

interdependencies:
• find phrase boundaries
• reordering in target language
• find most ‘plausible’ sentence

constraints:
• no gaps
• no overlaps
Search

• goal:

  argmax_E max_S { Σ_{m=1}^{M} λ_m h_m(E, S; F) }

  with target sentence E, segmentation S, source sentence F, models h(·), weights λ
• models:
  – within-phrase models: phrase lexica, word lexica, word penalty, phrase penalty
  – n-gram backing-off language model
  – distortion penalty
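The decision rule above is a plain weighted sum of feature scores. A minimal sketch of that scoring (the function and feature names here are illustrative, not from the paper):

```python
def loglinear_score(weights, feature_fns, E, S, F):
    """Score a candidate (E, S) for source F: sum_m lambda_m * h_m(E, S; F)."""
    return sum(lam * h(E, S, F) for lam, h in zip(weights, feature_fns))

# Toy features: word penalty and phrase penalty, as listed on the slide.
word_penalty = lambda E, S, F: -len(E)
phrase_penalty = lambda E, S, F: -len(S)

score = loglinear_score([0.5, 1.0], [word_penalty, phrase_penalty],
                        E=["the", "house"], S=[(0, 1)], F=["das", "Haus"])
# score == 0.5 * (-2) + 1.0 * (-1) == -2.0
```

The search then maximizes this score over all target sentences E and segmentations S.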
Search Space

• source sentence F = f_1, ..., f_J
• states (C, ẽ, j)
  – coverage C ⊆ {1, ..., J}: translated input positions
  – LM history ẽ to predict the next target word
  – source position j for the distortion model
• edges (ẽ, j, j′)
  – generate target phrase ẽ
  – which covers the source sentence words f_j, ..., f_{j′}
• expanding (C, ẽ, j) with (ẽ′, j′′, j′) results in state (C ∪ {j′′, ..., j′}, ẽ ⊕ ẽ′, j′)
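The state and edge definitions translate almost directly into code. A minimal sketch using a bitmask for the coverage set; the names `State` and `expand` are illustrative assumptions, and scoring is omitted:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class State:
    coverage: int                # bitmask over source positions 0..J-1
    lm_history: Tuple[str, ...]  # last n-1 target words, for the LM
    j: int                       # last covered source position (distortion)

def expand(state: State, phrase: Tuple[str, ...],
           j_start: int, j_end: int, lm_order: int = 4) -> Optional[State]:
    """Apply edge (phrase, j_start, j_end); return None if it would overlap."""
    span = ((1 << (j_end - j_start + 1)) - 1) << j_start
    if state.coverage & span:    # constraint: no overlaps
        return None
    history = (state.lm_history + phrase)[-(lm_order - 1):]
    return State(state.coverage | span, history, j_end)
```

Because `State` is frozen (hashable), identical states can later be recombined in a dictionary keyed by the state itself.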
Lexical vs. Coverage Hypotheses
• (partial) hypothesis: path to state (C, ẽ, j)
• for each cardinality c = |C|: we have a list of coverage hypotheses C
• for each coverage C: we have a list of lexical hypotheses (ẽ, j)
• beam search: limit the list sizes
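The two-level organization suggests two separate beam limits: one on lexical hypotheses per coverage, one on coverage hypotheses per cardinality. A hedged sketch of histogram pruning under that scheme (the paper's scoring details are elided; names are illustrative):

```python
def prune(hyps_by_coverage, max_cov, max_lex):
    """Prune the hypotheses of one cardinality.
    hyps_by_coverage: {coverage: [(score, lexical_state), ...]}.
    Keep the best max_lex lexical hypotheses per coverage, then the best
    max_cov coverages, ranked by their best lexical hypothesis."""
    # lexical pruning: per-coverage beam
    trimmed = {cov: sorted(lst, key=lambda t: t[0], reverse=True)[:max_lex]
               for cov, lst in hyps_by_coverage.items()}
    # coverage pruning: rank coverages by their best remaining hypothesis
    ranked = sorted(trimmed.items(), key=lambda kv: kv[1][0][0], reverse=True)
    return dict(ranked[:max_cov])
```

The experiments later in the talk vary exactly these two limits independently.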
Search Illustration

[Figure: hypotheses organized by increasing cardinality c−2, c−1, c, c+1; each coverage hypothesis holds a list of lexical hypotheses. Legend: coverage hypothesis, lexical hypothesis]
Algorithm Details
• DP beam search
  – generate hypotheses with increasing cardinality by expanding hypotheses with lower cardinality
  – recombine hypotheses with same state
  – expand only promising hypotheses
• share computations between expansions, e.g. check for overlap, rest score computation, ...
• early pruning: stop expansion as soon as possible
• expand most promising candidates first
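The outer loop can be sketched as cardinality-synchronous DP with recombination. This is a generic skeleton, not the paper's implementation: scoring, phrase lookup, and pruning thresholds are abstracted into the `expansions_for` callback and a single beam size.

```python
def beam_search(J, initial, expansions_for, beam_size):
    """Cardinality-synchronous DP beam search over J source positions.
    initial: (score, state) with empty coverage; state[0] is the coverage bitmask.
    expansions_for(score, state): yields (new_score, new_state) successors
    that cover strictly more source positions."""
    beams = [dict() for _ in range(J + 1)]   # cardinality -> {state: best score}
    beams[0][initial[1]] = initial[0]
    for c in range(J):                       # increasing cardinality
        # expand only the most promising hypotheses of this cardinality
        kept = sorted(beams[c].items(), key=lambda kv: kv[1],
                      reverse=True)[:beam_size]
        for state, score in kept:
            for new_score, new_state in expansions_for(score, state):
                card = bin(new_state[0]).count("1")
                # recombination: identical states keep only the best score
                if new_score > beams[card].get(new_state, float("-inf")):
                    beams[card][new_state] = new_score
    return max(beams[J].items(), key=lambda kv: kv[1]) if beams[J] else None
```

Recombination is what makes this dynamic programming: two paths reaching the same (coverage, LM history, position) state are interchangeable for all future expansions, so only the better one is kept.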
Rest Score Estimation
• estimated score of hypothesis completion (inspired by A*)
• previous work:
  – [Och 02, Och & Ney 04]: TM & LM per source position, distortion
  – [Koehn 03]: TM & LM per source sequence, no distortion
• here: comparison of
  – computation per position and per sequence
  – models: TM only; TM & LM; TM, LM & distortion
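The per-sequence variant is commonly computed by dynamic programming over source spans before decoding starts. A sketch of that precomputation for a TM-only estimate, assuming a hypothetical `best_span_score` oracle that returns the best single-phrase score for a span:

```python
def rest_cost_table(J, best_span_score):
    """cost[i][k]: best estimated score for covering source span [i, k).
    best_span_score(i, k): best translation-model score of a single phrase
    covering [i, k), or -inf if no phrase applies."""
    NEG = float("-inf")
    cost = [[NEG] * (J + 1) for _ in range(J + 1)]
    for length in range(1, J + 1):           # shorter spans first
        for i in range(J - length + 1):
            k = i + length
            cost[i][k] = best_span_score(i, k)
            for m in range(i + 1, k):        # or split into two sub-spans
                if cost[i][m] > NEG and cost[m][k] > NEG:
                    cost[i][k] = max(cost[i][k], cost[i][m] + cost[m][k])
    return cost
```

During search, the rest score of a hypothesis is then the sum of `cost` over the maximal uncovered spans of its coverage set, giving an optimistic completion estimate in the spirit of A*.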
Experimental Results

• NIST Chinese-English large data task
• TM: training data of 8 M sentence pairs, 250 M words; phrase-based and word-based lexica, word / phrase penalty
• LM: 4-gram, trained on 650 M words, SRILM [Stolcke 02]
• reordering: distortion penalty, reordering window of 10, lexicalized reordering model [Zens & Ney 06]
• evaluation: case-insensitive BLEU score (mt-eval) on the NIST 2002 test set
Effect of Search Errors

[Plot: BLEU[%] (28–39) vs. average model score (−75 to −45)]

Translate the test set with various pruning parameter settings. Model score averaged over the whole test set (878 sentences).
Rest Score Estimation

[Figure: effect of the rest score estimation variants on translation quality]
Lexical vs. Coverage Hypotheses

[Plot: BLEU[%] (32–39) vs. max. number of lexical hypotheses per coverage hypothesis (1–256); one curve per max. number of coverage hypotheses (1, 4, 16, 64, 256)]
Effect of Cube Pruning

[Two plots comparing Cube Pruning and This Work: BLEU[%] (33–37.5) vs. LM expansions per source word (100 to 1e+06), and BLEU[%] vs. translation speed [words/second] (0.1 to 1000)]

Numbers averaged over whole test set; vary beam sizes. Lexicalized reordering not used, just distortion penalty.
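Cube pruning, as compared above, lazily enumerates the best combinations of two sorted candidate lists with a priority queue instead of scoring the full cross product. A minimal two-dimensional sketch (real decoders combine coverage hypotheses with phrase candidates, and non-monotone scores make the result approximate):

```python
import heapq

def cube_best_k(rows, cols, score, k):
    """Pop the best cell, then push its right and down neighbors; repeat
    until k cells are produced. score(r, c) scores cell (r, c)."""
    seen = {(0, 0)}
    heap = [(-score(0, 0), 0, 0)]            # max-heap via negated scores
    out = []
    while heap and len(out) < k:
        neg, r, c = heapq.heappop(heap)
        out.append((-neg, r, c))
        for nr, nc in ((r + 1, c), (r, c + 1)):
            if nr < rows and nc < cols and (nr, nc) not in seen:
                seen.add((nr, nc))
                heapq.heappush(heap, (-score(nr, nc), nr, nc))
    return out
```

Only O(k) cells are ever scored, which is where the savings in LM expansions per source word come from.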
Comparison with Moses

[Plot: BLEU[%] (31–38) vs. translation speed [words per sec] (0.0625–1024); curves: Moses vs. this work]

Same TM, LM, etc.; vary beam setting. Lexicalized reordering not used, just distortion penalty.
Summary & Conclusions
• Summary:
  – detailed problem description
  – efficient solution
  – in-depth analysis
• Conclusions:
  – search important for good translation quality
  – rest score estimation allows for small beam sizes
  – distinction lexical vs. coverage hypotheses important
  – additional cube pruning not necessary
  – significantly faster than Moses
THANK YOU FOR YOUR ATTENTION!
References

[Bellman 57] R. Bellman: Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.

[Chiang 05] D. Chiang: A Hierarchical Phrase-Based Model for Statistical Machine Translation. Proc. 43rd Annual Meeting of the Assoc. for Computational Linguistics (ACL), pp. 263–270, Ann Arbor, Michigan, June 2005.

[Huang & Chiang 07] L. Huang, D. Chiang: Forest Rescoring: Faster Decoding with Integrated Language Models. Proc. 45th Annual Meeting of the Assoc. for Computational Linguistics (ACL), Prague, Czech Republic, June 2007.

[Jelinek 98] F. Jelinek: Statistical Methods for Speech Recognition. MIT Press, Cambridge, MA, 1998.

[Knight 99] K. Knight: Decoding Complexity in Word-Replacement Translation Models. Computational Linguistics, Vol. 25, No. 4, pp. 607–615, December 1999.
[Koehn 03] P. Koehn: Noun Phrase Translation. Ph.D. thesis, University of Southern California, 2003.

[Koehn & Hoang+ 07] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, E. Herbst: Moses: Open Source Toolkit for Statistical Machine Translation. Proc. 45th Annual Meeting of the Assoc. for Computational Linguistics (ACL): Poster Session, pp. 177–180, Prague, Czech Republic, June 2007.

[Koehn & Och+ 03] P. Koehn, F.J. Och, D. Marcu: Statistical Phrase-Based Translation. Proc. Human Language Technology Conf. / North American Chapter of the Assoc. for Computational Linguistics Annual Meeting (HLT-NAACL), pp. 127–133, Edmonton, Canada, May/June 2003.

[Moore & Quirk 07] R.C. Moore, C. Quirk: Faster Beam-Search Decoding for Phrasal Statistical Machine Translation. Proc. MT Summit XI, Copenhagen, Denmark, September 2007.

[Och 02] F.J. Och: Statistical Machine Translation: From Single-Word Models to Alignment Templates. Ph.D. thesis, Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, Aachen, Germany, October 2002.
[Och & Ney 04] F.J. Och, H. Ney: The Alignment Template Approach to Statistical Machine Translation. Computational Linguistics, Vol. 30, No. 4, pp. 417–449, December 2004.

[Och & Tillmann+ 99] F.J. Och, C. Tillmann, H. Ney: Improved Alignment Models for Statistical Machine Translation. Proc. Joint SIGDAT Conf. on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP), pp. 20–28, College Park, MD, June 1999.

[Stolcke 02] A. Stolcke: SRILM – An Extensible Language Modeling Toolkit. Proc. Int. Conf. on Spoken Language Processing (ICSLP), Vol. 2, pp. 901–904, Denver, CO, September 2002.

[Tillmann 06] C. Tillmann: Efficient Dynamic Programming Search Algorithms for Phrase-Based SMT. Proc. Workshop on Computationally Hard Problems and Joint Inference in Speech and Language Processing, pp. 9–16, New York City, NY, June 2006.

[Tillmann & Ney 03] C. Tillmann, H. Ney: Word Reordering and a Dynamic Programming Beam Search Algorithm for Statistical Machine Translation. Computational Linguistics, Vol. 29, No. 1, pp. 97–133, March 2003.
[Zens 08] R. Zens: Phrase-based Statistical Machine Translation: Models, Search, Training. Ph.D. thesis, Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, Aachen, Germany, February 2008.

[Zens & Ney 06] R. Zens, H. Ney: Discriminative Reordering Models for Statistical Machine Translation. Proc. Human Language Technology Conf. / North American Chapter of the Assoc. for Computational Linguistics Annual Meeting (HLT-NAACL): Workshop on Statistical Machine Translation, pp. 55–63, New York City, NY, June 2006.

[Zens & Och+ 02] R. Zens, F.J. Och, H. Ney: Phrase-Based Statistical Machine Translation. In M. Jarke, J. Koehler, G. Lakemeyer, editors, Proc. 25th German Conf. on Artificial Intelligence (KI2002), Vol. 2479 of Lecture Notes in Artificial Intelligence (LNAI), pp. 18–32, Aachen, Germany, September 2002. Springer Verlag.