Discriminative Pruning for Discriminative ITG Alignment
Shujie Liu, Chi-Ho Li and Ming Zhou

Outline

- Introduction
- Discriminative Model for Pruning
  - Training Sample Extraction
  - Training: MERT
  - Features
- Experiments and Analysis


Alignment and ITG

Alignment problem: finding translation pairs in bitext sentences.

[Figure: word-alignment grid between the Chinese tokens 向 / 财政 / 负责 and the English sentence "be accountable to the Financial Secretary".]


Alignment and ITG

ITG (Wu, 1997) does synchronous parsing of the two languages; word alignment is the by-product.

Lexical rules:    C → ei/fi    C → Ɛ/fi    C → ei/Ɛ
Structural rules: X → [X X] (straight)    X → <X X> (inverted)

[Figure: an ITG parse tree of the example sentence pair, with the lexical rules generating the leaves, including empty (Ɛ) links.]

Alignment and ITG

[Figure: two alternative ITG parse trees for the same alignment of the example sentence pair.]

Alignment and ITG

ITG (Wu, 1997) does synchronous parsing of the two languages; word alignment is the by-product.

Lexical rules:
  C → ei/fi    C → Ɛ/fi    C → ei/Ɛ
Structural rules (normal form):
  S → A | B | C
  A → [AB] | [BB] | [CB] | [AC] | [BC] | [CC]
  B → <AA> | <BA> | <CA> | <AC> | <BC> | <CC>

[Figure: an ITG parse tree of the example sentence pair using the categories A, B and C.]


Why Pruning

- ITG has achieved state-of-the-art results against gold-standard alignments (Haghighi et al., 2009).
- Speed is a major obstacle in ITG parsing:
  for each F-span [s, t]          (O(n^2) F-spans)
    for each E-span [u, v]        (O(n^2) E-spans)
      try to find an optimal split point pair [S, U]   (O(n^2) choices)
      that splits the span pair [s/u, t/v] into two smaller span pairs [s/u, S/U] and [S/U, t/v].
- The complexity of ITG parsing without pruning is therefore O(n^6); it takes more than an hour to parse a sentence pair longer than 60 words.
- Pruning is necessary.
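As a rough illustration of where the O(n^6) comes from, here is a minimal, hypothetical chart-parsing loop. It is only a sketch: lex_score and the scoring scheme are placeholders, empty links and word-insertion rules are ignored, and it is not the implementation used in the paper.

```python
from collections import defaultdict

def itg_parse(f_words, e_words, lex_score):
    """Bare-bones ITG chart loop; chart[(s, t, u, v)] holds the best score of
    the span pair [s, t) / [u, v) (0-based, end-exclusive)."""
    n, m = len(f_words), len(e_words)
    chart = defaultdict(float)

    # Initialise 1x1 span pairs with lexical-rule scores (C -> e/f).
    for s in range(n):
        for u in range(m):
            chart[(s, s + 1, u, u + 1)] = lex_score(f_words[s], e_words[u])

    # O(n^2) F-spans x O(n^2) E-spans x O(n^2) split point pairs = O(n^6).
    for f_len in range(2, n + 1):
        for s in range(n - f_len + 1):
            t = s + f_len
            for u in range(m):
                for v in range(u + 1, m + 1):
                    best = chart[(s, t, u, v)]
                    for S in range(s + 1, t):          # split point on the F side
                        for U in range(u + 1, v):      # split point on the E side
                            straight = chart[(s, S, u, U)] + chart[(S, t, U, v)]   # X -> [X X]
                            inverted = chart[(s, S, U, v)] + chart[(S, t, u, U)]   # X -> <X X>
                            best = max(best, straight, inverted)
                    chart[(s, t, u, v)] = best
    return chart[(0, n, 0, m)]
```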


Three Kinds of Pruning

1. Discard F-spans and/or E-spans: this discards too many span pairs and is (empirically) highly harmful to alignment performance.
2. Discard some alignments for a span pair: equivalent to minimizing the beam size of each span pair, i.e. k-best parsing.
3. Discard some unpromising span pairs, i.e. limit the number of E-spans per F-span: this is what our research is about.


Related Work

- Tic-tac-toe pruning (Zhang and Gildea, 2005): uses inside and outside scores to prune candidate E-spans for each F-span.
- Tree constraint pruning (Cherry and Lin, 2006): invalid spans are those that interrupt the phrases of a dependency tree, i.e. [x1, j] and [j, x2].
- High-precision alignment pruning (Haghighi et al., 2009): prune all bitext cells that would invalidate more than 8 of the high-precision alignment links.
- 1-1 alignment posterior pruning (Haghighi et al., 2009): prune all 1-1 bitext cells whose posterior is below 10^-4 in both HMM models.



Linear Model

As all these techniques contribute to making good pruning decisions, we try to incorporate them all as features in ITG pruning.

DPDI: Discriminative Pruning for Discriminative ITG

P(e | f) = exp(λ · Ψ(e, f)) / Σ_e' exp(λ · Ψ(e', f))

λ: feature weights        Ψ: features
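A small sketch of how such a log-linear model scores and ranks the candidate E-spans of one F-span. The feature names, values and weights below are hypothetical, not the actual model of the paper.

```python
import math

def rank_espans(candidates, weights):
    """candidates: list of (espan, feature_dict); weights: feature name -> weight.
    Returns the candidates sorted by P(e | f) under the log-linear model."""
    scores = [sum(weights.get(name, 0.0) * value for name, value in feats.items())
              for _, feats in candidates]
    z = sum(math.exp(s) for s in scores)                  # normalisation over all candidates
    probs = [math.exp(s) / z for s in scores]
    ranked = sorted(zip(candidates, probs), key=lambda pair: -pair[1])
    return [(espan, p) for (espan, _), p in ranked]

# Hypothetical usage: two candidate E-spans for one F-span.
candidates = [((1, 3), {"inside_prob": -2.1, "length_ratio": 0.15}),
              ((1, 4), {"inside_prob": -3.0, "length_ratio": 0.35})]
weights = {"inside_prob": 1.0, "length_ratio": -2.0}
print(rank_espans(candidates, weights))
```

Keeping only the top-N E-spans of this ranked list for each F-span then corresponds to the pruning decision.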



Training Sample Extraction

Training samples:
- consist of various F-spans and their corresponding E-spans;
- are extracted from word-alignment annotation, subject to ITG constraints.

Example (书 就会 来 的 / "the book is to come"), extracted span pairs include:
  书 就会 来 的 / the book is to come
  书 就会 来 / the book is to come
  书 就会 / the book is to
  书 / the book
  就会 / is to
  Ɛ / the    书 / book    Ɛ / is    就会 / to    来 / come    的 / Ɛ
  ...
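For illustration, here is a simplified, consistency-based sketch of extracting (F-span, E-span) training pairs from a gold alignment. The paper actually derives the samples from an ITG parse of the annotated alignment (see the backup slide near the end); this sketch only approximates that, and all names and values are hypothetical.

```python
def extract_span_pairs(n_f, n_e, links, max_len=4):
    """For each F-span, return the minimal E-span consistent with the gold
    alignment.  `links` is a set of (f_idx, e_idx) pairs, 0-based; spans are
    end-exclusive."""
    samples = []
    for s in range(n_f):
        for t in range(s + 1, min(n_f, s + max_len) + 1):       # F-span [s, t)
            e_idxs = [e for f, e in links if s <= f < t]
            if not e_idxs:
                continue
            u, v = min(e_idxs), max(e_idxs) + 1                  # minimal covering E-span [u, v)
            # Consistency: no link inside the E-span may leave the F-span.
            if all(s <= f < t for f, e in links if u <= e < v):
                samples.append(((s, t), (u, v)))
    return samples

# Hypothetical example in the spirit of 书 就会 来 的 / "the book is to come".
links = {(0, 1), (1, 3), (2, 4)}       # illustrative links only
print(extract_span_pairs(4, 5, links))
```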



Loss Function

loss(r_s, ê(f_s; λ_1^M)) =  -rank(r_s)   if r_s ∈ ê(f_s; λ_1^M)
                             penalty      otherwise

f_s: an F-span;  r_s: its correct E-span
ê(f_s; λ_1^M): the N-best list of candidate E-spans given f_s and the feature weights λ_1^M
rank(r_s): the rank of r_s in the N-best list
penalty: if r_s is not in the N-best list at all, the loss is defined to be penalty (-100,000).

Rationale: keep as many correct E-spans as possible in the N-best lists, and push the correct E-spans upward as much as possible.

[Example: if the correct E-span is at the top of the 10-best list, loss = -1; if it is not in the 10-best list at all, loss = -100,000.]
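A tiny sketch of this loss, assuming a 1-based rank and the penalty of -100,000 from the slide:

```python
PENALTY = -100_000        # loss when the correct E-span is absent from the N-best list

def espan_loss(correct_espan, nbest):
    """Loss for one F-span: the negated rank of the correct E-span in the
    N-best list (ordered best-first), or the penalty if it is missing."""
    for rank, espan in enumerate(nbest, start=1):
        if espan == correct_espan:
            return -rank
    return PENALTY

def total_loss(samples):
    """Sum of per-span losses over (correct E-span, N-best list) pairs."""
    return sum(espan_loss(r, nbest) for r, nbest in samples)

# Hypothetical usage
print(espan_loss((1, 3), [(1, 3), (1, 2), (2, 3)]))    # -> -1
print(espan_loss((1, 7), [(1, 3), (1, 2), (2, 3)]))    # -> -100000
```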

MERT: Minimum Error Rate Training

- The training method is very similar to MERT for SMT.
- An important part of MERT for SMT is the line search: a search for the best point along one fixed dimension (one feature weight).
- The BLEU score changes only when the best candidate changes.
- The changing best candidates form the upper envelope (the red curve in the figure).
- The points at which the best candidate changes are the interval boundaries (the green points).
- Finding the interval boundaries is the key step of normal MERT.

[Figure: upper envelope and interval boundaries in normal MERT.]

MERT: Minimum Error Rate Training

- Difference from normal MERT: instead of finding the interval boundaries at which the optimal candidate changes, we find the interval boundaries at which the rank (index) of the correct E-span changes.
- These boundaries are the intersections between the score line of the correct (golden) E-span and the score lines of all other candidate E-spans.
- The performance gain at each boundary is the difference between the loss before the boundary and the loss after it.

[Figure: modified MERT, worked example with N = 10; crossing a boundary changes the loss by ±1 when the rank of the correct E-span changes by one, and by ±99,991 when the correct E-span enters or leaves the 10-best list (penalty = -100,000).]
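A minimal sketch of the modified line search on a single weight dimension, assuming (as in MERT) that each candidate's score is linear in the weight being tuned. Only the boundary computation for one F-span is shown; sweeping the boundaries of all spans and accumulating the loss changes to pick the best interval is omitted.

```python
def rank_change_boundaries(correct, others):
    """correct = (a_r, b_r) and each element of `others` = (a, b), where a
    candidate's score along this dimension is  score(w) = a + b * w.
    Returns the weight values at which the correct E-span's rank changes,
    with the direction of the change (+1: it overtakes a competitor,
    -1: it is overtaken)."""
    a_r, b_r = correct
    boundaries = []
    for a, b in others:
        if b == b_r:                       # parallel score lines: order never changes
            continue
        w_star = (a - a_r) / (b_r - b)     # intersection of the two score lines
        direction = +1 if b_r > b else -1  # just right of w_star, the higher slope wins
        boundaries.append((w_star, direction))
    return sorted(boundaries)

# Hypothetical usage: the correct E-span against two competitors.
print(rank_change_boundaries((0.0, 1.0), [(1.0, 0.5), (-0.5, 2.0)]))
```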



Features

Features for the pruning model (F-span [i, j], E-span [l, m]):

- Inside probability and outside probability (the tic-tac-toe scores)
- Alignment count ratio (similar to Haghighi et al.'s): 2 * Count(links in this span pair) / (j - i + m - l)
- Alignment invalid count ratio: 2 * Count(links linked to outside) / (j - i + m - l)
- Length ratio: |(j - i) / (m - l) - 1.15|, where 1.15 is the average ratio of sentence lengths
- Position ratio: |(j + i) / (2 * length(src sent)) - (l + m) / (2 * length(trg sent))|, based on the monotonic assumption that Position(F-span) ≈ Position(E-span)
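A small sketch of how the count-based and ratio features above might be computed. The index conventions (0-based, end-exclusive) and the link format are assumptions; the inside and outside probabilities come from a separate model and are omitted.

```python
def pruning_features(i, j, l, m, links, src_len, trg_len):
    """Span-pair features for F-span [i, j) and E-span [l, m).
    `links` is a set of (f, e) word links from a simpler alignment model."""
    span_len = (j - i) + (m - l)
    inside = sum(1 for f, e in links if i <= f < j and l <= e < m)
    # Links that connect a word inside the span pair to a word outside it.
    invalid = sum(1 for f, e in links if (i <= f < j) != (l <= e < m))
    return {
        "align_count_ratio":   2.0 * inside / span_len,
        "align_invalid_ratio": 2.0 * invalid / span_len,
        "length_ratio":        abs((j - i) / (m - l) - 1.15),
        "position_ratio":      abs((j + i) / (2.0 * src_len) - (l + m) / (2.0 * trg_len)),
    }

# Hypothetical usage on a 4-word / 4-word sentence pair.
print(pruning_features(1, 3, 2, 4, {(1, 2), (0, 3), (3, 3)}, 4, 4))
```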


Small-scale Alignment Evaluation

- The first set of experiments evaluates the performance of the three pruning methods using the Berkeley annotated data.
- We use the first 250 sentence pairs as training data and the remaining 241 pairs as test data.
- The corresponding numbers of E-spans in the training and test data are 4,590 and 3,951 respectively.
- Two ITG models are used: W-DITG and HP-DITG.
- The F-score upper bound, the actual F-score and the time cost are compared.

Small-scale Alignment Evaluation

ID | Pruning | Beam size | Pruning / total time | F-score upper bound | F-score
 1 | DPDI    | 10        | 72''/3'03''          | 88.5%               | 82.5%
 2 | TTT     | 10        | 58''/2'38''          | 87.5%               | 81.1%
 3 | TTT     | 20        | 53''/6'55''          | 88.6%               | 82.4%
 4 | DP      | --        | 11''/6'01''          | 86.1%               | 80.5%

Table 1: Evaluation of DPDI against TTT (tic-tac-toe) and DP (dynamic programming) for W-DITG

- With the same beam size, DPDI spends a bit more time, but its F-score upper bound is about 1 point higher.
- DPDI achieves an even larger improvement in actual F-score.

Small-scale Alignment Evaluation

ID | Pruning | Beam size | Pruning / total time | F-score upper bound | F-score
 1 | DPDI    | 10        | 72''/5'18''          | 93.9%               | 87.0%
 2 | TTT     | 10        | 58''/4'51''          | 93.0%               | 84.8%
 3 | TTT     | 20        | 53''/12'5''          | 94.0%               | 86.5%
 4 | DP      | --        | 11''/15'39''         | 91.4%               | 83.6%

Table 2: Evaluation of DPDI against TTT (tic-tac-toe) and DP (dynamic programming) for HP-DITG

- Roughly the same observations as for W-DITG hold.
- In addition to the superiority of DPDI, note that HP-DITG achieves a much higher F-score and F-score upper bound (for more details, please see our COLING 2010 paper).

Large-scale End-to-End Experiment

Machine translation evaluation:
- Bilingual training data: the NIST training set, excluding the Hong Kong Law and Hong Kong Hansard portions
- Language model: a 5-gram language model trained on the Xinhua section of the Gigaword corpus
- Development corpus: the NIST'03 test set
- Test corpus: the NIST'05 and NIST'08 test sets

Large-scale End-to-End Experiment

ID | Pruning | Beam size | Time cost | Bleu-05 | Bleu-08
 1 | DPDI    | 10        | 1092h     | 38.57   | 28.31
 2 | TTT     | 10        | 972h      | 37.96   | 27.37
 3 | TTT     | 20        | 2376h     | 38.13   | 27.58
 4 | DP      | --        | 2068h     | 37.43   | 27.12

Table 3: Evaluation of DPDI against TTT and DP for HP-DITG

- HP-DITG using DPDI achieves the best Bleu score with acceptable time cost.
- One explanation for the better performance of HP-DITG is better phrase pair extraction due to DPDI.
- Good ITG pruning like DPDI guides the subsequent ITG alignment process, so that fewer links inconsistent with good phrase pairs are produced.

Large-scale End-to-End Experiment

ID | Method  | F-score | Bleu-05 | Bleu-08
 1 | HMM     | 80.1%   | 36.91   | 26.86
 2 | Giza++  | 84.2%   | 37.70   | 27.33
 3 | BITG    | 85.9%   | 37.92   | 27.85
 4 | W-DITG  | 82.5%   | --      | --
 5 | HP-DITG | 87.0%   | 38.57   | 28.31

Table 4: Evaluation of DPDI against HMM, Giza++ and BITG

- W-DITG is not as good as HMM, Giza++ and BITG, since it suffers from the 1-to-1 alignment constraint.
- HP-DITG (with DPDI) is better than the three baselines in both alignment F-score and Bleu score.

Summary

- A discriminative pruning method (DPDI) is proposed, which uses minimum error rate training and various features.
- DPDI is an effective way to reduce the number of bitext cells in bilingual parsing.
- DPDI improves not only alignment performance but also SMT performance.

Thanks



Training Sample Extraction

The annotated data for training: we use the phrase pairs extracted from gold-alignment sentence pairs as the annotated data for training.

[Figure: ITG decomposition of an example span pair. The span pair [e1,e3]/[f1,f2] is derived by A → [C, C], giving candidate link sets {e1/f1, e3/f2} and {e2/f1, e3/f2}; each C span pair (e.g. [e1,e2]/[f1] and [e2,e3]/[f2]) is in turn derived by C → [Ce, Cw], pairing an empty link (e.g. e1/Ɛ or e2/Ɛ) with a word link (e.g. e1/f1, e2/f1 or e3/f2).]

Evaluation Criteria

The upper bound on alignment F-score: how many links in the annotated alignment can be kept in an ITG parse.

hit(Cw[u, v]) = 1 if <u, v> ∈ R, 0 otherwise
hit(Ce) = 0;  hit(Cf) = 0
hit(X[f, e]) = max over Y, Z, f1, e1, f2, e2 of ( hit(Y[f1, e1]) + hit(Z[f2, e2]) )

where X, Y, Z are variables over the categories of the ITG grammar, and R comprises the gold links in the annotated alignment.

[Example: A: [e1,e3]/[f1,f2], hit = max{1+1, 1+1} = 2;  C: [e1,e2]/[f1], hit = max{0+1} = 1;  C: [e2,e3]/[f2], hit = max{0+1} = 1;  Cw: e1/f1, hit = 1;  Ce: e1/Ɛ, hit = 0.]
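A rough sketch of this recursion. It ignores the A/B/C category distinctions and assumes 0-based, end-exclusive spans, so it is an illustration of the idea rather than the exact computation.

```python
from functools import lru_cache

def fscore_upper_bound_hits(n_f, n_e, gold_links):
    """Maximum number of gold links in R (a set of (f, e) pairs) that any ITG
    parse of the n_f x n_e sentence pair can keep."""

    @lru_cache(maxsize=None)
    def hit(fs, ft, es, et):
        f_len, e_len = ft - fs, et - es
        if f_len <= 1 and e_len <= 1:
            # Lexical level: hit(Cw[u, v]) = 1 iff <u, v> is a gold link;
            # null links (Ce, Cf) contribute 0.
            return 1 if (f_len == 1 and e_len == 1 and (fs, es) in gold_links) else 0
        best = 0
        for S in range(fs, ft + 1):               # split point on the F side
            for U in range(es, et + 1):           # split point on the E side
                if S in (fs, ft) and U in (es, et):
                    continue                      # skip trivial corner "splits"
                straight = hit(fs, S, es, U) + hit(S, ft, U, et)
                inverted = hit(fs, S, U, et) + hit(S, ft, es, U)
                best = max(best, straight, inverted)
        return best

    return hit(0, n_f, 0, n_e)

# Hypothetical usage: a 3 x 2 sentence pair with two gold links.
print(fscore_upper_bound_hits(3, 2, {(0, 0), (2, 1)}))    # -> 2
```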

Features (worked example)

Features for the pruning model (F-span, E-span), computed for an example span pair in a 4 x 4 sentence pair:

- Inside probability, outside probability
- Length ratio: |(j - i) / (m - l) - 1.15| = |(3 - 1)/(4 - 2) - 1.15| = |-0.15| = 0.15
- Position ratio: |(j + i) / (2 * length(src sent)) - (l + m) / (2 * length(trg sent))| = |4/(2 * 4) - 5/(2 * 4)| = 0.125
- Alignment count ratio: 2 * Count(links in this span pair) / (j - i + m - l) = 2 * 1 / 4 = 0.5
- Alignment invalid count ratio: 2 * Count(links linked to outside) / (j - i + m - l) = 2 * 2 / 4 = 1

[Figure: the example alignment grid with the span pair marked.]
