Discriminative Pruning for Discriminative ITG Alignment Shujie Liu, Chi-Ho Li and Ming Zhou
Outline
• Introduction
• Discriminative Model for Pruning
• Training Sample Extraction
• Training: MERT
• Features
• Experiments and Analysis
7/12/2010
Shujie Liu, Chi-Ho Li and Ming Zhou
2
Alignment and ITG
Alignment problem: finding translation pairs in bitext sentences.
[Figure: example word alignment between the Chinese sentence 向 财政 司 负责 and the English sentence "be accountable to the Financial Secretary".]
ITG (Wu, 1997) does synchronous parsing of the two languages; word alignment is a by-product of the parse.
Lexical rules: C→ei/fi, C→Ɛ/fi, C→ei/Ɛ
Structural rules: A→[AB] | [BB] | [CB] | [AC] | [BC] | [CC] (straight), B→⟨AB⟩ | ⟨BB⟩ | ⟨CB⟩ | ⟨AC⟩ | ⟨BC⟩ | ⟨CC⟩ (inverted)
[Figure: ITG parse trees over the example sentence pair, with Ɛ-aligned words (向/Ɛ, the/Ɛ) handled by the lexical rules and reordering handled by straight and inverted structural rules.]
Why Pruning
ITG has achieved state-of-the-art results against gold-standard alignments (Haghighi et al., 2009).
Speed, however, is a major obstacle in ITG parsing:
• for each F-span [s, t] — O(n²)
• for each E-span [u, v] — O(n²)
• try to find an optimal split point pair [S, U] that divides the span pair [s/u, t/v] into two smaller span pairs ([s/u, S/U], [S/U, t/v]) — O(n²)
The complexity of ITG parsing without pruning is therefore O(n⁶); it takes more than 1 hour to parse a sentence pair longer than 60 words.
Pruning is necessary.
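The three nested O(n²) factors above compose multiplicatively; a toy count (hypothetical function name, counting only the combine steps of an unpruned parse) makes the O(n⁶) growth concrete:

```python
from itertools import product

def itg_combine_count(n):
    """Count split evaluations of an unpruned ITG parse over two
    length-n sentences: O(n^2) F-spans x O(n^2) E-spans x O(n^2)
    split point pairs = O(n^6) overall."""
    evals = 0
    for s, t in product(range(n + 1), repeat=2):      # F-span [s, t]
        for u, v in product(range(n + 1), repeat=2):  # E-span [u, v]
            if s >= t or u >= v:
                continue
            for S in range(s + 1, t):                 # split point in F
                for U in range(u + 1, v):             # split point in E
                    evals += 1  # combine ([s/u, S/U], [S/U, t/v])
    return evals
```

Doubling the sentence length multiplies the count by roughly 2⁶ = 64, which is why a 60-word sentence pair already takes over an hour.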
Three Kinds of Pruning
• Discard F-spans and/or E-spans: empirically this discards too many span pairs and is highly harmful to alignment performance.
• Discard some alignments for a span pair: equivalent to minimizing the beam size of each span pair, i.e. K-best parsing.
• Discard some unpromising span pairs, i.e. limit the number of E-spans per F-span: this is what our research is about.
Related Work
• Tic-tac-toe pruning (Zhang and Gildea, 2005): uses inside and outside scores to prune candidate E-spans for each F-span.
• Tree-constraint pruning (Cherry and Lin, 2006): invalid spans are spans that interrupt the phrases of a dependency tree, i.e. [x1, j] and [j, x2].
• High-precision-alignment pruning (Haghighi et al., 2009): prunes all bitext cells that would invalidate more than 8 of the high-precision alignments.
• 1-1 alignment posterior pruning (Haghighi et al., 2009): prunes all 1-1 bitext cells whose posterior is below 10⁻⁴ in both HMM models.
Linear Model
All these techniques contribute to good pruning decisions, so we incorporate all of them as features in ITG pruning.
DPDI: Discriminative Pruning for Discriminative ITG.

P(e | f) = exp(λ · Ψ(e, f)) / Σ_{e′ ∈ T} exp(λ · Ψ(e′, f))

λ: feature weights; Ψ: features
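The model above is a standard softmax over the candidate E-spans T of one F-span; a minimal sketch (hypothetical helper names; Ψ is supplied as a feature function):

```python
import math

def span_probability(weights, candidates, features):
    """P(e|f) = exp(lambda . psi(e, f)) / sum over e' in T of
    exp(lambda . psi(e', f)).  `candidates` is the set T of E-spans
    for one F-span; `features(e)` returns the vector psi(e, f)."""
    scores = {e: sum(w * x for w, x in zip(weights, features(e)))
              for e in candidates}
    z = sum(math.exp(s) for s in scores.values())
    return {e: math.exp(s) / z for e, s in scores.items()}

def prune(weights, candidates, features, k):
    """The pruning decision: keep only the k most probable E-spans."""
    probs = span_probability(weights, candidates, features)
    return sorted(probs, key=probs.get, reverse=True)[:k]
```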
Training Sample Extraction
Training samples consist of various F-spans and their corresponding E-spans, extracted from the word alignment annotation subject to ITG constraints.
[Figure: gold word alignment for the sentence pair 书 就会 来 的 / "the book is to come", and the span pairs read off the ITG-constrained parse, e.g. 书 就会 来 的 / "the book is to come", 书 就会 来 / "the book is to come", 书 就会 / "the book is to", 书 / "the book", together with smaller pairs such as 就会 / "is to" and 来 / "come", and the Ɛ-aligned words Ɛ / "the" and 的 / Ɛ.]
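Reading span pairs off a gold alignment can be sketched with the usual consistency check (a hypothetical helper, not necessarily the authors' exact extraction procedure): a span pair is kept when no link crosses its border.

```python
def extract_span_pairs(links, f_len, e_len):
    """Enumerate (F-span, E-span) pairs consistent with the gold links:
    a link lies inside the F-span [i, j] iff it lies inside the E-span
    [l, m].  `links` holds (f, e) word-index pairs; spans are inclusive
    index ranges.  Span pairs with no link inside are skipped."""
    pairs = []
    for i in range(f_len):
        for j in range(i, f_len):
            for l in range(e_len):
                for m in range(l, e_len):
                    if (any(i <= f <= j for f, e in links)
                            and all((i <= f <= j) == (l <= e <= m)
                                    for f, e in links)):
                        pairs.append(((i, j), (l, m)))
    return pairs
```

For a two-word monotone alignment {(0,0), (1,1)} this yields exactly the two single-word pairs and the full-sentence pair.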
Loss Function

loss(rs, ê(fs; λ1..M)) = −rank(rs)   if rs ∈ ê(fs; λ1..M)
                         −penalty    otherwise

fs: an F-span; rs: the correct E-span for fs
ê(fs; λ1..M): the N-best list of E-spans given fs and the feature weights λ1..M
rank(rs): the rank of rs in the N-best list
penalty: if rs is not in the N-best list at all, the loss is the penalty (−100,000).

[Figure: an F-span whose 10-best list contains the correct E-span at the top (loss −1), and another F-span whose 10-best list misses the correct E-span entirely (loss −100,000).]

Rationale: keep as many correct E-spans as possible in the N-best lists, and push the correct E-spans upward as much as possible.
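The loss can be sketched in a few lines (hypothetical names; ranks are 1-based and the penalty is 100,000 as on the slide):

```python
PENALTY = 100_000  # loss magnitude when the correct E-span misses the N-best

def span_loss(correct_espan, nbest):
    """loss(rs, e_hat(fs)) = -rank(rs) if rs is in the N-best list,
    -penalty otherwise."""
    if correct_espan in nbest:
        return -(nbest.index(correct_espan) + 1)  # 1-based rank
    return -PENALTY
```

Summing `span_loss` over all F-spans rewards weight vectors that keep correct E-spans inside the lists and near the top.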
MERT: Minimum Error Rate Training
The training method is very similar to MERT for SMT.
An important part of MERT for SMT is a linear search, i.e. a search for the best point along one fixed dimension:
• the BLEU score changes only where the best candidate changes;
• the successive best candidates form the upper envelope;
• the points where the best candidate changes are the interval boundaries;
• finding these interval boundaries is the key step of normal MERT.
Difference in our modified MERT: instead of finding the interval boundaries at which the optimal candidate changes, we find the interval boundaries at which the rank of the correct E-span changes. These boundaries are the intersections between the score line of the correct ("golden") E-span and the score lines of all other candidate E-spans, and the performance gain at each boundary is the difference between the loss before and the loss after it.
[Figure: score lines of candidate E-spans as a function of the weight wi (N = 10); crossing a boundary changes the loss by ±1 when the correct E-span moves one rank, and by nearly 100,000 when it enters or leaves the 10-best list.]
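The modified linear search can be sketched as follows: along the searched dimension each candidate's score is a line a + b·w (a from the fixed weights, b the candidate's value for the feature being tuned; both hypothetical inputs here), and the rank of the correct E-span can change only where its line crosses another candidate's line.

```python
def rank_change_boundaries(correct, others):
    """Return the sorted weights w at which the correct E-span's score
    line a0 + b0*w meets another candidate's line a + b*w; between two
    consecutive boundaries the correct E-span's rank (and hence the
    loss) is constant."""
    a0, b0 = correct
    boundaries = []
    for a, b in others:
        if b != b0:  # parallel lines never intersect
            boundaries.append((a - a0) / (b0 - b))
    return sorted(boundaries)
```

A sweep over these boundaries then picks the interval with the best loss, just as normal MERT sweeps the boundaries of the BLEU upper envelope.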
Features
Features for the pruning model (F-span [i, j], E-span [l, m]):
• Inside probability and outside probability (the tic-tac-toe scores)
• Alignment count ratio (similar to Haghighi et al.'s): 2 * Count(links in this span pair) / (j − i + m − l)
• Alignment invalid count ratio: 2 * Count(links linked to outside) / (j − i + m − l)
• Length ratio: |(j − i) / (m − l) − 1.15|, where 1.15 is roughly the average ratio of sentence lengths
• Position ratio: |(j + i) / (2 * length(src sent)) − (l + m) / (2 * length(trg sent))|, based on the monotonic assumption Position(F-span) ≈ Position(E-span)
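The four ratio features can be computed directly from the formulas above; a minimal sketch (hypothetical function; spans are inclusive word-index pairs, `links` the set of gold (f, e) links, and degenerate spans with j = i or m = l are not handled):

```python
def span_features(i, j, l, m, src_len, trg_len, links):
    """Ratio features for F-span [i, j] / E-span [l, m]."""
    size = (j - i) + (m - l)
    inside = sum(1 for f, e in links if i <= f <= j and l <= e <= m)
    crossing = sum(1 for f, e in links if (i <= f <= j) != (l <= e <= m))
    return {
        "align_count_ratio": 2 * inside / size,
        "align_invalid_ratio": 2 * crossing / size,
        # 1.15: roughly the average target/source sentence-length ratio
        "length_ratio": abs((j - i) / (m - l) - 1.15),
        "position_ratio": abs((j + i) / (2 * src_len)
                              - (l + m) / (2 * trg_len)),
    }
```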
Small-scale Alignment Evaluation
The first set of experiments evaluates the performance of the three pruning methods on the Berkeley annotated data. We use the first 250 sentence pairs as training data and the remaining 241 pairs as test data; the corresponding numbers of E-spans in the training and test data are 4590 and 3951 respectively. Two ITG models are used: W-DITG and HP-DITG. The F-score upper bound, the actual F-score, and the time cost are compared.
Small-scale Alignment Evaluation

ID | Pruning | Beam size | Pruning / total time | F-score upper bound | F-score
1  | DPDI    | 10        | 72''/3'03''          | 88.5%               | 82.5%
2  | TTT     | 10        | 58''/2'38''          | 87.5%               | 81.1%
3  | TTT     | 20        | 53''/6'55''          | 88.6%               | 82.4%
4  | DP      | --        | 11''/6'01''          | 86.1%               | 80.5%

Table 1: Evaluation of DPDI against TTT (tic-tac-toe) and DP (dynamic programming) for W-DITG

• With the same beam size, although DPDI spends a bit more time, its F-score upper bound is 1 point higher.
• DPDI achieves an even larger improvement in actual F-score.
Small-scale Alignment Evaluation

ID | Pruning | Beam size | Pruning / total time | F-score upper bound | F-score
1  | DPDI    | 10        | 72''/5'18''          | 93.9%               | 87.0%
2  | TTT     | 10        | 58''/4'51''          | 93.0%               | 84.8%
3  | TTT     | 20        | 53''/12'5''          | 94.0%               | 86.5%
4  | DP      | --        | 11''/15'39''         | 91.4%               | 83.6%

Table 2: Evaluation of DPDI against TTT (tic-tac-toe) and DP (dynamic programming) for HP-DITG

• Roughly the same observations as for W-DITG can be made.
• Besides the superiority of DPDI, note that HP-DITG achieves a much higher F-score and F-score upper bound (for more details, please see our COLING 2010 paper).
Large-scale End-to-End Experiment
Machine translation evaluation.
• Bilingual training data: the NIST training set excluding the Hong Kong Law and Hong Kong Hansard
• Language model: 5-gram language model trained on the Xinhua section of the Gigaword corpus
• Development corpus: NIST'03 test set
• Test corpora: NIST'05 and NIST'08 test sets
Large-scale End-to-End Experiment

ID | Pruning | Beam size | Time cost | BLEU-05 | BLEU-08
1  | DPDI    | 10        | 1092h     | 38.57   | 28.31
2  | TTT     | 10        | 972h      | 37.96   | 27.37
3  | TTT     | 20        | 2376h     | 38.13   | 27.58
4  | DP      | --        | 2068h     | 37.43   | 27.12

Table 3: Evaluation of DPDI against TTT and DP for HP-DITG

HP-DITG using DPDI achieves the best BLEU score with acceptable time cost. One explanation of HP-DITG's better performance is better phrase-pair extraction due to DPDI: good ITG pruning like DPDI guides the subsequent ITG alignment process so that fewer links inconsistent with good phrase pairs are produced.
Large-scale End-to-End Experiment

ID | Method  | F-score | BLEU-05 | BLEU-08
1  | HMM     | 80.1%   | 36.91   | 26.86
2  | Giza++  | 84.2%   | 37.70   | 27.33
3  | BITG    | 85.9%   | 37.92   | 27.85
4  | W-DITG  | 82.5%   | --      | --
5  | HP-DITG | 87.0%   | 38.57   | 28.31

Table 4: Evaluation of DPDI against HMM, Giza++ and BITG

W-DITG is not as good as HMM, Giza++ or BITG, since it suffers from the 1-to-1 alignment constraint. HP-DITG (with DPDI) is better than all three baselines in both alignment F-score and BLEU score.
Summary
• We propose a discriminative pruning method (DPDI) that can use minimum error rate training and various features.
• DPDI is an effective way to reduce the number of bitext cells in bilingual parsing.
• DPDI improves not only alignment performance but also SMT performance.
Thanks
Alignment and ITG (full grammar)
[Figure: ITG parse of the example 向 财政 司 负责 / "be accountable to the Financial Secretary", with Ɛ-aligned words.]
Lexical rules: C→ei/fi, C→Ɛ/fi, C→ei/Ɛ
Structural rules: A→[AB] | [BB] | [CB] | [AC] | [BC] | [CC] (straight), B→⟨AB⟩ | ⟨BB⟩ | ⟨CB⟩ | ⟨AC⟩ | ⟨BC⟩ | ⟨CC⟩ (inverted), S→A | B | C
Training Sample Extraction (annotated data)
We use the phrase pairs extracted from gold-alignment sentence pairs as the annotated data for training.
[Figure: an ITG derivation of the span pair [e1, e3]/[f1, f2], decomposed as A→[C, C] into C: [e1, e2]/[f1] (links {e1/f1} or {e2/f1}) and C: [e2, e3]/[f2] (link {e3/f2}), with each C expanded by C→[Ce, Cw] so that one word carries the link (Cw) and the other aligns to Ɛ (Ce).]
Evaluation Criteria
The upper bound on alignment F-score: how many links of the annotated alignment can be kept in an ITG parse.

hit(Cw[u, v]) = 1 if (u, v) ∈ R, 0 otherwise
hit(Ce) = 0;  hit(Cf) = 0
hit(X[f, e]) = max over Y, Z, f1, e1, f2, e2 of ( hit(Y[f1, e1]) + hit(Z[f2, e2]) )

where X, Y, Z are variables for the categories in the ITG grammar, and R comprises the gold links in the annotated alignment.

[Figure: worked example — A: [e1, e3]/[f1, f2]: hit = max{1+1, 1+1} = 2; C: [e1, e2]/[f1]: hit = max{0+1} = 1; C: [e2, e3]/[f2]: hit = max{0+1} = 1; leaves: Cw: e1/f1 hit = 1, Ce: e1/Ɛ hit = 0, Cw: e2/f1 hit = 1, Ce: e2/Ɛ hit = 0, Cw: e3/f2 hit = 1.]
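The hit recursion can be sketched as a memoized DP over span pairs (a simplified 1-to-1 sketch with hypothetical names: a single-word pair plays the role of Cw, single-word spans pick their best link/Ɛ decomposition, and internal nodes take the best straight or inverted split):

```python
from functools import lru_cache

def max_hits(gold_links, f_len, e_len):
    """Maximum number of gold links an ITG parse can keep, i.e. the
    numerator of the F-score upper bound for one sentence pair.
    Spans are half-open index ranges; assumes non-empty sentences."""
    links = frozenset(gold_links)

    @lru_cache(maxsize=None)
    def hit(s, t, u, v):                 # F-span [s, t), E-span [u, v)
        if t - s == 1 and v - u == 1:    # lexical leaf (Cw)
            return 1 if (s, u) in links else 0
        if t - s == 1:                   # one F word: keep at most one link
            return max(hit(s, t, U, U + 1) for U in range(u, v))
        if v - u == 1:
            return max(hit(S, S + 1, u, v) for S in range(s, t))
        best = 0
        for S in range(s + 1, t):
            for U in range(u + 1, v):
                best = max(best,
                           hit(s, S, u, U) + hit(S, t, U, v),   # straight
                           hit(s, S, U, v) + hit(S, t, u, U))   # inverted
        return best

    return hit(0, f_len, 0, e_len)
```

On the classic "inside-out" four-word alignment, which no binary ITG parse can fully realize, this upper bound keeps three of the four gold links.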
Features (worked example)
[Figure: a span pair with F-span [i, j] = [1, 3] and E-span [l, m] in 4-word source and target sentences, with one link inside the span pair and two links to outside.]
• Length ratio: |(j − i) / (m − l) − 1.15| = |(3 − 1)/(4 − 2) − 1.15| = |−0.15| = 0.15
• Position ratio: |(j + i) / (2 * length(src sent)) − (l + m) / (2 * length(trg sent))| = |4/(2*4) − 5/(2*4)| = 0.125
• Alignment count ratio: 2 * Count(links in this span pair) / (j − i + m − l) = 2*1/4 = 0.5
• Alignment invalid count ratio: 2 * Count(links linked to outside) / (j − i + m − l) = 2*2/4 = 1