Trimming CFG Parse Trees for Sentence Compression Using Machine Learning Approach

Yuya Unno, Takashi Ninomiya, Yusuke Miyao and Jun'ichi Tsujii
University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, Japan

Introduction

Sentence compression is one of the summarization tasks: given an input sentence, we create a new, shorter, grammatical sentence that preserves its meaning.

    Yesterday I went to Tokyo by train  →  I went to Tokyo

Input: a long sentence l
Output: argmax_s P(s | l)
Algorithm

Knight and Marcu's noisy-channel model [1]:
1. Parse the sentences in the training corpus.
2. Compare the corresponding nodes of the compressed and original parse trees, starting from the root nodes.
3. Estimate the rewriting probabilities from the counts of the applied CFG rules.

We revised this model in two points:
1. Maximum-entropy model
2. Bottom-up method

The model decomposes as

P(s | l) ∝ P(l | s) P(s)

P(l | s) = ∏_{(r_l, r_s) ∈ R} P_exp(r_l | r_s) · ∏_{r ∈ R'} P_cfg(r)

where P_exp(r_l | r_s) is the probability of rewriting the compressed-tree rule r_s into the original-tree rule r_l, R is the set of corresponding rule pairs, and R' is the set of rules that appear only in the original tree.
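To make the estimation step concrete, here is a minimal Python sketch, not the authors' implementation: the rule representation, function names, and toy data are our own assumptions. It estimates P_exp by relative frequency, as step 3 above describes, and scores a compression with the channel model.

from collections import Counter
import math

# A CFG rule is a (mother, daughters) pair, e.g. ("S", ("NP", "VP")).

def estimate_rewrite_probs(rule_pairs):
    """Estimate P_exp(r_l | r_s) by relative frequency from aligned rule
    pairs observed when comparing original and compressed parse trees."""
    joint = Counter(rule_pairs)                       # counts of (r_l, r_s)
    marginal = Counter(r_s for _, r_s in rule_pairs)  # counts of r_s
    return {(r_l, r_s): n / marginal[r_s] for (r_l, r_s), n in joint.items()}

def channel_log_prob(aligned_pairs, extra_rules, p_exp, p_cfg):
    """log P(l | s): rewriting probabilities over the aligned pairs R,
    plus P_cfg for the rules R' appearing only in the original tree."""
    return (sum(math.log(p_exp[pair]) for pair in aligned_pairs)
            + sum(math.log(p_cfg[r]) for r in extra_rules))

# Toy data: the short rule S -> NP VP was expanded to S -> ADVP NP VP twice
# and kept unchanged once.
long_rule = ("S", ("ADVP", "NP", "VP"))
short_rule = ("S", ("NP", "VP"))
pairs = [(long_rule, short_rule)] * 2 + [(short_rule, short_rule)]
p_exp = estimate_rewrite_probs(pairs)
print(p_exp[(long_rule, short_rule)])  # 2/3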

Method 1. Maximum-Entropy Model

In Knight and Marcu's model, the rewriting probabilities depend only on the mother and daughter nonterminals. In reality they depend on various features of the parse tree, and with a maximum-entropy model we can easily introduce such features, for example the depth from the root node and which words are removed:

P(s | l) = ∏_{(r_s, r_l) ∈ R} (1/Z) exp( ∑_i λ_i f_i(r_s, r_l) )

Features:
- Mother node
- Daughter node sequence
- Daughter terminals that are removed
- Depth from the root
- Left-most and right-most daughters
- etc.
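A minimal log-linear sketch of how such features can be scored follows; the feature encoding, names, and weights are our own illustrative assumptions, and the normaliser Z is omitted since it requires enumerating the competing rewrites.

import math

def features(r_s, r_l, depth):
    """f_i(r_s, r_l): binary features of one rewrite, following the list
    above. Rules are (mother, daughters) pairs; removal detection here is
    a simplification."""
    mother, short_ds = r_s
    _, long_ds = r_l
    removed = tuple(d for d in long_ds if d not in short_ds)
    return {
        "mother=" + mother: 1.0,
        "daughters=" + "+".join(short_ds): 1.0,
        "removed=" + "+".join(removed): 1.0,
        "depth=%d" % depth: 1.0,
        "leftmost=" + long_ds[0]: 1.0,
        "rightmost=" + long_ds[-1]: 1.0,
    }

def rule_factor(weights, r_s, r_l, depth):
    """One unnormalised factor of P(s | l): exp(sum_i lambda_i f_i)."""
    return math.exp(sum(weights.get(name, 0.0) * value
                        for name, value in features(r_s, r_l, depth).items()))

w = {"removed=ADVP": 1.2, "mother=S": 0.3}  # illustrative weights
print(rule_factor(w, ("S", ("NP", "VP")), ("S", ("ADVP", "NP", "VP")), 0))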

Method 2. Bottom-Up Method

Some compression patterns cannot be learned with the previous method, because the two parse trees sometimes have different structures; the previous method can only drop words whose daughter nodes correspond. For example, when "The apple on the table is red" is compressed to "The apple is red", the compressed NP expands as NP → DT N while the original NP expands as NP → NP PP: {DT, N} is not a subsequence of {NP, PP}, the daughter nodes do not correspond, and the pattern cannot be learned.

In the bottom-up method, we only parse the original sentence and extract a tree from the original parse tree by selecting the nodes which dominate the compressed sentence. For example, when "Yesterday I went to Tokyo" is compressed to "I went to Tokyo", the extracted tree is rooted at 'S', the left-most 'ADVP' is not selected, and 'Yesterday' is removed.
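A minimal sketch of the extraction idea, under assumptions of our own (a toy tree type, retained words identified by string): recursively keep exactly the nodes that dominate at least one retained word.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    label: str                               # nonterminal or word
    children: Optional[List["Node"]] = None  # None marks a terminal

def extract(node: Node, keep: set) -> Optional[Node]:
    """Return the part of `node` dominating retained words, else None.
    (Matching words by string is a simplification; the real method
    selects the nodes that dominate the compressed sentence.)"""
    if node.children is None:                # terminal: a word
        return Node(node.label) if node.label in keep else None
    kept = [sub for sub in (extract(c, keep) for c in node.children) if sub]
    return Node(node.label, kept) if kept else None

# S -> ADVP NP VP over "Yesterday I went to Tokyo" (terminals simplified):
tree = Node("S", [
    Node("ADVP", [Node("Yesterday")]),
    Node("NP", [Node("I")]),
    Node("VP", [Node("went"), Node("to"), Node("Tokyo")]),
])
print(extract(tree, {"I", "went", "to", "Tokyo"}))
# The ADVP dominates only "Yesterday", so it is dropped from the result.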

Experimental Results

We used the same corpus as Knight and Marcu:

- Avg. length of original sentences: 23.8
- Avg. length of compressed sentences: 12.5
- Training set: 527 sentences
- Development set: 263 sentences
- Test set: 264 sentences

We evaluated the results using F-measure, BLEU score [2], and human judgment. Our methods exceed the previous method on all evaluation criteria; in particular, the maximum-entropy model combined with the bottom-up method obtained the highest scores.

Results of N-gram based evaluation:

Method             F-measure   Bigram F-measure   BLEU score
Noisy-channel      63.3        50.2               47.1
Maximum Entropy    75.3        64.1               62.1
ME + Bottom-up     80.9        72.0               69.5

Results of human evaluation (Grammar: whether the output is grammatically correct; Importance: whether the important words remain):

Method             Grammar   Importance
Human              4.94      4.31
Noisy-channel      3.81      3.38
Maximum Entropy    3.88      3.38
ME + Bottom-up     4.22      4.06
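As an illustration of the bigram F-measure reported above (a sketch of our own; the authors' exact evaluation setup may differ), computed between a candidate compression and a reference:

from collections import Counter

def bigram_f(candidate, reference):
    """Bigram F-measure between two tokenised sentences."""
    def bigrams(tokens):
        return Counter(zip(tokens, tokens[1:]))
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum((cand & ref).values())   # shared bigram counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(bigram_f("I went to Tokyo".split(),
               "I went to Tokyo by train".split()))  # 0.75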

References

[1] K. Knight and D. Marcu. 2000. Statistics-Based Summarization - Step One: Sentence Compression. In Proc. of AAAI/IAAI '00.
[2] K. Papineni, S. Roukos, T. Ward, and W. Zhu. 2002. BLEU: a Method for Automatic Evaluation of Machine Translation. In Proc. of ACL '02.