Incremental Joint POS Tagging and Dependency Parsing in Chinese

Jun Hatori (University of Tokyo)
Takuya Matsuzaki (University of Tokyo)
Yusuke Miyao (National Center of Informatics)
Jun’ichi Tsujii (Microsoft Research Asia)

In IJCNLP-2011 Chiang Mai, Thailand ’11/11/11

Why Joint? 

Jointly solve tagging and dependency parsing (assuming gold segmentation)
◦ Traditional pipeline approach to POS tagging and dependency parsing may suffer from error propagation.
◦ Chinese POS tagging sometimes requires long-range syntactic information.
   ▪ A noun or verb?
   ▪ 的: DEG (genitive marker) vs. DEC (complementizer)

Overview 

Joint POS tagging and dependency parsing
◦ First incremental approach
   ▪ Simple extension of the shift-reduce algorithm
   ▪ Advantageous in computational efficiency
◦ Achieved new state-of-the-art performance for Chinese tagging and parsing
   ▪ Still competitive in speed with baseline systems
   ▪ First positive tagging result for a joint approach
◦ Experiments are based on Mandarin, but the approach is generally applicable to other languages as well.

Challenges for a Joint Model 

Computational complexity
◦ Search space increases by a factor of $T^N$, where $T$ is the number of tags and $N$ is the sentence length.
◦ Solution:
   ▪ Incremental (shift-reduce) parsing with beam search
   ▪ Partial state packing (DP with graph-structured stack)

Lack of look-ahead POS information
◦ POS tags of look-ahead words are undetermined when choosing the next action
◦ Solution:
   ▪ Introduce a concept of “delayed features”
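The beam-search solution to the complexity problem can be illustrated with a minimal Python sketch. It assumes an arc-standard style transition system in which the shift action also commits to a POS tag for the shifted word; the action names, the `State` class, the toy tag set, and the toy scoring function are all hypothetical stand-ins, not the authors' implementation. Partial state packing (DP) and delayed features are deliberately left out.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class State:
    stack: Tuple[int, ...]             # word indices currently on the stack
    next_word: int                     # index of the first word still in the queue
    tags: Tuple[str, ...]              # tags chosen so far, one per shifted word
    heads: Tuple[Optional[int], ...]   # head index per word; None means unattached
    score: float = 0.0


def successors(state: State, n_words: int, tagset: Sequence[str]):
    """Enumerate every legal (action, next_state) pair from `state`."""
    out = []
    if state.next_word < n_words:
        # Joint step (assumed design): shifting a word also fixes its POS tag.
        for tag in tagset:
            out.append((("SHIFT", tag), replace(
                state,
                stack=state.stack + (state.next_word,),
                next_word=state.next_word + 1,
                tags=state.tags + (tag,))))
    if len(state.stack) >= 2:
        s1, s0 = state.stack[-2], state.stack[-1]
        # REDUCE-LEFT attaches s1 under s0; REDUCE-RIGHT attaches s0 under s1.
        for name, head, dep in (("REDUCE-LEFT", s0, s1), ("REDUCE-RIGHT", s1, s0)):
            heads = list(state.heads)
            heads[dep] = head
            out.append(((name, None), replace(
                state, stack=state.stack[:-2] + (head,), heads=tuple(heads))))
    return out


def beam_parse(words, tagset, score_action, beam_size=8):
    """Keep only the `beam_size` best partial analyses after each action."""
    n = len(words)
    beam = [State(stack=(), next_word=0, tags=(), heads=(None,) * n)]
    # A complete analysis of n words uses n shifts and n - 1 reduces.
    for _ in range(2 * n - 1):
        candidates = []
        for state in beam:
            for action, nxt in successors(state, n, tagset):
                gain = score_action(words, state, action)
                candidates.append(replace(nxt, score=state.score + gain))
        beam = sorted(candidates, key=lambda s: s.score, reverse=True)[:beam_size]
    return beam[0]


# Toy scorer (hypothetical): rewards a SHIFT whose tag matches a tiny lexicon.
TOY_LEXICON = {"我": "PN", "的": "DEG", "书": "NN"}


def toy_score(words, state, action):
    name, tag = action
    if name == "SHIFT" and TOY_LEXICON.get(words[state.next_word]) == tag:
        return 1.0
    return 0.0


words = ["我", "的", "书"]
tagset = ["PN", "NN", "VV", "DEG", "DEC"]
best = beam_parse(words, tagset, toy_score)
n_tag_sequences = len(tagset) ** len(words)  # T^N tag sequences for this sentence
print(best.tags, best.heads, n_tag_sequences)
```

With five tags and three words there are already 5^3 = 125 possible tag sequences on top of the possible trees, while the beam never holds more than `beam_size` hypotheses at once. A real system would replace `toy_score` with a trained feature-based model.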

Baseline Tagger 

Trigram POS tagger
◦ Viterbi search using beam size of 16
◦ Use features described in [Zhang & Clark, 2008]
◦ Standard pruning for Chinese
   ▪ Tag dictionary: for frequent words, only considers POS tags that appear in the training

[Feature example (slide text only partly recoverable): "把" ∘ q0.w = 忘记 ∘ q0.t = "ARG1" / "VV"]
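The tag-dictionary pruning mentioned for the baseline tagger can be sketched as follows. This is an illustration only: it assumes the truncated bullet refers to the training data, and the frequency threshold `min_count`, the function names, and the toy corpus are hypothetical rather than taken from the slides.

```python
from collections import Counter, defaultdict


def build_tag_dictionary(tagged_sentences, min_count=5):
    """Map each frequent word to the set of tags it carried in training."""
    word_counts = Counter()
    seen_tags = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            word_counts[word] += 1
            seen_tags[word].add(tag)
    # Only words seen at least `min_count` times are restricted (assumed cutoff).
    return {w: tags for w, tags in seen_tags.items() if word_counts[w] >= min_count}


def candidate_tags(word, tag_dict, all_tags):
    """Frequent words get their observed tags; all other words get the full tag set."""
    return tag_dict.get(word, all_tags)


# Toy corpus (hypothetical): 的 occurs three times, 书 only once.
training = [
    [("的", "DEG"), ("书", "NN")],
    [("的", "DEC")],
    [("的", "DEG")],
]
all_tags = {"NN", "VV", "DEG", "DEC"}
tag_dict = build_tag_dictionary(training, min_count=3)
frequent_candidates = candidate_tags("的", tag_dict, all_tags)
rare_candidates = candidate_tags("书", tag_dict, all_tags)
```

Restricting frequent words to their observed tags shrinks the number of tag hypotheses the search has to consider, while rare and unseen words still fall back to the full tag set.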

Experiment 

Penn Chinese Treebank 5 (CTB-5)
◦ Assume gold segmentation for input

Baseline models
◦ Pipeline POS tagger and dependency parser
   ▪ Baseline-Tagger: re-implementation of [Zhang & Clark 08]
   ▪ Parser-HS: dependency parser by [Huang & Sagae 10]
   ▪ Parser-ZN: dependency parser by [Zhang & Nivre 11]
◦ Third-order graph-based joint models by [Li+ 11]

Joint models
◦ Joint-HS: joint model using features in Parser-HS
◦ Joint-ZN: joint model using features in Parser-ZN

Feature ablation results 



Delayed features, dynamic programming, and syntactic features improved parsing accuracy.
DP is not effective for Joint-ZN because of its richer features.

[Figure: bar chart for Joint-HS and Joint-ZN under four settings (default, wo/delay, wo/dp, wo/syn); vertical axis from 79.5 to 82.]

Final result

              POS accuracy   Dependency UAS   Dependency Root accuracy   Speed (sentence/sec)
Pipeline-HS   93.82 †        77.13            72.59                      32.7
Pipeline-ZN   93.82 †        77.83            74.82                      4.8
Joint-HS      94.01          79.83            73.86                      9.5
Joint-ZN      93.94          81.33            77.93                      1.5

0.1–0.2% improvement on tagging accuracy
2.7–3.5% improvement on parsing accuracy
Joint parsing takes ~3x time
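The three summary figures can be checked directly against the reported numbers. The short sketch below recomputes the gains and the slowdown for each feature set; the dictionary layout and variable names are illustrative, and the values are the ones reported on the Final result slide.

```python
# Reported results: POS accuracy, UAS, and speed in sentences per second.
results = {
    "Pipeline-HS": {"pos": 93.82, "uas": 77.13, "speed": 32.7},
    "Pipeline-ZN": {"pos": 93.82, "uas": 77.83, "speed": 4.8},
    "Joint-HS": {"pos": 94.01, "uas": 79.83, "speed": 9.5},
    "Joint-ZN": {"pos": 93.94, "uas": 81.33, "speed": 1.5},
}

summary = {}
for parser in ("HS", "ZN"):
    pipe = results[f"Pipeline-{parser}"]
    joint = results[f"Joint-{parser}"]
    summary[parser] = {
        # Absolute differences in percentage points.
        "pos_gain": round(joint["pos"] - pipe["pos"], 2),
        "uas_gain": round(joint["uas"] - pipe["uas"], 2),
        # How many times slower the joint model is than its pipeline.
        "slowdown": round(pipe["speed"] / joint["speed"], 1),
    }

for parser, row in summary.items():
    print(parser, row)
```

The gains are absolute differences in percentage points, and the slowdown is the ratio of pipeline speed to joint speed.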

Comparison

Joint Parser    POS accuracy   Dependency UAS   Dependency Root accuracy   Speed (sentence/sec)
Li-2011-v1-O2   93.08          80.84            75.80                      1.7
Li-2011-v2-O3   92.80          80.79            75.84                      0.3
Joint-HS        94.01          79.83            73.86                      9.5
Joint-ZN        93.94          81.33            77.93                      1.5
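The slides do not define the two dependency metrics in the tables. As a rough guide, the sketch below uses the standard definitions: UAS as the fraction of words whose predicted head is correct, and root accuracy as the fraction of sentences whose root is identified correctly. The head encoding (`-1` for the root), the function names, and the toy data are assumptions, and conventions such as excluding punctuation are omitted.

```python
ROOT = -1  # assumed encoding: a word whose head is ROOT is the sentence root


def uas(gold, pred):
    """Fraction of words whose predicted head index equals the gold head index."""
    correct = sum(g == p for gs, ps in zip(gold, pred) for g, p in zip(gs, ps))
    total = sum(len(gs) for gs in gold)
    return correct / total


def root_accuracy(gold, pred):
    """Fraction of sentences whose predicted root word(s) match the gold root."""
    def roots(heads):
        return [i for i, h in enumerate(heads) if h == ROOT]
    hits = sum(roots(gs) == roots(ps) for gs, ps in zip(gold, pred))
    return hits / len(gold)


# Toy data (hypothetical): one head list per sentence, indexed by word position.
gold = [[1, ROOT, 1], [ROOT, 0]]
pred = [[1, ROOT, 0], [1, ROOT]]
toy_uas = uas(gold, pred)
toy_root = root_accuracy(gold, pred)
```

In the toy data two of the five heads are correct and one of the two roots is found, which shows that the two metrics can diverge for the same parser output.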

Analysis

Resolved many POS ambiguities that critically affect syntactic structure.
Most of the increased error patterns are not critical for the syntactic structure.

Related works 

Dual decomposition
◦ Rush et al. (2010) combine a constituency parser and a trigram POS tagger.

Graphical model
◦ Lee et al. (2011) solve morphological disambiguation and dependency parsing in morphologically-rich languages.

Graph-based model
◦ Li et al. (2011) built a third-order joint POS tagging and dependency parsing model, with finely-tuned pruning techniques.

Conclusion 



Proposed the first incremental framework for joint POS tagging and dependency parsing.
Outperformed the pipeline and baseline models, and achieved the best accuracies for Chinese.
◦ Tagging of syntactically influential POS tags is selectively improved.
◦ Still competitive in speed with baseline systems, and comparable to singleton parsers.
