Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data

Xuezhe Ma and Fei Xia
University of Washington

Introduction

• We propose a feature augmentation approach to domain adaptation for dependency parsing.
• The new features are based on subtrees extracted from the auto-parsed target domain data.
• The approach focuses on the fully supervised setting, in which a small amount of labeled data in the target domain is available.
• When tested on three source-target domain pairs (WSJ-to-Brown, Brown-to-WSJ, and WSJ-to-Genia), our approach outperforms several baselines and existing approaches.

Subtree Extraction & Feature Augmentation

Our approach is inspired by the subtree extraction method of Chen et al. (2009) and the feature augmentation method of Daume III (2007).

Subtree Extraction in (Chen et al., 2009):

• Extract all the subtrees in the auto-parsed data and store them in a list Lst.
• Count the frequency of these subtrees and divide them into three groups according to their frequency levels.
• Construct new features that indicate which group the subtree for the current dependency pair belongs to (sketched below).
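
A minimal sketch of this extraction step, assuming each auto-parsed sentence is a list of (word, POS tag, head index) triples. The bigram subtree form (head POS, dependent POS, direction), the 10%/30% frequency cut-offs, and the HF/MF/LF/ZERO group names are illustrative choices, not the exact configuration used by Chen et al. (2009).

    from collections import Counter

    def extract_bigram_subtrees(parsed_sents):
        """Collect head-dependent subtrees from auto-parsed sentences."""
        counts = Counter()
        for sent in parsed_sents:
            for i, (_, pos, head) in enumerate(sent):
                if head < 0:          # skip the root token
                    continue
                head_pos = sent[head][1]
                direction = "L" if i < head else "R"
                counts[(head_pos, pos, direction)] += 1
        return counts

    def build_groups(counts, high=0.1, mid=0.3):
        """Split the subtree list Lst into high/middle/low-frequency groups."""
        ranked = [subtree for subtree, _ in counts.most_common()]
        total = len(ranked)
        groups = {}
        for rank, subtree in enumerate(ranked):
            if rank < high * total:
                groups[subtree] = "HF"    # high frequency
            elif rank < mid * total:
                groups[subtree] = "MF"    # middle frequency
            else:
                groups[subtree] = "LF"    # low frequency
        return groups

    def subtree_feature(groups, head_pos, dep_pos, direction):
        """Indicator feature naming the group of the candidate dependency pair."""
        group = groups.get((head_pos, dep_pos, direction), "ZERO")  # unseen subtree
        return "ST=" + group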

Feature Augmentation in Daume III (2007):

Daume III (2007) did feature augmentation by defining the following mappings:

    Φ^s(x_org) = <x_org, 0>
    Φ^t(x_org) = <x_org, x_org>

where x_org is the original feature vector and 0 is the zero vector.
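
To make the mapping concrete, here is a toy illustration (not code from the paper) of the augmentation on a sparse feature vector represented as a dict: every instance keeps a shared copy of its features, and target-domain instances additionally receive a domain-specific copy.

    def daume_augment(features, domain):
        """Map x to <x, 0> for source instances and <x, x> for target instances."""
        augmented = {"common:" + name: value for name, value in features.items()}
        if domain == "target":
            augmented.update({"target:" + name: value for name, value in features.items()})
        return augmented

    # daume_augment({"head_pos=NN_dep_pos=JJ": 1.0}, "target")
    # -> {"common:head_pos=NN_dep_pos=JJ": 1.0, "target:head_pos=NN_dep_pos=JJ": 1.0}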

Our Approach for Parsing Adaptation

In this study, we combine the ideas of Chen et al. (2009) and Daume III (2007) and extend them to domain adaptation for parsing. We do so by redefining the mappings as follows:

    Φ^s(x_org) = <x_org, 0>
    Φ^t(x_org) = <x_org, x_new>

where x_org is the original feature vector and x_new is the vector of the subtree-based features extracted from auto-parsed data of the target domain. The subtree extraction method used in our approach is the same as in (Chen et al., 2009).

The steps of our approach are as follows:
1. Train a baseline parser with the small amount of labeled data in the target domain and use the parser to parse the large amount of unlabeled sentences in the target domain.
2. Extract subtrees from the auto-parsed data and add subtree-based features to the labeled training data in the target domain.
3. Retrain the parser with the union of the labeled training data in the two domains, where the instances from the target domain are augmented with the subtree-based features.
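
A schematic of these three steps, where train_parser, parse_sentence, and make_subtree_featurizer are hypothetical callables standing in for an MSTParser-style trainer, its decoder, and the subtree-feature extraction sketched earlier; only the data flow of the approach is spelled out here.

    def adapt_parser(src_labeled, tgt_labeled, tgt_unlabeled,
                     train_parser, parse_sentence, make_subtree_featurizer):
        """Schematic three-step adaptation pipeline (placeholders, not a toolkit API)."""
        # Step 1: baseline parser from the small labeled target data, used to
        # parse the large unlabeled target-domain text.
        baseline = train_parser(tgt_labeled)
        auto_parsed = [parse_sentence(baseline, sent) for sent in tgt_unlabeled]

        # Step 2: extract subtrees from the auto-parsed data and augment the
        # labeled target-domain instances with subtree-based features
        # (the x_new block of the <x_org, x_new> mapping).
        add_subtree_features = make_subtree_featurizer(auto_parsed)
        tgt_augmented = [add_subtree_features(sent) for sent in tgt_labeled]

        # Step 3: retrain on the union of both domains; source-domain instances
        # keep only their original features (the <x_org, 0> mapping).
        return train_parser(src_labeled + tgt_augmented)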

Experiments

            Source     Target
            training   training   unlabeled   test
WSJ-to-B    39,832     2,182      19,632      2,429
B-to-WSJ    21,814     2,097      37,735      2,416
WSJ-to-G    39,279     1,024      13,302      1,360

Table 1. The number of sentences for each data set used in our experiments.

Parsing models: first-order and second-order graph-based parsing models (McDonald et al., 2005; McDonald and Pereira, 2006).

Results

               WSJ-to-B                  B-to-WSJ                  WSJ-to-G
               1st Ord      2nd Ord      1st Ord      2nd Ord      1st Ord
               UAS   CM     UAS   CM     UAS   CM     UAS   CM     UAS   LAS
SrcOnly        88.8  43.8   89.8  47.3   86.3  26.5   88.0  30.4   83.8  82.0
TgtOnly        86.6  38.8   87.7  42.2   88.2  29.3   89.7  34.2   87.0  85.7
Src&Tgt        89.1  44.3   90.2  48.2   89.4  31.2   90.9  36.6   87.2  85.9
Self-Training  89.2  45.1   90.3  48.8   89.8  32.1   91.0  37.1   87.3  86.0
Co-Training    89.2  45.1   90.3  48.5   89.8  32.7   90.9  38.0   87.3  86.0
Feature-Aug    89.1  45.1   90.0  48.4   89.8  32.8   91.0  37.4   87.9  86.5
Chen (2009)    89.3  45.0   90.3  49.1   89.7  31.8   91.0  37.6   87.5  86.2
This paper     89.5  45.5   90.6  49.6   90.2  33.4   91.5  38.8   88.4  87.1
Plank (2011)   -     -      89.9  47.0   -     -      91.1  51.1   82.7  -
Per-corpus     -     -      86.8  42.1   -     -      93.6  47.9   90.5  89.7

Table 2. Parsing results on the three data sets. UAS is unlabeled attachment score, CM is the percentage of sentences with complete match, and LAS is labeled attachment score.

Analysis and Conclusion

                  Step 1: TgtOnly   Step 1: Src&Tgt
Step 3: TgtOnly   88.4/87.1         88.4/87.1
Step 3: Src&Tgt   87.6/86.3         87.5/86.2

Table 3. Performance (UAS/LAS) of the final parser for WSJ-to-Genia when different training data are used to create the final parser. The column label shows the data set used in Step 1, and the row label indicates the data set to which subtree-based features are added in Step 3 of our approach.

Conclusion:
• Subtree-based features collected from unlabeled data help.
• Adding those features to the target domain data only is better than adding them to the data of both domains.

ACL 2013, Sofia, Bulgaria
