Tree Kernel-Based Semantic Relation Extraction using Unified Dynamic Relation Tree

Longhua Qian, Guodong Zhou, Fang Kong, Qiaomin Zhu, Peide Qian*
Jiangsu Provincial Key Lab for Computer Information Processing Technology
School of Computer Science and Technology
Soochow University, Suzhou, China 215006
Email: {qianlonghua, gdzhou, kongfang, qmzhu, pdqian}@suda.edu.cn
* Corresponding author

Abstract

This paper proposes a Unified Dynamic Relation Tree (UDRT) span for tree kernel-based semantic relation extraction between entity names. The basic idea is to apply a variety of linguistics-driven rules to dynamically prune noisy information from a syntactic parse tree and include necessary contextual information. In addition, different kinds of entity-related semantic information are unified into the syntactic parse tree. Evaluation on the ACE RDC 2004 corpus shows that the Unified DRT span outperforms other widely used tree spans, and that our system achieves performance comparable to the state-of-the-art kernel-based ones. This indicates that our method not only models the structured syntactic information well but also effectively captures entity-related semantic information.
1. Introduction

Information extraction is an active research topic in NLP. It attempts to identify and extract relevant information from the large number of text documents available in digital archives and on the WWW. According to the NIST ACE program, information extraction subsumes a broad range of tasks, including Entity Detection and Tracking (EDT), Relation Detection and Characterization (RDC), and Event Detection and Characterization (EDC). This paper focuses on the RDC task, which detects and classifies semantic relationships (usually of pre-specified types) between pairs of entities detected by the EDT task. For example, the sentence "Microsoft Corp. is based in Redmond, WA" conveys the relation "GPE-AFF.Based" between "Microsoft Corp."(ORG) and "Redmond"(GPE). In addition to information
extraction, relation extraction is also very useful in many advanced NLP applications, such as question answering and text summarization.

In the literature, feature-based methods have dominated research in relation extraction. Feature-based methods [7][11] achieve promising performance and competitive efficiency by transforming a relation example into a set of syntactic and semantic features. However, detailed research [12] shows that it is difficult to extract new effective features that further improve extraction accuracy. Therefore, researchers have turned to kernel-based methods, which avoid the burden of feature engineering by computing the similarity of two discrete objects (e.g. parse trees) directly. From early work [1][5][9] to current research [10][13], kernel methods have shown more and more potential in relation extraction.

The key problem for kernel methods in relation extraction is how to represent and capture the structured syntactic information inherent in relation instances. While kernel methods using the dependency tree [5] and the shortest dependency path [1] suffer from low recall, convolution tree kernels [10][13] over syntactic parse trees achieve competitive or even better performance than feature-based methods. However, there still exist some problems with the widely used tree spans. Zhang et al. [10] investigate five tree spans for relation extraction, among which the Path-enclosed Tree (PT, the parse tree enclosed by the path connecting the two entities involved in a relationship) achieves the best performance. Zhou et al. [13] extend it to the Context-Sensitive Path-enclosed Tree (CSPT), which dynamically includes necessary predicate-linked path information. One problem with both PT and CSPT is that they may still contain unnecessary information. The other problem is that a considerable amount of context-sensitive information useful for relation extraction is missing from PT/CSPT, even though CSPT includes some contextual information related to the predicate-linked path.

This paper proposes a new approach to dynamically determine the tree span appropriate for relation extraction. A variety of linguistics-driven operation rules are applied to prune noisy information from the parse tree and to include necessary contextual information. We then unify the structured syntactic information with flat entity-related semantic information into Unified Dynamic Relation Trees, in which different kinds of entity-related semantic features, such as major type, subtype and mention type, are incorporated.

The rest of this paper is organized as follows. Section 2 describes the generation of the Dynamic Relation Tree, while the Unified Dynamic Relation Tree is presented in Section 3. Section 4 reports the experimental results on the ACE RDC corpus. Finally, we conclude our work in Section 5.
2. Dynamic Relation Tree

One of the key problems for tree kernel-based relation extraction is the structured syntactic representation of relation instances. The Path-enclosed Tree (PT) [10] achieves the best performance in Zhang et al.'s evaluation. Zhou et al. [13] propose the Context-Sensitive Path-enclosed Tree (CSPT), which dynamically determines the tree span by extending PT with necessary predicate-linked path information. However, there are still two problems with PT/CSPT:
(1) Both PT and CSPT may still contain noisy information. For example, in the sentence "families of seven other former hostages …", the condensed form "families of hostages" is sufficient to determine the "PER-SOC" relationship between the entities "families"(PER) and "hostages"(PER), yet PT/CSPT still keep the modifiers "seven other former".
(2) CSPT only captures some context-sensitive information related to the predicate-linked path. For example, in the sentence "… bought one of town's two meat-packing plants", there is actually no relationship between the entities "one"(FAC) and "town"(GPE) as defined in the ACE RDC corpus. Nevertheless, the information contained in PT/CSPT ("one of town") may easily lead to this pair being misclassified as a "DISC" relation, contrary to our expectation. Therefore, the material outside PT/CSPT (here, the possessive construction following "town") should be recovered so as to differentiate this instance from positive ones.

Based on the above observations that different relation types may involve different kinds of structured
syntactic structures and useful contextual information, we carefully review a large number of relation instances in the ACE RDC 2004 corpus and identify many specific linguistic constructions, which may contain either redundant information inside PT or critical information outside it. Therefore, in order to make the parse tree more concise and precise for relation extraction, starting from PT we carry out three kinds of operations (i.e., removal, compression and expansion) to reshape it, as follows:

(1) Remove operation: this removes unnecessary information from PT/CSPT using two rules:
a) Removing all the constituents (except the headword) of the 2nd entity (DEL_ENT2_PRE): the headword of an entity's noun phrase plays a key role in relation extraction. For example, as illustrated in Figure 1(a), for the noun phrase "the families of seven other former hostages", "families"(PER) and "hostages"(PER) are the headwords of the two entity mentions respectively. Together with the preposition "of", they uniquely determine the relation type "PER-SOC.Family" between them. Therefore, all other constituents (except the headword) of the 2nd entity can be safely removed from the parse tree.
b) Removing adverb/preposition phrases along the path (DEL_PATH_ADVP/PP): along the path connecting the two entities in the parse tree, adverb phrases normally describe the manner in which the two entities relate to each other, while preposition phrases indicate circumstances such as location and time. These phrases therefore play little role in relation extraction and can be removed safely.

(2) Compress operation: this compresses coordination conjuncts into a single conjunct.
a) Compressing noun phrase coordination conjunctions (CMP_NP_CC_NP): noun phrase coordination occurs frequently in the ACE RDC 2004 corpus; e.g., in the phrase "governors from connecticut, south dakota, and montana", an "EMP-ORG.Employ-Executive" relation exists between the two entities "governors"(PER) and "montana"(GPE). Figure 1(b) illustrates the trees before and after compression.
b) Compressing verb phrase coordination conjunctions (CMP_VP_CC_VP): similar to noun phrase coordination, verb phrase coordination conjuncts can also be compressed into a single verb phrase.
c) Compressing single in-and-out nodes (CMP_SINGLE_INOUT): although Zhang et al. [10] show that compressing noun phrase nodes with single in-and-out arcs does not help relation extraction, we re-evaluate its impact in our experimental setting and further compress "X-->Y-->Z" into "X-->Z".
(3) Expansion operation: PT uniformly drops all the contextual information outside the path connecting the two entities, some of which may be useful for relation extraction. We therefore recover certain context-sensitive information to boost performance.
a) Expanding the possessive structure after the 2nd entity (EXP_ENT2_POS): the possessive structure after the 2nd entity is usually important for relation extraction. For the noun phrase "one of town's two meat-packing plants" depicted in Figure 1(c), the possessive marker (i.e. "'s") should be recovered when it follows right after the 2nd entity.
b) Expanding the entity coreferential mention before the 2nd entity (EXP_ENT2_COREF): in the appositive phrase "US destroyer Cole", an "ART.User-or-Owner" relation exists between the entities "US"(GPE) and "destroyer"(VEH). Although "destroyer" is coreferential with "Cole"(VEH), no relation between "US" and "Cole" is defined in the ACE corpus. Since the previous rule DEL_ENT2_PRE removes "destroyer" from the appositive phrase, the pair ("US", "Cole") could easily be misclassified as an "ART.User-or-Owner" relationship; this rule recovers the coreferential mention to differentiate such negative instances from positive ones.

After this series of operations, a PT is eventually transformed into a Dynamic Relation Tree (DRT), which, as intended, contains only the information critical for relation extraction.

Figure 1. Examples of typical operation rules (the entity order and type information are attached to the entity type node, e.g., "E1-PER"): (a) removal of constituents before the 2nd entity ("families of seven other former hostages" -> "families of hostages"); (b) compression of NP coordination ("governors from connecticut, south dakota, and montana" -> "governors from montana"); (c) expansion of the possessive tag right after an NP ("one of town" -> "one of town's"). [Parse trees omitted.]
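To make the mechanics of these operations concrete, the following is a minimal sketch, in Python over nltk parse trees, of how a removal rule such as DEL_ENT2_PRE might be implemented. This is our own illustration, not the authors' code: the head-finding is deliberately simplified to "rightmost child", and the entity label is passed in by hand.

```python
from nltk import Tree

def del_ent2_pre(pt, ent2_label="E2-PER"):
    """DEL_ENT2_PRE (sketch): inside the 2nd entity's phrase, keep only
    the head pre-terminal, approximated here as the rightmost child."""
    tree = pt.copy(deep=True)
    for sub in tree.subtrees(lambda t: t.label() == ent2_label):
        head = sub[-1]          # crude head approximation for NPs
        sub[:] = [head]         # drop all pre-modifiers
    return tree

# PT for "families of seven other former hostages" (simplified).
pt = Tree.fromstring(
    "(NP (NP (E1-PER (NNS families))) (PP (IN of)"
    " (NP (E2-PER (CD seven) (JJ other) (JJ former) (NNS hostages)))))")
print(del_ent2_pre(pt))   # condensed to "families of hostages"
```

The other removal, compression and expansion rules can be written in the same style: each is a local rewrite on the tree triggered by a structural pattern.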
3. Unified Dynamic Relation Tree
Entity-related features impose strong constraints on relation types according to the ACE definition; e.g., the relation "PER-SOC" only holds between entities of type "PER". Instead of constructing a composite kernel [10][13] to interpolate syntactic and semantic information, we incorporate various entity-related semantic features into the Dynamic Relation Tree as additional nodes. Figure 2 shows the DRT and three tree setups with only the entity major type attached, at different locations. The example is excerpted from the ACE RDC 2004 corpus, where a relation "EMP-ORG" exists between "president"(PER) and "mexico"(GPE).
Figure 2. Unified dynamic relation tree setups for a relation instance. [Parse trees omitted.]
(1) Dynamic Relation Tree (DRT, Fig. 2(T1)): no entity-related information except the entity order (i.e., "E1" and "E2").
(2) UDRT-BottomNode (Fig. 2(T2)): the DRT with entity-related information attached below the two entity nodes.
(3) UDRT-EntityNode (Fig. 2(T3)): the DRT with entity-related information attached at the entity nodes.
(4) UDRT-TopNode (Fig. 2(T4)): the DRT with entity-related information attached at the top node of the tree.
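As an illustration of how such setups can be generated, here is a minimal sketch (again our own Python over nltk Trees, with simplified node naming, not the paper's implementation) that splices entity-type feature nodes into a DRT to form the UDRT-TopNode setup; the bottom- and entity-node variants differ only in where the feature nodes are inserted.

```python
from nltk import Tree

def attach_top(drt, e1_type, e2_type):
    """UDRT-TopNode (sketch): add feature nodes carrying the entity
    major types as extra children of the root node."""
    root = drt.copy(deep=True)
    root.append(Tree("E1", [e1_type]))  # Tree is a list subclass
    root.append(Tree("E2", [e2_type]))
    return root

# The DRT for "president of mexico", entity order marked on entity nodes.
drt = Tree.fromstring(
    "(NP (NP (E1 (NN president))) (PP (IN of) (NP (E2 (NNP mexico)))))")
print(attach_top(drt, "PER", "GPE"))
```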
4. Experimentation

4.1. Experimental Setting

For evaluation, we use the ACE RDC 2004 corpus as the benchmark data. The corpus contains 451 documents and 5702 relation instances, and defines 7 entity types, 7 major relation types and 23 subtypes. For comparison with previous work, evaluation is done on the 347 nwire/bnews documents and their 4307 relation instances using 5-fold cross-validation. The corpus is parsed with Charniak's parser [2], and relation instances are generated by iterating over all pairs of entity mentions occurring in the same sentence, with "true" mentions and coreference information given (i.e. as annotated by LDC annotators).

We employ the convolution tree kernel proposed in [3][4], which counts the number of common sub-trees as the structured similarity between two parse trees T1 and T2. This kernel has previously been applied to syntactic parsing [4], semantic role labeling [8] and relation extraction [10][13] with promising results. In our experiments, SVMLight [6] with the tree kernel function [8] is chosen as the classifier. For efficiency, we apply the one-vs-others strategy, which builds K classifiers, each separating one class from all the others. Following Zhang et al. [10] and Zhou et al. [13], the training parameter C (SVM) and the decay factor λ (tree kernel) are set to 2.4 and 0.4 respectively.
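For reference, the convolution tree kernel of Collins and Duffy [3][4] admits a compact recursive formulation; the sketch below is an illustrative re-implementation under our own simplifying assumptions (plain nltk Trees, no caching), not the SVMLight tree-kernel code used in the experiments.

```python
from nltk import Tree

def production(node):
    # A node's production: its label plus its children's labels/tokens.
    return (node.label(), tuple(c.label() if isinstance(c, Tree) else c
                                for c in node))

def delta(n1, n2, lam):
    """Decayed count of common sub-trees rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    if not any(isinstance(c, Tree) for c in n1):
        return lam              # pre-terminal: only the node itself matches
    prod = lam
    for c1, c2 in zip(n1, n2):
        if isinstance(c1, Tree):
            prod *= 1.0 + delta(c1, c2, lam)
    return prod

def tree_kernel(t1, t2, lam=0.4):
    """K(T1, T2) = sum of delta(n1, n2) over all node pairs."""
    return sum(delta(n1, n2, lam)
               for n1 in t1.subtrees() for n2 in t2.subtrees())

t1 = Tree.fromstring("(NP (NN president) (PP (IN of) (NNP mexico)))")
t2 = Tree.fromstring("(NP (NN governor) (PP (IN of) (NNP montana)))")
print(tree_kernel(t1, t2))
```

With λ = 0.4, as in the experiments, deeper shared structure contributes geometrically less, which keeps the kernel from being dominated by tree size.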
4.2. Experimental Results

Table 1 evaluates the contribution of the different operation rules to extraction performance on the 7 relation types of the ACE RDC 2004 corpus, using the tree span with entity-type information attached as depicted in Figure 1. PT is used as the baseline tree setup; the operation rules are then applied one by one to dynamically reshape the tree, in two different modes:
--[M1] Respective: each operation rule is applied individually to PT.
--[M2] Accumulative: each operation rule is applied incrementally to the previously derived tree, which begins as PT and eventually gives rise to a Dynamic Relation Tree (DRT). When a rule yields no improvement, or even hurts performance, it is excluded from subsequent rounds and its accumulative-mode result is not shown. Conversely, a plus sign preceding a rule indicates that the rule is useful and is retained in the next round. A minimal sketch of this selection procedure follows.
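The accumulative mode amounts to greedy forward selection over the rule set. The following sketch shows the procedure; the evaluate callable and the toy scorer are hypothetical stand-ins for training and scoring the actual system.

```python
def accumulative_selection(rules, base_tree_builder, evaluate):
    """Greedy forward selection: keep a rule only if it improves F-measure.

    `rules` is an ordered list of tree-reshaping operations; `evaluate`
    (assumed here) trains and scores a system given the kept rules.
    """
    kept = []
    best_f = evaluate(base_tree_builder, kept)
    for rule in rules:
        f = evaluate(base_tree_builder, kept + [rule])
        if f > best_f:          # rule helps: retain it ("+" in Table 1)
            kept.append(rule)
            best_f = f
    return kept, best_f

# Toy demonstration with a made-up evaluator that rewards two rules.
def toy_evaluate(base, kept):
    helpful = {"DEL_ENT2_PRE", "EXP_ENT2_POS"}
    return 67.1 + 1.4 * len(helpful.intersection(kept))

rules = ["DEL_ENT2_PRE", "DEL_PATH_PP", "EXP_ENT2_POS"]
print(accumulative_selection(rules, None, toy_evaluate))
```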
Table 1. Contribution of the various operation rules in accumulative mode, with respective-mode results in parentheses.

Operation rules   | P           | R           | F
PT (baseline)     | 76.3        | 59.8        | 67.1
+DEL_ENT2_PRE     | 76.3        | 62.1        | 68.5
DEL_PATH_PP       | (76.4)      | (59.6)      | (67.0)
DEL_PATH_ADVP     | (76.4)      | (59.9)      | (67.1)
+CMP_SINGLE_INOUT | 76.4 (76.4) | 63.1 (60.6) | 69.1 (67.6)
+CMP_NP_CC_NP     | 76.1 (76.0) | 63.3 (60.5) | 69.1 (67.4)
CMP_VP_CC_VP      | (76.4)      | (59.9)      | (67.1)
+EXP_ENT2_POS     | 76.6 (76.9) | 63.8 (60.6) | 69.6 (67.7)
+EXP_ENT2_COREF   | 77.1 (76.4) | 64.3 (59.9) | 70.1 (67.1)
This table shows that the eventual Dynamic Relation Tree (DRT) achieves the best performance of 77.1/64.3/70.1 in P/R/F after all the operation rules are applied, an increase of 3.0 units in F-measure over the baseline PT. This indicates that reshaping the tree with linguistics-driven operation rules significantly improves extraction accuracy, largely due to the increase in recall. It also shows that:
- Removing the constituents before the 2nd entity is very effective and improves the F-measure by 1.4 units due to the increase in recall. This is because most 2nd entities are noun phrases and usually contain many pre-modifiers.
- Compressing single in-and-out nodes slightly improves the F-measure by 0.5/0.6 units in modes M1 and M2 respectively. This means that carefully compressing complex structures helps to a certain extent in relation extraction.
- Expanding the possessive structure after the 2nd entity is useful and improves the F-measure by 0.5/0.6 units in modes M1 and M2 respectively, due to increases in both precision and recall. As the coreferential information is originally located before the 2nd entity in PT, it is not surprising that expanding this information improves the F-measure by 0.5 units in mode M2, while it is useless in mode M1.
- The remaining three operations do not help or even slightly decrease the extraction performance. This may be because these operations are mainly related to the "Verbal" structure (i.e., relations headed by a verb and not contained within a single noun phrase). Since relation instances with the "Verbal" structure are essentially difficult to detect and classify, it is not surprising that these rules do not contribute to relation extraction.

Table 2. Comparison of different unified dynamic relation tree setups

Tree setups     | P    | R    | F
DRT             | 68.7 | 53.5 | 60.1
UDRT-BottomNode | 76.2 | 64.4 | 69.8
UDRT-EntityNode | 77.1 | 64.3 | 70.1
UDRT-TopNode    | 76.4 | 65.2 | 70.4

Table 2 compares the performance of the different unified dynamic relation tree setups on the 7 relation types of the ACE RDC 2004 corpus, where only the entity type information is attached, at different locations (except for the plain DRT). It shows that:
- Compared with the DRT, the Unified Dynamic Relation Trees (UDRTs) with only entity type information significantly improve the F-measure by about 10 units on average, due to increases in both precision and recall. This suggests that UDRTs effectively capture both the structured syntactic parse information and the entity-related semantic features.
- Among the three UDRTs, UDRT-TopNode achieves slightly better performance (by 0.6/0.3 units in F-measure respectively) than the other two. This may be due to the decay factor λ (set to 0.4 here), which makes the kernel less dependent on tree size but also attenuates the effect of entity-related semantic information attached at lower nodes. Therefore, the remaining experiments apply the UDRT-TopNode setup by default.

Table 3 further evaluates the contribution of various kinds of entity-related semantic features on the 7 major relation types of the ACE RDC 2004 corpus using the UDRT-TopNode setup, adding them one by one in decreasing order of their expected importance. Again, a plus sign preceding a feature means that the feature is useful and is included in the next round.

Table 3. Contribution of various kinds of entity-related semantic features using the UDRT-TopNode setup

# | Entity Info     | P    | R    | F
1 | entity order    | 68.7 | 53.5 | 60.1
2 | +entity type    | 76.4 | 65.2 | 70.4
3 | +entity subtype | 78.2 | 66.3 | 71.8
4 | +mention level  | 80.0 | 68.1 | 73.6
5 | entity class    | 80.2 | 67.8 | 73.5
6 | GPE role        | 79.8 | 67.7 | 73.3
7 | head word       | 80.0 | 67.5 | 73.2
8 | LDC type        | 80.0 | 67.7 | 73.3
9 | +predicate base | 80.2 | 69.2 | 74.3

The table reports that our system achieves its best performance of 80.2/69.2/74.3 in P/R/F. It also shows that:
- Entity subtype and mention level information mildly improve the F-measure by 1.4/1.8 units respectively. This indicates that well-defined entity type/subtype and mention type information can contribute a great deal on the ACE 2004 corpus.
- It is a bit surprising that the other four kinds of information, namely entity class, GPE role, headword and LDC mention type, decrease the F-measure by 0.4/0.3/1.0/1.0 units respectively.
- Finally, the predicate verb (in base form) nearest to the 2nd entity slightly improves the F-measure by 0.6 units. This suggests that the predicate verb may help relation extraction when its base form is moved from the bottom to the top of the parse tree.
Table 4. Improvements of different tree setups over PT on the ACE RDC 2004 corpus

Tree setups          | P   | R   | F
CSPT over PT*        | 1.5 | 1.1 | 1.3
DRT over PT          | 0.1 | 5.4 | 3.3
UDRT-TopNode over PT | 3.9 | 9.4 | 7.2
(* These values are obtained by subtracting the P/R/F of the Path-enclosed Tree (79.6/65.6/71.9) from those of the dynamic Context-Sensitive Path-enclosed Tree (81.1/66.7/73.2), according to Table 2 of [13].)

Table 4 summarizes the improvements of different tree setups over PT. It shows that, in a similar setting, DRT outperforms PT by 3.3 units in F-measure, while CSPT outperforms PT by 1.3 units. This suggests that the Dynamic Relation Tree performs best among the DRT/CSPT/PT setups. It also shows that the Unified Dynamic Relation Tree with entity-related semantic features attached at the top node performs significantly better than the other two setups (i.e., CSPT and DRT), by 5.9/3.9 units in F-measure respectively. This means that entity-related semantic information is very useful and contributes much when incorporated into the parse tree. Finally, Table 5 compares our system with other state-of-the-art kernel-based systems on the 7 relation types of the ACE RDC 2004 corpus. It shows that our
UDRT-TopNode performs best among the tree setups using a single kernel, and even better than two of the previous composite kernels [10][11]. Moreover, it is reasonable to expect that if our UDRT kernel were made context-sensitive [13], or further combined with a state-of-the-art linear feature-based kernel into a composite kernel, its performance would be boosted further.

Table 5. Comparison of different systems on the ACE RDC 2004 corpus

Systems                                           | P    | R    | F
Zhou et al. [13]: composite kernel                | 82.2 | 70.2 | 75.8
Zhang et al. [10]: composite kernel               | 76.1 | 68.4 | 72.1
Zhao and Grishman [11]: composite kernel          | 69.2 | 70.5 | 70.4
Ours: CTK with UDRT-TopNode                       | 80.2 | 69.2 | 74.3
Zhou et al. [13]: context-sensitive CTK with CSPT | 81.1 | 66.7 | 73.2
Zhang et al. [10]: CTK with PT                    | 74.1 | 62.4 | 67.7
5. Conclusion

Structured syntactic parse information holds great potential for relation extraction. This paper proposes the Unified Dynamic Relation Tree (UDRT), built by first applying a set of linguistics-driven rules to extract a Dynamic Relation Tree (DRT) from a syntactic parse tree and then unifying various kinds of entity-related semantic information into it. This largely avoids having to interpolate syntactic and semantic information via a composite kernel. Evaluation on the ACE RDC 2004 corpus shows that integrating semantic information into the syntactic parse tree can significantly improve performance. In particular, among the various entity-related semantic features, entity type, subtype and mention type, together with the base form of the predicate verb, contribute most to relation extraction when attached at the top node of the parse tree. In future work, we will focus on "soft" matching in computing the similarity between two parse trees, where semantic similarity between content words (such as "hire" and "employ") is considered to achieve better generalization.
6. Acknowledgements

This research is supported by Project 60673041 under the National Natural Science Foundation of China, Project 2006AA01Z147 under the "863" National High-Tech Research and Development Program of China, and the National Research Foundation for the Doctoral Program of Higher Education of China under Grant No. 20060285008.
7. References

[1] R.C. Bunescu and R.J. Mooney. A Shortest Path Dependency Kernel for Relation Extraction. EMNLP-2005.
[2] E. Charniak. Immediate-Head Parsing for Language Models. ACL-2001.
[3] M. Collins and N. Duffy. Convolution Kernels for Natural Language. NIPS-2001.
[4] M. Collins and N. Duffy. New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron. ACL-2002.
[5] A. Culotta and J. Sorensen. Dependency Tree Kernels for Relation Extraction. ACL-2004.
[6] T. Joachims. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML-1998.
[7] N. Kambhatla. Combining Lexical, Syntactic and Semantic Features with Maximum Entropy Models for Extracting Relations. ACL-2004 (poster), pp. 178-181.
[8] A. Moschitti. A Study on Convolution Kernels for Shallow Semantic Parsing. ACL-2004.
[9] D. Zelenko, C. Aone, and A. Richardella. Kernel Methods for Relation Extraction. Journal of Machine Learning Research, 2003(2): 1083-1106.
[10] M. Zhang, J. Zhang, J. Su, and G.D. Zhou. A Composite Kernel to Extract Relations between Entities with both Flat and Structured Features. COLING-ACL-2006.
[11] S.B. Zhao and R. Grishman. Extracting Relations with Integrated Information Using Kernel Methods. ACL-2005.
[12] G.D. Zhou, J. Su, J. Zhang, and M. Zhang. Exploring Various Knowledge in Relation Extraction. ACL-2005.
[13] G.D. Zhou, M. Zhang, D.H. Ji, and Q.M. Zhu. Tree Kernel-based Relation Extraction with Context-Sensitive Structured Parse Tree Information. EMNLP-2007, pp. 728-736.