Exploiting the Role of Named Entities in Query-Oriented Document Summarization

Wenjie Li¹, Furu Wei¹,², Ouyang You¹, Qin Lu¹, and Yanxiang He²

¹ Department of Computing, The Hong Kong Polytechnic University, Hong Kong
{cswjli,csyouyang,csluqin}@comp.polyu.edu.hk
² Department of Computer Science and Technology, Wuhan University, China
{frwei,yxhe}@whu.edu.cn
Abstract. In this paper, we exploit the role of named entities in measuring document/query sentence relevance in query-oriented extractive summarization. Named entity driven associations are defined as informative, semantic-sensitive text bi-grams consisting of at least one named entity or the semantic class of a named entity. They are extracted automatically according to seven pre-defined templates. Question types are also taken into consideration, when available, in dealing with query questions. To alleviate problems with low coverage, the named entity based association and uni-gram models are integrated to complement each other in similarity calculation. Automatic ROUGE evaluations indicate that the proposed idea can produce a very good system that is among the best-performing systems at DUC 2005.

Keywords: Query-Oriented Summarization, Named Entity based Association.
1 Introduction

In recent years, the focus has noticeably shifted from generic summarization to query-oriented summarization, which aims to produce a summary from a set of relevant documents with respect to a given query, i.e. a short description of the user's information need containing one or more narrative and/or question sentences. The machine-generated summaries should concisely describe the information contained in the documents and should also help the user understand the documents according to his/her interests. The advantages of query-oriented summarization in information retrieval have been widely acknowledged: brief summaries allow people to judge the relevance of the returned results without having to look through the whole documents. Currently, most query-oriented summarization approaches extract from the documents the salient sentences that are supposed to be relevant to the given query. The fundamental issue with these approaches is how to measure the relevance of document sentences to the query sentences. In earlier studies, sentences are represented as bags of words. There are at least two drawbacks to this representation. First, a single word (i.e. a word uni-gram) is not informative enough to represent
underlying information in the sentences. For example, the meaning of the residence of the US president would be lost when "White House" was represented by "White" and "House" separately. Named entities should therefore, like ordinary words, be treated as meaningful text units when measuring relevance. Second, ordering information, in particular the underlying semantics and the sentence structure, cannot be captured by uni-gram models. N-gram models, such as bi-gram models, provide a means to take shallow structural information into account by combining two text units. Meanwhile, however, any N-gram model will more or less suffer from the bottleneck of low coverage. That is why uni-gram and bi-gram models are normally combined in use, or constraints on bi-gram models are relaxed. In this study, we highlight the role of named entities (NEs) in a variety of NE-driven models. Named entities are regarded as text uni-grams, and NE centered associations are defined as informative and semantic-sensitive text bi-grams involving at least one named entity in representing sentences. Associations combine named entities, their semantic classes, as well as other representative words (adjacent to the named entities in certain models). Question types, which indicate what kind of information a question is looking for, are also incorporated into associations, when applicable, to deal with the questions in a query. Because of this, NE-driven models can help effectively locate the sentences that contain the information most relevant to the questions, and consequently better summaries can be expected. Automatic ROUGE evaluations show that the summaries produced by the combinatorial models of NE/word uni-grams and NE-driven bi-grams are comparable to the summaries produced by the best systems competing at DUC 2005.
2 Related Work

Query-oriented summarization has been boosted by the DUC evaluations since 2005. Many previous approaches rank the sentences according to their relevance to the query and then select the most relevant ones into the summary. Regardless of the approach taken, query-oriented summarization involves three basic aspects: text content representation, query formulation, and relevance judgment. Among them, how to estimate the relevance between the query and the sentences is the most fundamental issue, and it has been extensively studied in the past. The simplest yet effective way is to calculate the cosine similarity of the two sentences represented as word vectors [7, 13, 14]. Some related work also utilizes WordNet as an external resource to alleviate the word mismatch problem by calculating the semantic similarity between words. An extension to vector space models is dimension reduction performed with latent semantic analysis [5]. In addition to various kinds of word occurrence, frequency and semantic matching techniques, similarity can also be measured by matching other text contents, such as named entities [8, 14], basic elements [6], and grammatical relations [3]. Normally, the relevance is judged based on a set of features, which are linearly combined to decide how likely a sentence is to be included in the summary. An alternative is to construct a single but complicated feature, such as a dependency tree [12] or a document graph [4, 11]. This is, however, limited by the complexity of feature construction and relevance judgment.
Question answering (QA) is closely related to query-oriented summarization in terms of the need for question interpretation. Although question type identification [2, 8], question reformulation [3, 12] and question expansion [1] have been applied in the context of query-oriented summarization, special handling of query questions has not been well addressed in much of the related work.
3 Measuring Relevance with Named Entity Driven Association Models

3.1 NE Driven Bi-gram Association (NeBiA) Model

In the NeBiA model, content associations are defined as the bi-grams involving at least one named entity or the semantic class of a named entity. They are combinations of the named entities and the content representative words (i.e. non-stop-words) immediately adjacent to the named entities. All the associations fall into four categories and appear in one of the following forms:

Table 1. Templates for the Extraction of NE Driven Bi-grams
Category    Form
NE-NE       (NE1, NE2)
NE-WORD     (NE, word), (word, NE)
NE-TAG      (NE1, NE2_tag), (NE1_tag, NE2)
TAG-WORD    (NE_tag, word), (word, NE_tag)
Table 1 provides seven templates to guide the automatic extraction of NE centered bi-grams from both document sentences and query sentences, so that the similarity can be calculated according to the bi-grams they match and the matching extent. Notice that NE-NE represents two successive named entities in a sentence; they are not necessarily adjoining and might be separated by a couple of words in between. In fact, the NeBiA bi-grams defined in Table 1 are a selected subset of the text bi-grams in which the role of named entities is highlighted. It is common for the same entity to be expressed in different ways when it is mentioned in text. For example, "US", "U.S.", "the US" and "the United States" all refer to the United States. Consequently, named entities often fail to find their matches simply because of this. Coreference resolution could provide a solution to this problem, but it is itself an open problem in natural language processing. Our solution is to relax the matching restriction by allowing both named entities and their semantic classes to be included in the bi-gram associations. The semantic classes considered in this paper include <PERSON>, <ORGANIZATION>, <LOCATION>, <DATE> and <NUMBER>, which are called NE tags. Another advantage of using the NE tags is the ability to integrate QA techniques into query-oriented summarization. This will be detailed in Section 3.3.
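To make the templates concrete, the following is a minimal Python sketch of the extraction step. It assumes sentences arrive as token lists with named entity spans and stop-words already marked by an external tagger; the `Token` type and the function name `extract_associations` are ours for illustration, not the authors' implementation.

```python
from typing import List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    text: str                      # surface form, e.g. "World Bank"
    ne_tag: Optional[str] = None   # e.g. "ORGANIZATION"; None for an ordinary word
    is_stopword: bool = False


def extract_associations(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Extract NE-driven bi-gram associations following the Table 1 templates.

    Each named entity is paired with the adjacent content words (NE-WORD,
    TAG-WORD) and with the next named entity in the sentence (NE-NE, NE-TAG),
    even when ordinary words lie in between.  Hypothetical sketch only.
    """
    content = [t for t in sentence if not t.is_stopword]
    assoc: List[Tuple[str, str]] = []
    for i, tok in enumerate(content):
        if tok.ne_tag is None:
            continue
        # NE-WORD / TAG-WORD with the adjacent content words, order preserved
        if i > 0 and content[i - 1].ne_tag is None:
            assoc.append((content[i - 1].text, tok.text))      # (word, NE)
            assoc.append((content[i - 1].text, tok.ne_tag))    # (word, NE_tag)
        if i + 1 < len(content) and content[i + 1].ne_tag is None:
            assoc.append((tok.text, content[i + 1].text))      # (NE, word)
            assoc.append((tok.ne_tag, content[i + 1].text))    # (NE_tag, word)
        # NE-NE / NE-TAG with the next named entity in the sentence
        for nxt in content[i + 1:]:
            if nxt.ne_tag is not None:
                assoc.append((tok.text, nxt.text))             # (NE1, NE2)
                assoc.append((tok.text, nxt.ne_tag))           # (NE1, NE2_tag)
                assoc.append((tok.ne_tag, nxt.text))           # (NE1_tag, NE2)
                break
    return assoc
```

Pairing each named entity only with its immediately adjacent content words mirrors the rigid NeBiA setting; the soft NeBiA-II variant described below would simply widen this neighbourhood to a window of content words.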
The NeBiA model can be extended to the NeBiA-II model by including all the words within a given window instead of only those immediately adjacent, i.e. by relaxing the rigid NE(TAG)-WORD bi-gram combinations into soft ones.

3.2 NE Driven Event Bi-gram Association (NEvBiA) Model
Named entities play an important role in characterizing events, which can be defined as "[Who] did [What] to [Whom] [When] and [Where]". The design of the NEvBiA model is based on the assumption that if the words in the NeBiA model were restricted to those related to events, the bi-grams might be able to reflect the underlying intra-event associations. In this paper, we choose verbs (such as "elect") and action nouns (such as "supervision") as event words that can characterize, or partially characterize, the actions or incident occurrences in the world. They roughly relate to the "did [What]" mentioned above. Meanwhile, the named entities tagged as <PERSON>, <ORGANIZATION>, <LOCATION> and <DATE> convey the information of [Who], [Whom], [Where] and [When], while <NUMBER> complements other event descriptions, such as the extent. Clearly, the NEvBiA bi-grams are a selected subset of the NeBiA bi-grams in which the words are limited to the event words. Similarly, the NEvBiA model can be extended to the NEvBiA-II model, corresponding to the NeBiA and NeBiA-II models.

3.3 Handling Query Questions
We strongly support the idea of incorporating QA techniques into query-oriented summarization. Thus, the models introduced in Sections 3.1 and 3.2 are also designed to facilitate the formulation of both narrative and question sentences in the query. For a query question, its question type is identified and handled in the same way as the tags of the named entities present in the sentence. The question type indicates what kind of information the question is looking for. It can help locate the sentences containing the information related to a particular question and select the appropriate sentences for the summary. For example, if a sentence contains a named entity tagged as <PERSON> or <ORGANIZATION>, it should be more likely to provide the answer to the question "Who has criticized the World Bank?". Figure 1 illustrates the four NE driven bi-grams, in three categories, extracted from this question. Notice that the ordering information is preserved in them, i.e. (NE, word) ≠ (word, NE). This avoids mistakenly including sentences containing the phrase "World Bank criticized ..." in the summary responding to the previous question; such sentences are obviously not expected. Question types are determined by a set of heuristic rules. For questions beginning with interrogatives like "who", "where", and "when", a straightforward mapping between these interrogatives and the classes of the named entity being questioned is established: "who" ⇔ <PERSON>, "where" ⇔ <LOCATION>, and "when" ⇔ <DATE>. If the question begins with "which", "what" or the word "name", the classes are deduced from the semantics of the nouns in the patterns "which + noun", "what + noun", "what be + noun", and "name + noun". WordNet supplies the semantic information needed. See (Li, 2005) for more details.
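The question-type heuristics described above can be sketched roughly as follows. This is a hedged approximation: NLTK's WordNet interface is used as a stand-in for the paper's WordNet/JWNL set-up, and the function names and the noun-to-class table are hypothetical.

```python
import re
from typing import Optional

# Straightforward interrogative -> NE-class mapping described in the text
INTERROGATIVE_MAP = {"who": "PERSON", "where": "LOCATION", "when": "DATE"}


def question_type(question: str) -> Optional[str]:
    """Guess the NE class a question asks for (hypothetical sketch)."""
    words = question.lower().split()
    if not words:
        return None
    if words[0] in INTERROGATIVE_MAP:
        return INTERROGATIVE_MAP[words[0]]
    # "which/what/name (+ be) + noun": fall back to the head noun's semantics.
    m = re.match(r"(which|what|name)\s+(?:(?:is|are|was|were)\s+)?(\w+)",
                 question.lower())
    if m:
        return _class_from_noun(m.group(2))
    return None


def _class_from_noun(noun: str) -> Optional[str]:
    """Map a noun to an NE class via WordNet hypernyms (NLTK as a stand-in)."""
    from nltk.corpus import wordnet as wn
    targets = {"person": "PERSON", "organization": "ORGANIZATION",
               "location": "LOCATION", "time_period": "DATE"}
    for syn in wn.synsets(noun, pos=wn.NOUN):
        for path in syn.hypernym_paths():
            for node in path:
                name = node.name().split(".")[0]
                if name in targets:
                    return targets[name]
    return None


# e.g. question_type("Who has criticized the World Bank?") -> "PERSON"
```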
[Figure 1: the question "Who has criticized the World Bank?" is decomposed into the question-type tag <PERSON>, the event word "criticize" and the named entity "World Bank" (<ORGANIZATION>), yielding the bi-grams (<PERSON>, World Bank), (<PERSON>, criticize), (criticize, <ORGANIZATION>) and (criticize, World Bank), of categories NE-TAG, TAG-WORD, TAG-WORD and NE-WORD respectively.]

Fig. 1. Example of the 3 Categories of Bi-gram Associations
3.4 Matching-Based Relevance Measure
Sentence/query relevance is measured based on the words and the associations they match. In this study, we attempt three matching strategies: (1) exact matching (EM); (2) semantic matching (SM); and (3) degreed matching (DM). EM and SM are binary decisions. While EM returns 0 or 1 depending on whether a match succeeds or fails, SM considers the hyponyms of the words and returns 1 when the two words (or the two words in the two associations under comparison) belong to the same synset in WordNet. This is motivated by the observation that some words with the same or quite similar meanings appear in different surface forms; such words are commonly synonyms or hyponyms, such as "diminish" and "reduction". The third strategy, DM, backs off EM with SM: it performs EM first, and only when EM fails does it fall back to SM, returning a value smaller than 1 (e.g. 0.7) if SM succeeds. The relevance is then measured by calculating the similarity of the sentences and the query according to the frequencies of the matches. The matching strategies are applied not only to bi-gram association matching but also to uni-gram matching. Once the matching of the extracted bi-gram associations is done, they naturally form a collection of n association groups, denoted by A. An association group contains either a set of matched associations or a single association if no match is found. The similarity of a sentence $s^D$ in a document set $D$ and a query $T = \{s^T_1, s^T_2, \ldots, s^T_m\}$ is then calculated as the cosine similarity based on the frequencies of $a_i$:
$$
\mathrm{Sim}_{bi}(s^D, T) = \frac{\sum_{i=1}^{n} tf(a_i, s^D) \cdot tf(a_i, T)}{\sqrt{\sum_{i=1}^{n} \bigl(tf(a_i, s^D)\bigr)^2} \cdot \sqrt{\sum_{i=1}^{n} \bigl(tf(a_i, T)\bigr)^2}}
$$

where $a_i \in A$, $tf(a_i, *)$ is the frequency of $a_i$ in $s^D$ or $T$, and

$$
tf(a_i, T) = \sum_{j=1}^{m} tf(a_i, s^T_j).
$$
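A minimal sketch of this association-based cosine similarity follows, assuming the associations extracted for the document sentence and for each query sentence are plain hashable tuples; the helper names are ours, not the authors'.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Sequence


def cosine_over_counts(x: Counter, y: Counter) -> float:
    """Cosine similarity of two sparse frequency vectors."""
    dot = sum(x[k] * y[k] for k in x.keys() & y.keys())
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


def sim_bi(sentence_assocs: Iterable[Hashable],
           query_assocs_per_sentence: Sequence[Iterable[Hashable]]) -> float:
    """Sim_bi(s^D, T): tf(a_i, T) is summed over the query sentences s^T_1..s^T_m."""
    tf_sentence = Counter(sentence_assocs)
    tf_query: Counter = Counter()
    for assocs in query_assocs_per_sentence:
        tf_query.update(assocs)
    return cosine_over_counts(tf_sentence, tf_query)
```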
Associations provide an important means for relational content matching, but they often suffer from low coverage. If the similarity were calculated solely based on association matching, an actually relevant sentence might be mistakenly judged as non-relevant. To remedy this shortcoming, the overall similarity is calculated by linearly combining the association model and the uni-gram model:
$$
\mathrm{Sim}(s^D, T) = \lambda_1 \, \mathrm{Sim}_{uni}(s^D, T) + \lambda_2 \, \mathrm{Sim}_{bi}(s^D, T)
$$

where

$$
\mathrm{Sim}_{uni}(s^D, T) = \frac{\sum_{i=1}^{n} tf(u_i, s^D) \cdot tf(u_i, T)}{\sqrt{\sum_{i=1}^{n} \bigl(tf(u_i, s^D)\bigr)^2} \cdot \sqrt{\sum_{i=1}^{n} \bigl(tf(u_i, T)\bigr)^2}},
$$

$tf(u_i, *)$ denotes the frequency of $u_i$ in $s^D$ or $T$, and $\lambda_1$ and $\lambda_2$ are the weights for the uni-gram based and association based similarities respectively. Similarly,

$$
tf(u_i, T) = \sum_{j=1}^{m} tf(u_i, s^T_j).
$$
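The combination itself is a one-liner; the sketch below assumes the two component scores have already been computed with the cosine formulas above, and the 2:1 default weighting mirrors the setting reported later in Section 4.3.

```python
def combined_similarity(sim_uni: float, sim_bi: float,
                        lambda_uni: float = 2.0, lambda_bi: float = 1.0) -> float:
    """Sim(s^D, T) = lambda_1 * Sim_uni + lambda_2 * Sim_bi.

    sim_uni and sim_bi are the cosine scores over uni-gram and association
    frequencies respectively; the default 2:1 weighting follows Section 4.3.
    """
    return lambda_uni * sim_uni + lambda_bi * sim_bi
```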
4 Experimental Studies

4.1 Experiment Set-Up
The experiments are conducted on the 50 DUC 2005 document sets. Each set of documents is given a query which simulates the user's information need. All documents and queries are pre-processed by TextPrepEngine, a text pre-processing engine developed upon GATE¹ and the Porter Stemmer². Sentences are then represented by groups of words which are stemmed, part-of-speech (POS) tagged, and stop-word filtered³. Moreover, named entities are tagged for each sentence. According to the task definitions, system-generated summaries are strictly limited to 250 English words in length. Based on the calculated similarities, we pick the highest-scored sentences from the original documents into the summary until the word limit is reached; duplicate sentences are prohibited. Automatic evaluation methods and criteria are still a research topic in the summarization community, and much of the literature has addressed methods for automatic evaluation other than human judgment. Among them, the ROUGE toolkit⁴ [10], though debated by quite a few researchers, is supposed to produce the most reliable and stable scores compared with human evaluation. Moreover, DUC 2005 officially adopted ROUGE as the automatic evaluation method, so we also adopt it in this work. Specifically, the machine-generated summaries are evaluated in terms of the average recalls of ROUGE-1, ROUGE-2, and ROUGE-SU4.
¹ http://www.gate.ac.uk
² http://www.tartarus.org/~martin/PorterStemmer
³ A list of 199 words is used to filter stop words.
⁴ ROUGE 1.5.5 is used, and the ROUGE parameters are "-n 2 -x -m -2 4 -u -c 95 -r 1000 -f A -p 0.5 -t 0", according to the DUC task definition.
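As a rough illustration of the selection step described above, the following sketch greedily fills the summary with the highest-scored sentences while skipping duplicates. How the original system handles a candidate that would overshoot the 250-word limit is not specified, so the skip-and-continue policy here is an assumption.

```python
from typing import Callable, List


def build_summary(sentences: List[str],
                  score: Callable[[str], float],
                  word_limit: int = 250) -> List[str]:
    """Greedily pick the highest-scoring sentences until the word limit,
    skipping exact duplicates (a simplified sketch of the selection step)."""
    summary: List[str] = []
    seen = set()
    used = 0
    for sent in sorted(sentences, key=score, reverse=True):
        if sent in seen:
            continue
        n_words = len(sent.split())
        if used + n_words > word_limit:
            continue  # assumption: try shorter candidates further down the ranking
        summary.append(sent)
        seen.add(sent)
        used += n_words
        if used >= word_limit:
            break
    return summary
```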
4.2 Evaluation of Uni-gram Models
The word-based uni-gram model is implemented as the baseline model. When the named entities in the text are recognized and manipulated as integrated text units, we call it the NE-based model. Table 2 below compares the NE-based uni-gram model with the word-based uni-gram model. The advantage is visible but not marked. In later experiments, we further evaluate combinations of the uni-gram model and various bi-gram models.

Table 2. Evaluations of Uni-gram Models
              ROUGE-1   ROUGE-2   ROUGE-SU4
Word-based    0.35952   0.06932   0.12602
NE-based      0.36400   0.06988   0.12743
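For clarity, the NE-based uni-gram model can be pictured as collapsing each recognized entity span into a single token before counting, as in the hypothetical sketch below; span detection itself is assumed to come from the NE tagger, and the function name is ours.

```python
from typing import List, Tuple


def ne_unigrams(tokens: List[str],
                ne_spans: List[Tuple[int, int]]) -> List[str]:
    """Collapse each recognized NE span (start, end; end exclusive) into one
    uni-gram unit, e.g. ["White", "House"] -> ["White House"]."""
    units: List[str] = []
    spans = sorted(ne_spans)
    i = 0
    while i < len(tokens):
        span = next((s for s in spans if s[0] == i), None)
        if span:
            units.append(" ".join(tokens[span[0]:span[1]]))
            i = span[1]
        else:
            units.append(tokens[i])
            i += 1
    return units


# ne_unigrams(["The", "White", "House", "said"], [(1, 3)])
# -> ["The", "White House", "said"]
```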
4.3 Evaluation of Uni-gram and Bi-gram Combinatorial Models
The aims of the following experiments are to examine the performance of various combinatorial models that integrate the NE-based uni-gram model with the bi-gram models and, more importantly, to discover the most informative and representative text units. The rigid NeBiA and NEvBiA models have been described in Section 3. A bi-gram in the NeBi⁵ model is constrained to two adjoining words (non-stop-words) according to their appearance in the sentence; this is the normal use of the bi-gram model. In contrast, in the soft approaches, any two words within a given window size are combined into a bi-gram in NeBi-II, while in the NeBiA-II and NEvBiA-II models the NE-NE and NE-TAG bi-gram associations are still constrained to two successive named entities (or tags), and a named entity (or tag) together with a word within the given window size⁶ is combined into an NE-WORD or TAG-WORD association. The numbers behind the soft models in the following table denote the best window sizes used in our experiments (tuned experimentally). In this set of experiments, the SM strategy is adopted and λ1 : λ2 = 2 : 1 is set experimentally.

Table 3. Results of Combinatorial Models
Rigid             ROUGE-1   ROUGE-2   ROUGE-SU4
+ NeBi            0.36169   0.07010   0.12620
+ NeBiA           0.36563   0.07345   0.12974
+ NEvBiA          0.36588   0.07336   0.12986

Soft              ROUGE-1   ROUGE-2   ROUGE-SU4
+ NeBi-II (7)     0.36201   0.07068   0.12648
+ NeBiA-II (6)    0.36663   0.07354   0.12987
+ NEvBiA-II (8)   0.36670   0.07357   0.13014
⁵ NeBi here denotes the original NE-based bi-gram model.
⁶ Notice that the word within the given window size of a named entity (or tag) cannot cross another named entity (or tag).
Table 3 above presents the ROUGE results of the combinatorial models. We can see that the NeBi model improves the performance of the original word-based uni-gram model slightly. When the representative text units are gradually narrowed down in the NeBiA and NEvBiA models, the improvement becomes visible. Furthermore, better performance can be achieved when the soft models are taken into account; the best performance is obtained by the NEvBiA-II model. These results strongly support the idea of using NE-driven bi-gram associations in query-oriented summarization.

4.4 Coverage Problems with Single-Handed Bi-gram Models
As mentioned previously, the single-handed bi-gram based approaches, i.e. the NeBi, NeBiA and NEvBiA models, suffer from the coverage problem. For some document sets in our experiments, we can hardly find enough sentences considered relevant to the query to produce a summary close to the given 250-word limit by solely using the bi-gram based measures. The proportion "x/50" in each row of Table 4 denotes that "x" out of the total 50 document sets are capable of producing a 250-word summary.

Table 4. Results of Single-Handed Bi-gram Models
                ROUGE-1   ROUGE-2   ROUGE-SU4
NeBi (34/50)    0.35272   0.06430   0.12061
NeBiA (7/50)    0.37521   0.07947   0.13268
NEvBiA (7/50)   0.37711   0.07973   0.13163
Obviously, the results in Table 4 indicate that the NeBiA and NEvBiA models can achieve quite encouraging performance, but they are both limited by low coverage. That is why bi-gram and uni-gram models are normally combined. It can also be observed that the performance of the bi-gram model NeBi is even worse than that of its corresponding uni-gram model; data sparseness is the likely reason. This motivates us to restrict the bi-gram combinations in the proposed models.

4.5 Evaluations on Impacts of Matching Techniques
The following set of experiments aims to examine the three matching strategies, i.e. exact matching (EM), semantic matching (SM) and degreed matching (DM). WordNet 2.0⁷ and JWNL⁸ are used to determine whether two words are semantically matched according to whether they are in the same synset. In our implementation, DM returns 0.7 when the matching of two associations fails in EM but succeeds in SM. Table 5 shows the comparative results of the best-performing model in our former experiments, i.e. NEvBiA-II with a window size of 8.
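A small sketch of the degreed matching back-off follows, using NLTK's WordNet interface as a stand-in for the WordNet 2.0/JWNL set-up above; the 0.7 back-off value follows the text, while the element-wise treatment of associations and the function names are our assumptions.

```python
from nltk.corpus import wordnet as wn


def degreed_match(w1: str, w2: str, sm_score: float = 0.7) -> float:
    """Degreed matching: exact match scores 1.0; otherwise fall back to
    semantic matching, which scores sm_score when the two words share a
    WordNet synset (NLTK used here as a stand-in for WordNet 2.0 / JWNL)."""
    if w1 == w2:
        return 1.0                                   # EM succeeds
    if set(wn.synsets(w1)) & set(wn.synsets(w2)):
        return sm_score                              # SM succeeds
    return 0.0


def match_associations(a1: tuple, a2: tuple, sm_score: float = 0.7) -> float:
    """Match two bi-gram associations element-wise, keeping the order."""
    if len(a1) != len(a2):
        return 0.0
    scores = [degreed_match(x, y, sm_score) for x, y in zip(a1, a2)]
    return min(scores)  # assumption: both elements must match to some degree
```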
⁷ http://wordnet.princeton.edu/
⁸ http://sourceforge.net/projects/jwordnet
Table 5. Comparison of EM/SM/DM Strategies

      ROUGE-1   ROUGE-2   ROUGE-SU4
EM    0.36604   0.07340   0.13009
SM    0.36670   0.07357   0.13014
DM    0.36704   0.07360   0.13029
As seen, there is an improvement when the semantic relation between two words is considered, but the improvement is not obvious. This may be due to the fact that the number of named entity centered bi-gram associations involving a word is still small in our current system, so the contribution of the semantic relation is limited.

4.6 Comparison with DUC 2005 Top 3 Systems
The following table compares our models with the DUC 2005 participating systems, where S15, S17 and S10 are the top three performing systems. As seen, both the NEvBiA-II and NeBiA-II models achieve very competitive performance. Although no further post-processing is carried out, the NEvBiA-II model outperforms the top system at DUC 2005 in the ROUGE-2 evaluation, ranks second in the ROUGE-SU4 evaluation, and is among the top three systems in the ROUGE-1 evaluation.

Table 6. Comparison with the DUC 2005 Top-3 Systems
                 ROUGE-1   ROUGE-2   ROUGE-SU4
NEvBiA-II        0.36704   0.07360   0.13029
NeBiA-II         0.36663   0.07354   0.12987
S15              0.37383   0.07251   0.13163
S17              0.36901   0.07174   0.12972
S10              0.36640   0.07089   0.12649
NIST Baseline    0.30217   0.04947   0.09788
5 Conclusion

In this paper, the role of named entities has been emphasized in query-oriented summarization. The effects of named entities in uni-gram and bi-gram models are investigated. ROUGE evaluation based on the DUC 2005 data set shows that the proposed models can achieve very competitive performance. The combinatorial model of NE-based uni-grams and NE-driven bi-grams can even outperform the best system at DUC 2005. However, we also note that the use of named entity centered bi-gram associations is limited by the coverage problem, which could be alleviated by a more accurate and wide-coverage named entity recognizer. Furthermore, since named entity co-reference would be very useful to our investigation, advances in co-reference resolution from the natural language processing community will be studied in future work.
Acknowledgments

The work described in this paper was supported by grants from the RGC of Hong Kong (Project Nos. PolyU5211/05E and PolyU5217/07E), a grant from the NSF of China (Project No. 60703008), and an internal grant from the Hong Kong Polytechnic University (Project No. A-PA6L).
References

1. Barzilay, R., Lapata, M.: Modeling Local Coherence: An Entity-based Approach. In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 141–148 (2005)
2. Conroy, J.M., Schlesinger, J.D.: CLASSY Query-Based Multi-Document Summarization. In: Proceedings of the Document Understanding Conference 2005 (2005)
3. Doran, W., Newman, E., Stokes, N., Dunnion, J., Carthy, J.: IIRG-UCD at DUC 2005. In: Proceedings of the Document Understanding Conference 2005 (2005)
4. Erkan, G.: Using Biased Random Walks for Focused Summarization. In: Proceedings of the Document Understanding Conference 2006 (2006)
5. Hachey, B., Murray, G., Reitter, D.: The Embra System at DUC 2005: Query-oriented Multi-document Summarization with a Very Large Latent Semantic Space. In: Proceedings of the Document Understanding Conference 2005 (2005)
6. Hovy, E., Lin, C.Y., Zhou, L.: A BE-based Multi-document Summarizer with Query Interpretation. In: Proceedings of the Document Understanding Conference 2005 (2005)
7. Jagarlamudi, J., Pingali, P., Varma, V.: Query Independent Sentence Scoring Approach to DUC 2006. In: Proceedings of the Document Understanding Conference 2006 (2006)
8. Li, W., Li, W., Li, B., Chen, Q., Wu, M.: The Hong Kong Polytechnic University at DUC 2005. In: Proceedings of the Document Understanding Conference 2005 (2005)
9. Li, W., Li, B., Wu, M.: Query Focus Guided Sentence Selection Strategy for DUC 2006. In: Proceedings of the Document Understanding Conference 2006 (2006)
10. Lin, C.Y., Hovy, E.: Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics. In: Proceedings of HLT-NAACL, pp. 71–78 (2003)
11. Mohamed, A.A., Rajasekaran, S.: Query-Based Summarization Based on Document Graphs. In: Proceedings of the Document Understanding Conference 2006 (2006)
12. Schilder, F., McCulloh, A., McInnes, B.T., Zhou, A.: TLR at DUC: Tree Similarity. In: Proceedings of the Document Understanding Conference 2005 (2005)
13. Seki, Y., Eguchi, K., Kando, N., Aono, M.: Multi-Document Summarization with Subjectivity Analysis at DUC 2005. In: Proceedings of the Document Understanding Conference 2005 (2005)
14. Zhao, L., Huang, X., Wu, L.: Fudan University at DUC 2005. In: Proceedings of the Document Understanding Conference 2005 (2005)