Improving query focused summarization using look-ahead strategy

Rama B, V. Suresh, C. E. Veni Madhavan

Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
{ramab,vsuresh,cevm}@csa.iisc.ernet.in

Abstract. Query focused summarization is the task of producing a compressed text of an original set of documents, guided by a query. The documents can be viewed as a graph with sentences as nodes and edges added on the basis of sentence similarity. Graph-based ranking algorithms which use a 'biased random surfer' model, such as topic-sensitive LexRank, have been applied successfully to query focused summarization. In these algorithms, the random walk is biased towards sentences which contain query-relevant words; specifically, it is assumed that the random surfer knows the query relevance score of the sentence to which he jumps. However, the neighbourhood of that sentence is completely ignored. In this paper, we propose a look-ahead version of topic-sensitive LexRank. We assume that the random surfer not only knows the query relevance of the sentence to which he jumps, but can also look N steps ahead from that sentence to find the query relevance scores of future sentences. Using this look-ahead information, we identify sentences which are indirectly related to the query by the number of hops needed to reach a sentence containing query-relevant words. We then bias the random walk towards these indirectly query-relevant sentences as well as towards the sentences which contain query-relevant words. Experimental results show a 20.2% increase in ROUGE-2 score compared to topic-sensitive LexRank on the DUC 2007 data set. Further, our system outperforms the best systems in DUC 2006, and its results are comparable to state-of-the-art systems.

Keywords: Topic-sensitive LexRank, Look-ahead, Biased random walk
1 Introduction
Text summarization is the process of condensing a source text into a shorter version while preserving its information content [1]. Generic multi-document summarization aims at producing a summary from a set of documents on the same topic. Query focused summarization is a particular kind of multi-document summarization where the task is to create a summary that answers the information need expressed in a query [14]. Since the introduction of query focused summarization as the main task in the DUC (http://duc.nist.gov) competitions, it has become an important field of research in both natural language processing and information retrieval.
In this paper, we concentrate on sentence extractive summarization, which involves heuristically ranking the sentences in the documents and picking the top-ranked sentences for the summary. Query focused summarization is a harder task than generic multi-document summarization because the summary is expected to be biased towards the query. Queries are usually complex and are not direct questions, so a summary generated by just picking the textual units which contain names or numbers does not suffice; creating an ideal summary requires a deep understanding of the documents. Further, a query contains very few words, so the main challenge is to use this little information to pick the important sentences which answer the question in the query.

Several methods have been developed for query focused summarization over the years. Graph-based summarization methods based on Google's PageRank algorithm [2] have gained much attention due to their simplicity, unsupervised nature and language independence. For example, LexRank [9] builds a weighted undirected graph by treating sentences as nodes and adding edges between sentences based on a cosine similarity measure; the PageRank algorithm is then applied to find salient sentences in the graph. Otterbacher et al. [5] proposed the idea of a biased random walk, called topic-sensitive LexRank, for question answering; it extends easily to query focused summarization. Wan et al. [11] proposed an improved method in which, instead of treating all relations between sentences equally, within-document and cross-document relationships are differentiated and separate random walk models are used.

In the existing 'biased surfer' models, sentences which contain query-relevant words are given high scores by biasing the random walk towards them. This is based on the assumption that the random surfer knows the query relevance score of the sentence to which he jumps; sentences which are indirectly related to the query are discovered only during the course of the algorithm, through the link structure of the similarity graph. In our model, we instead detect sentences which are indirectly related to the query using neighbourhood information. Specifically, we assume that the random surfer not only knows the query relevance score of the sentence to which he jumps, but also has the option of looking N steps ahead from that sentence to learn more about it. We then bias the random walk towards both the indirectly query-relevant sentences and those containing query-relevant words, which results in better quality summaries. Experiments on the DUC 2006 and DUC 2007 data sets confirm that the look-ahead strategy yields better performance: our method performs better than several recently developed methods, and its results are comparable to state-of-the-art approaches.
The rest of the paper is organized as follows: In Section 2, we give a brief description of topic-sensitive LexRank and then we introduce our model. In Section 3, we present experiments and results. Finally, we conclude and suggest possible directions for future research in Section 4.
2 Topic-sensitive LexRank: A Revisit
Since we develop our model from topic-sensitive LexRank, we give a brief description of it in this section. Topic-sensitive LexRank uses the concept of graph-based centrality to rank sentences and consists of the following steps. A similarity graph G is constructed from the sentences in the document set, with each sentence viewed as a node. Stopwords are removed from both the query and the sentences. Next, all words are reduced to their root form through stemming, and word ISFs (Inverse Sentence Frequencies) are calculated by the following formula:

$$\mathrm{isf}_w = \log \frac{N_s + 1}{0.5 + \mathrm{sf}_w} \qquad (1)$$

where $N_s$ is the total number of sentences in the cluster and $\mathrm{sf}_w$ is the number of sentences in which the word $w$ appears. The relevance of a sentence $s$ to the query $q$ is computed by the following formula:

$$\mathrm{rel}(s|q) = \sum_{w \in q} \log(\mathrm{tf}_{w,s} + 1) \times \log(\mathrm{tf}_{w,q} + 1) \times \mathrm{isf}_w \qquad (2)$$

where $\mathrm{tf}_{w,s}$ and $\mathrm{tf}_{w,q}$ are the number of times $w$ appears in $s$ and $q$, respectively. Similarity between two sentences is calculated using the cosine measure weighted by word ISFs:

$$\mathrm{sim}(x,y) = \frac{\sum_{w \in x,y} \mathrm{tf}_{w,x}\,\mathrm{tf}_{w,y}\,(\mathrm{isf}_w)^2}{\sqrt{\sum_{x_i \in x} (\mathrm{tf}_{x_i,x}\,\mathrm{isf}_{x_i})^2} \times \sqrt{\sum_{y_i \in y} (\mathrm{tf}_{y_i,y}\,\mathrm{isf}_{y_i})^2}} \qquad (3)$$
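To make the formulas concrete, here is a minimal Python sketch of Equations 1–3. It assumes the sentences and the query have already been stopword-filtered and stemmed into lists of tokens; the function and variable names are ours, not from the original system.

```python
import math
from collections import Counter

def isf_scores(sentences):
    """Inverse sentence frequency (Eq. 1) for every word in the cluster."""
    n_s = len(sentences)
    sf = Counter()
    for s in sentences:
        sf.update(set(s))  # count each word at most once per sentence
    return {w: math.log((n_s + 1) / (0.5 + sf[w])) for w in sf}

def relevance(sentence, query, isf):
    """Relevance of a sentence to the query (Eq. 2)."""
    tf_s, tf_q = Counter(sentence), Counter(query)
    return sum(math.log(tf_s[w] + 1) * math.log(tf_q[w] + 1) * isf.get(w, 0.0)
               for w in tf_q)

def similarity(x, y, isf):
    """ISF-weighted cosine similarity between two sentences (Eq. 3)."""
    tf_x, tf_y = Counter(x), Counter(y)
    num = sum(tf_x[w] * tf_y[w] * isf.get(w, 0.0) ** 2
              for w in tf_x.keys() & tf_y.keys())
    den_x = math.sqrt(sum((tf_x[w] * isf.get(w, 0.0)) ** 2 for w in tf_x))
    den_y = math.sqrt(sum((tf_y[w] * isf.get(w, 0.0)) ** 2 for w in tf_y))
    return num / (den_x * den_y) if den_x and den_y else 0.0
```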
The main aim of query focused summarization is to pick sentences which are relevant to the query, so sentences similar to the query should get a high score. But a sentence that is similar to another high-scoring sentence in the graph should also get a high score. This is modelled using a mixture model: the score $p(s|q)$ of a sentence $s$ given query $q$ is determined as the sum of its relevance to the query and its similarity to the other sentences in the document cluster $C$,

$$p(s|q) = d\,\frac{\mathrm{rel}(s|q)}{\sum_{z \in C} \mathrm{rel}(z|q)} + (1-d)\sum_{v \in C} \frac{\mathrm{sim}(v,s)}{\sum_{z \in C} \mathrm{sim}(v,z)}\,p(v|q) \qquad (4)$$
Here d is known as the damping factor, which trades off the similarity of a sentence to the query against its similarity to the other sentences in the cluster. Equation 4 can be explained using a random walk model as follows. Imagine a random surfer jumping from one node to another on the graph. At each step the surfer does one of two things: with probability d, he jumps to a sentence (random jump) with probability proportional to its relevance to the query; or, with probability 1 − d, he follows an outlink to one of the neighbouring nodes (forward jump) with probability proportional to the edge weight in the graph. Since we want to give high scores to sentences which are similar to the query, usually d > 1 − d in Equation 4; experimentally, d = 0.7 has been shown to give the best results [3].

We now introduce a few key terms which help in understanding topic-sensitive LexRank better.

Direct query relevant sentence: a sentence which has at least one word overlapping with the query.

Indirect query relevant sentence: a sentence which has no query-related words but is in the vicinity of 'direct query relevant' sentences in the graph.

N-step indirect query relevant sentence: an 'indirect query relevant' sentence such that at least one of the sentences N hops away from it is 'direct query relevant', and none of the sentences fewer than N hops away is 'direct query relevant'.

We assume that the similarity graph contains at least one 'direct query relevant' sentence. Topic-sensitive LexRank uses the biased random walk to boost the scores of both direct and 'indirect query relevant' sentences in a single equation. In a random jump, 'direct query relevant' sentences are preferred, since the random surfer knows the query relevance of the sentence to which he jumps; the random jump therefore boosts the scores of 'direct query relevant' sentences only, while the scores of 'indirect query relevant' sentences are unaffected, as they have zero similarity with the query. The forward jump, on the other hand, increases scores based on sentence similarity: sentences adjacent to other high-scoring sentences end up with high scores too. This helps 'indirect query relevant' sentences, which sit very near other high-scoring sentences. At the start of the algorithm the set of 'direct query relevant' sentences is known, so the random surfer is deliberately made to jump to those sentences in every random jump to increase their scores. The set of 'indirect query relevant' sentences, in contrast, is not known, so they depend on the random surfer's forward jumps from neighbouring high-scored sentences to boost their scores.
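As an illustration, the biased random walk of Equation 4 can be solved by simple power iteration. The sketch below reuses relevance() and similarity() from the previous sketch; it is our own schematic rendering, not the authors' code, and assumes a dense similarity matrix, which is reasonable for DUC-sized clusters.

```python
def topic_sensitive_lexrank(sentences, query, isf, d=0.7, iters=100, tol=1e-6):
    """Power iteration for Eq. 4: p(s|q) = d*jump(s) + (1-d)*forward(s)."""
    n = len(sentences)
    # Random-jump distribution: proportional to query relevance (Eq. 2).
    rel = [relevance(s, query, isf) for s in sentences]
    total = sum(rel) or 1.0
    jump = [r / total for r in rel]
    # Forward-jump probabilities: row-normalized edge weights (Eq. 3).
    sim = [[similarity(u, v, isf) for v in sentences] for u in sentences]
    row = [sum(rw) or 1.0 for rw in sim]
    p = [1.0 / n] * n
    for _ in range(iters):
        new = [d * jump[s]
               + (1 - d) * sum(p[v] * sim[v][s] / row[v] for v in range(n))
               for s in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, p)) < tol
        p = new
        if converged:
            break
    return p  # rank sentences by p and pick the top ones for the summary
```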
2.1 Topic-sensitive LexRank with look-ahead strategy
In our method we concentrate mainly on detecting 'indirect query relevant' sentences, so that the random surfer is made to choose both 'direct query relevant' and 'indirect query relevant' sentences during a random jump. For the purpose of analysis, let us concentrate on '1-step indirect query relevant' sentences, i.e. sentences which have at least one 'direct query relevant' sentence as a neighbour. If we knew the set of '1-step indirect query relevant' sentences beforehand, just like the 'direct query relevant' sentences, then we could make the random surfer jump to '1-step indirect query relevant' sentences during random jumps as well. The advantage of this is that the random jump, which happens at every move of the random surfer, will now prefer these sentences too; '1-step indirect query relevant' sentences need not wait for forward jumps to boost their scores.

'Direct query relevant' sentences are easily detected, as they have non-zero similarity with the query. But to detect '1-step indirect query relevant' sentences we must use the query relevance scores of their neighbours: specifically, the sum of the neighbours' query relevance scores is non-zero for '1-step indirect query relevant' sentences. Therefore, to make the random surfer jump to both 'direct query relevant' and '1-step indirect query relevant' sentences in every random jump, we define the modified query relevance function as

$$\mathrm{rel}'(s_i|q) = \alpha \times \mathrm{rel}(s_i|q) + \beta \times \sum_{s_j \in Ne(s_i,1)} \mathrm{rel}(s_j|q) \qquad (5)$$
where Ne(s, k) returns the sentences which are k hops distant from s, and α and β are parameters used to control the probability of the random surfer jumping to 'direct query relevant' and '1-step indirect query relevant' sentences, respectively. If we use the modified query relevance function of Equation 5 in Equation 4, the resulting model can be viewed as topic-sensitive LexRank with "1-step look-ahead": the random surfer not only knows the query relevance score of the sentence to which he jumps, but also the query relevance scores of its neighbours. Since 'direct query relevant' sentences are more important than '1-step indirect query relevant' sentences, usually α > β. Note that in Equation 5, 'direct query relevant' sentences can gain an additional advantage from the second term, since their neighbours may themselves be 'direct query relevant'; α and β must therefore be chosen carefully.

Generalizing this, for an 'N-step indirect query relevant' sentence the sum of the query relevance scores of the sentences N hops away is non-zero. Using the look-ahead information, we can therefore judge a sentence better, in the sense that we can detect whether it is a 1-step, 2-step or, in general, 'N-step indirect query relevant' sentence. With "N-level look-ahead information", we can make the random surfer jump to both 'direct query relevant' sentences and all 'K-step indirect query relevant' sentences, 1 ≤ K ≤ N, in every random jump.
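A minimal sketch of the modified relevance function follows. It assumes neighbours[i] lists the indices of sentences adjacent to sentence i in the similarity graph (e.g. pairs whose Eq. 3 similarity is non-zero or above a threshold) and that base_rel holds the Eq. 2 scores; the k_hop helper and the use of a uniform β across all hop levels for N > 1 are our own simplifications for illustration. The resulting rel' scores simply replace rel in the random-jump term of Equation 4.

```python
def k_hop(neighbours, i, k):
    """Ne(s_i, k): indices of sentences exactly k hops from sentence i (BFS)."""
    seen, frontier = {i}, {i}
    for _ in range(k):
        frontier = {j for u in frontier for j in neighbours[u]} - seen
        seen |= frontier
    return frontier

def lookahead_relevance(base_rel, neighbours, alpha=0.56, beta=0.08, n_steps=1):
    """Eq. 5, with an optional N-step generalization (uniform beta per hop)."""
    scores = []
    for i in range(len(base_rel)):
        look = sum(base_rel[j]
                   for k in range(1, n_steps + 1)
                   for j in k_hop(neighbours, i, k))
        scores.append(alpha * base_rel[i] + beta * look)
    return scores
```

With n_steps=1 this is exactly Equation 5; α = 0.56 and β = 0.08 are the parameter values reported in the experiments of Section 3.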
Fig. 1: ROUGE-2 vs. β.
Fig. 2: ROUGE-2 vs. α.
To test the effect of the damping factor d on the ranking process, we repeated the experiment with d varying from 0 to 1, keeping α = 0.56 and β = 0.08. From Fig. 3, we can conclude that the look-ahead version of topic-sensitive LexRank also achieves its maximum performance at d = 0.7.
Fig. 3: ROUGE-2 vs. d.
Now, with the settings λ = 0.2, d = 0.7, α = 0.56 and β = 0.08, we tested the performance of our model on the DUC 2007 data set. We obtained a ROUGE-2 score of 0.11983 and a ROUGE-SU4 score of 0.17256.

3.3 Comparison with DUC systems
Tables 1 and 2 show the comparison of our model with the top 5 performing systems in DUC 2006 and DUC 2007, respectively. Our model is denoted "1-step T-LR", which stands for "topic-sensitive LexRank with 1-step look-ahead". In both tables, scores are arranged in decreasing order of ROUGE-2. The last row of each table shows the baseline summaries, which were created automatically for each document set by taking the leading sentences of the most recent document until a length of 250 words was reached.

Systems       R-2       R-SU4
1-step T-LR   0.09535   0.15134
S24           0.09505   0.15464
S12           0.08987   0.14755
S23           0.08792   0.14486
S8            0.08707   0.14134
S28           0.08700   0.14522
Baseline      0.04947   0.09788

Table 1: Comparison with DUC 2006 top 5 systems
Systems        R-2       R-SU4
S15 (IIIT-H)   0.12448   0.17711
S29 (PYTHY)    0.12028   0.17074
1-step T-LR    0.11983   0.17256
S4             0.11887   0.16999
S24            0.11793   0.17593
S13            0.11172   0.16446
Baseline       0.06039   0.10507

Table 2: Comparison with DUC 2007 top 5 systems
In DUC 2006, the proposed approach outperforms all the top performing systems and ranks first in ROUGE-2 score. In DUC 2007, our method ranks third in ROUGE-2 score. System 15 (IIIT-H) [8] and System 29 (PYTHY) [10], which were positioned at the top of the overall ROUGE evaluations in DUC 2007, are state-of-the-art systems. It should be noted that System 15 (IIIT-H) uses about a hundred manually hand-crafted rules (which are language dependent) to shorten sentences without losing much information, and System 29 (PYTHY) likewise uses sentence simplification heuristics. Although such simplification increases the information content of the summary, it can hurt readability, since the resulting sentences may be grammatically incorrect. Sentence simplification methods usually help increase ROUGE scores, as unimportant words are removed to make room for informative ones, but this comes at the cost of generating ungrammatical sentences which are difficult to understand: under "Grammaticality" in the DUC 2007 evaluations, the IIIT-H system dropped to 22nd position and PYTHY to 21st out of 30 submitted systems. Our method, in contrast, uses no sentence simplification, so the summaries generated by our system do not suffer from such grammaticality issues. Moreover, as the ROUGE scores show, our method produces informative summaries which are as good as those produced by state-of-the-art systems.
3.4 Comparison with existing methods
In this section, we compare the performance of our model with some recently developed systems:

Wiki [4]: uses Wikipedia as a source of knowledge to expand the query.

Adasum [13]: employs a mutual boosting process to generate extractive summaries and optimize topic representation.

SVR [7]: uses Support Vector Regression (SVR) to estimate the importance of sentences through a set of pre-defined features.

HT [6]: builds a hierarchical tree representation of the words in the document set; a bottom-up algorithm is used to find the significance of words, and sentences are then picked using a top-down algorithm applied to the tree.

T-LR [5]: topic-sensitive LexRank.

In Table 3 we can see that our system performs better than all of these recently published methods. Further, our model shows a 20.17% improvement in ROUGE-2 score over topic-sensitive LexRank on the DUC 2007 data set. Fig. 4 shows the per-topic comparison of topic-sensitive LexRank (T-LR) with our model (1-step T-LR) in ROUGE-2 and ROUGE-SU4 scores on the DUC 2007 data set; our method performs well on almost all the document sets.
Systems       R-2       R-SU4     1-step T-LR improvement (R-2, R-SU4)
1-step T-LR   0.11983   0.17256   -
Adasum        0.11720   0.16920   (2.24%, 1.99%)
SVR           0.11330   0.16520   (5.76%, 4.46%)
HT            0.11100   0.16080   (7.96%, 7.31%)
Wiki          0.11048   0.16479   (8.46%, 4.72%)
T-LR          0.09972   0.15300   (20.17%, 12.78%)

Table 3: Comparison with existing methods on the DUC 2007 data set
Fig. 4: Per-topic comparison of Topic-sensitive LexRank (T-LR) with our system (1step T-LR)
4 Conclusion and Future Work
In this paper, we presented a look-ahead version of topic-sensitive LexRank. Essentially, we use a look-ahead strategy to find 'indirect query relevant' sentences and then bias the random walk towards both 'direct query relevant' and 'indirect query relevant' sentences. Experimental results on the DUC 2006 and DUC 2007 data sets confirm the idea behind the proposed work and show that the performance of our model is comparable to state-of-the-art approaches. Further, unlike those approaches, our model preserves the linguistic quality of the generated summary. Our method does not depend on any language-specific features and achieves good results without the help of external resources such as WordNet or Wikipedia, and it requires no potentially time-consuming pre-processing steps such as POS tagging or parsing.

In future work, we plan to extend our model to generic multi-document summarization. In query focused summarization we know that sentences biased towards the query are potential candidates for the summary; in generic multi-document summarization, we must instead exploit the natural topic distribution in the documents to find the important sentences. The main challenge is to figure out how to incorporate the look-ahead strategy in this framework.
5 Acknowledgement
We acknowledge a partial support for the work, from a project approved by the Department of Science and Technology, Government of India.
References

1. Barzilay, R., Elhadad, M.: Using lexical chains for text summarization. In: Proceedings of the ACL Workshop on Intelligent Scalable Text Summarization, pp. 10-17 (1997)
2. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30(1-7), 107-117 (1998)
3. Erkan, G.: Using biased random walks for focused summarization. In: Proceedings of the DUC 2006 Document Understanding Workshop, Brooklyn, NY, USA (2006)
4. Nastase, V.: Topic-driven multi-document summarization with encyclopedic knowledge and spreading activation. In: EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 763-772. Association for Computational Linguistics, Morristown, NJ, USA (2008)
5. Otterbacher, J., Erkan, G., Radev, D.R.: Using random walks for question-focused sentence retrieval. In: HLT '05: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 915-922. Association for Computational Linguistics, Morristown, NJ, USA (2005)
6. Ouyang, Y., Li, W., Lu, Q.: An integrated multi-document summarization approach based on word hierarchical representation. In: ACL-IJCNLP '09: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pp. 113-116. Association for Computational Linguistics, Morristown, NJ, USA (2009)
7. Ouyang, Y., Li, W., Li, S., Lu, Q.: Applying regression models to query-focused multi-document summarization. Inf. Process. Manage. (2010)
8. Pingali, P., K., R., Varma, V.: IIIT Hyderabad at DUC 2007. In: Proceedings of the Document Understanding Conference, Rochester, NIST (2007)
9. Erkan, G., Radev, D.R.: LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research 22, 457-479 (2004)
10. Toutanova, K., Brockett, C., Gamon, M., Jagarlamudi, J., Suzuki, H., Vanderwende, L.: The PYTHY summarization system: Microsoft Research at DUC 2007. In: DUC 2007: Document Understanding Conference, Rochester, NY, USA (2007)
11. Wan, X., Yang, J., Xiao, J.: Using cross-document random walks for topic-focused multi-document summarization. In: WI '06: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, pp. 1012-1018. IEEE Computer Society, Washington, DC, USA (2006)
12. Wan, X., Yang, J., Xiao, J.: Manifold-ranking based topic-focused multi-document summarization. In: IJCAI '07: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 2903-2908. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2007)
13. Zhang, J., Cheng, X., Wu, G., Xu, H.: AdaSum: an adaptive model for summarization. In: CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management, pp. 901-910. ACM, New York, NY, USA (2008)
14. Zhao, L., Wu, L., Huang, X.: Using query expansion in graph-based approach for query-focused multi-document summarization. Inf. Process. Manage. 45(1), 35-41 (2009)