THE INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT2014)
An Extractive Graph-based Arabic Text Summarization Approach

Ahmad T. Al-Taani and Maha M. Al-Omour
Department of Computer Science, Yarmouk University, Jordan

Abstract: In this study, we propose an Arabic text summarization approach based on extractive graph-based techniques. To measure the efficiency of the proposed approach, several basic units, namely the stem, the word, and the n-gram, are applied in the summarization process. The Arabic document is represented as a graph, and the shortest path algorithm is used to extract the summary. The similarity between any two sentences is determined, and the sentences are ranked according to several statistical features. A final score is computed for each sentence using PageRank scoring, and the sentences with the highest scores are included in the summary, subject to the compression ratio. The proposed approach is evaluated on the EASC corpus using an intrinsic evaluation method. The results show that the proposed approach achieves good results and that using n-grams as the basic unit in the summarization process yields better results than using the stem or the word.

Keywords: Text summarization, Arabic documents, Graph-based approaches, Statistical-based approaches, n-gram, Stem.
1. Introduction

The growing amount of information stored on the Internet has made it difficult to find the required information easily. Intelligent applications such as Automated Text Summarization (ATS) can therefore be developed to access this information, and specialized Arabic ATS approaches are needed to overcome this problem and to support Arabic Natural Language Processing (NLP) systems. In general, ATS systems are significant in many fields: high-quality text summarization can improve document searching and browsing in a variety of contexts [16]. ATS is an important field of study in NLP which has recently experienced great development, and a wide variety of approaches has been proposed to tackle it. ATS systems generate a shorter version of the source text that contains most of the relevant information [14], which helps users find the information they are looking for and saves time and resources [20].

Text summarization can be categorized into several types. According to the input, a single-document summary is generated from a single input text document, whereas a multi-document summary is generated from multiple input documents concerning the same topic [20]. With regard to the output, a summarization can be extractive, selecting important sentences from the original document and concatenating them into a shorter form, or abstractive, attempting to understand the main concepts in the document and to express them in clear language, using new terms and formulations and mentioning concepts and words other than those in the original text. Abstractive summarization is difficult to achieve in practice, since it requires an internal semantic representation and natural language generation techniques to create the summary [23]. Research on this type is very limited compared with extractive summarization, on which most research since the 1950s has focused [15]. Also, according to the output form of the summary, the summarization can be indicative, where a brief idea of
the original text is given, or informative, where more detailed information is given [20]. Regarding the purpose and the output factors together, a summary can be generated using generic summarization, which conveys the relevant facts of a source text, or query-focused summarization, where the content of the summary is driven by a user need [21]. The language of the summary also plays a key role in deciding which technique is suitable for generating the summary, depending on the characteristics of that language. Furthermore, depending on the number of languages the summarization system deals with, a distinction can be made between mono-lingual summarization, where the input and output languages are the same; multi-lingual summarization, where the system is able to deal with several languages; and cross-lingual summarization, where the input language differs from the output language [20].

In this paper, we propose an extractive graph-based Arabic ATS approach. We also examine how the choice of the basic unit of the sentence (stem, word, or n-gram), on which the similarity measure and sentence ranking are computed, influences the quality of the extracted summary. The graph-based Arabic ATS approach builds on the approach introduced by Thakkar et al. [25], who extract a summary of an English document by representing it as an undirected graph in which sentences are nodes and the similarity (word overlap) between each pair of sentences is the edge weight; the summary is then created by finding the shortest path between the first and the last sentences of the original document. The proposed approach consists of the following main steps (a minimal sketch of this pipeline is given after the list):
1. Representing the Arabic document as a directed acyclic graph.
2. Using the cosine similarity measure to represent the similarity between each pair of sentences.
3. Proposing a connectivity algorithm to ensure that the built graph is connected.
4. Applying an algorithm for finding all paths between the first node and the last node in the graph [13].
5. Extracting summaries that are compatible with a user-defined compression ratio.
6. Applying the word, the stem, and the n-gram as basic units in this approach.
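As a rough illustration of these steps, the following Python sketch builds a directed sentence graph and extracts a shortest-path summary. It is a simplified stand-in rather than the exact implementation: the sentence splitter and the word-overlap similarity used here are naive placeholders for the preprocessing chain and cosine similarity of Section 3, the connectivity step is omitted, and the networkx library is used only for convenience.

import re
import networkx as nx  # third-party graph library, used here only for illustration

def split_sentences(text):
    # Naive splitter on Arabic/Latin end-of-sentence marks; the real system
    # relies on the preprocessing steps described in Section 3.
    return [s.strip() for s in re.split(r"[.!?؟]+", text) if s.strip()]

def overlap_similarity(s1, s2):
    # Placeholder similarity: normalised word overlap between two sentences.
    w1, w2 = set(s1.split()), set(s2.split())
    return len(w1 & w2) / max(1, min(len(w1), len(w2)))

def summarize(text, compression_ratio=0.25):
    sents = split_sentences(text)
    g = nx.DiGraph()
    g.add_nodes_from(range(len(sents)))
    for i in range(len(sents)):                  # edges only run forward: a DAG
        for j in range(i + 1, len(sents)):
            sim = overlap_similarity(sents[i], sents[j])
            if sim > 0:
                # Distance = 1 - similarity, so the shortest path prefers
                # strongly related sentences.
                g.add_edge(i, j, weight=1.0 - sim)
    if not nx.has_path(g, 0, len(sents) - 1):
        raise ValueError("graph not connected; apply the connectivity step of Section 3.12 first")
    path = nx.shortest_path(g, 0, len(sents) - 1, weight="weight")
    keep = max(1, round(len(sents) * compression_ratio))
    return [sents[i] for i in sorted(path)[:keep]]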
2. Related Work

Since the 1950s, a great deal of research has been conducted on text summarization using a variety of approaches and languages. In this section, we focus on research on extractive approaches for a single document, namely the graph-based approaches, and we also review research on Arabic text summarization systems. Recently, a variety of graph-based methods have been proposed for single- and multi-document summarization of English documents; these approaches include [12][19][22].

El-Haj et al. [10][11] developed two Arabic summarization systems. The first is the Arabic Query-Based Single Text Summarizer System (AQBTSS), which takes an Arabic document and an Arabic query and attempts to provide an acceptable summary of the document for that query. The second is the Arabic Concept-Based Text Summarization System (ACBTSS), which takes as input a set of words representing a certain concept instead of a user query. The two systems share the first two phases: document selection, where the user selects a document that matches his/her query from the document collection, and splitting the document into sentences. The systems differ in the sentence matching phase: in AQBTSS each sentence is compared against the user query to find relevant sentences, whereas in ACBTSS each sentence is matched against the set of keywords that represents the given concept. Both systems then use the Vector Space Model (VSM) in the summarization phase, with a weighting scheme based on two measures: term frequency and inverse document frequency. To evaluate the systems, a group of 1500 users participated in assessing the readability of the summaries generated from 251 articles by both systems; the results showed that AQBTSS outperformed ACBTSS.

Ben Abdallah et al. [8] suggested a platform for summarizing Arabic texts, which consists of a set of modules: a tokenization module, a morphological analyzer module, a parser module, a relevant sentence extraction module, and an extract revision module. The platform was evaluated on texts of various lengths (short, average, long) according to execution time; it was observed that the run time of the platform's modules for a given text depends on the size of that text, i.e., the shorter the text, the lower its run time.

Sakhr Summarizer is an Arabic summarization engine that finds the most relevant sentences of the source text and displays them as a short summary. The engine employs the Sakhr Corrector to correct common Arabic mistakes in the input text automatically, and the Keywords Extractor to identify a prioritized list of keywords used to identify the important sentences accurately [10].

Another summarization system, called the Arabic Intelligent Summarizer, was proposed by Boudabous et al. [9]. This system is mainly based on a supervised machine learning technique and consists of two phases. The first is the learning phase, which teaches the system how to extract summary sentences; Support Vector Machines (SVMs) are used for the learning process. The second phase is the use
phase, which allows the users to summarize a new document.
3. Methodology

In this study, we propose a graph-based Arabic ATS approach that focuses on the semantic relationships between sentences. A single document is represented as a graph, where sentences are represented by the nodes of the graph and the similarity between each pair of sentences is represented by the weight of the edge between the corresponding nodes. The summary is extracted by finding the shortest path between the first and the last sentences of the original document (the first and last nodes in the graph); the sentences on the shortest path form the extracted summary. Statistical features such as sentence length, sentence position, term frequency, and title similarity are used to rank the sentences, and a PageRank formula is used to combine the sentence ranking with the inter-sentence similarities. Before calculating the similarity measure and the sentence scores, we determine the basic unit on which these calculations are based. In this study, three basic units are applied: the stem, the word, and the n-gram. The resulting differences in the similarity measure, the sentence ranking, and the quality of the extracted summary are examined for each unit.
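The paper does not spell out the exact formula used to combine the statistical features with PageRank, so the following is only a hedged sketch of one common choice: the normalised feature scores serve as a personalised teleportation distribution, while the inter-sentence similarities drive the random-walk part.

def pagerank_scores(sim, feature_score, d=0.85, iterations=50):
    # sim[i][j]: similarity between sentences i and j (0 if unrelated).
    # feature_score[i]: combined statistical features of sentence i (length,
    # position, term frequency, title similarity), normalised to sum to 1.
    n = len(sim)
    out_weight = [sum(sim[i]) or 1.0 for i in range(n)]   # avoid division by zero
    score = list(feature_score)                           # start from the feature prior
    for _ in range(iterations):
        score = [(1 - d) * feature_score[i]
                 + d * sum(score[j] * sim[j][i] / out_weight[j] for j in range(n))
                 for i in range(n)]
    return score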
3.1. Input Single Document
In this step, the system reads a single document written in Arabic with the file extension .txt. The encoding format is taken into consideration and must be compatible with the Arabic language.
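A minimal sketch of this step; UTF-8 is assumed here, although Arabic plain-text files are also commonly stored in Windows-1256, so the encoding argument may need to be adjusted.

def read_document(path):
    # The encoding must be compatible with Arabic text; UTF-8 is assumed.
    with open(path, encoding="utf-8") as f:
        return f.read()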
3.2. Preprocessing
Constructing an NLP system for Arabic is not straightforward due to the nature of the language: Arabic is well known for its rich and complex morphology and its syntactic flexibility [7]. Therefore, some language processing steps, such as tokenization, stemming, and normalization, must take place before the summarization steps. The system deals with Arabic texts without diacritical marks, since most Arabic texts stored in electronic form do not include these marks.
3.3. Normalization
This step has a great effect on the quality of the extracted summary: redundant and misplaced white spaces are corrected, and any consecutive punctuation marks are removed. In addition, if a sentence is repeated in the text, it is removed after the document is split into sentences. This is a significant step because one of the major merits of an efficient summary is that it should not contain redundant data; to avoid this problem, any repeated information in the single document is removed.
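A minimal sketch of the normalization described above: it collapses redundant white space, removes consecutive duplicates of punctuation marks, and drops repeated sentences after splitting.

import re

def normalize(text):
    text = re.sub(r"\s+", " ", text).strip()            # fix redundant/misplaced white space
    text = re.sub(r"([.,!?؟،؛])\1+", r"\1", text)        # collapse consecutive punctuation
    return text

def drop_repeated_sentences(sentences):
    seen, unique = set(), []
    for s in sentences:
        if s not in seen:                                # keep only the first occurrence
            seen.add(s)
            unique.append(s)
    return unique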
3.4. Tokenization
Tokenization is a major problem in text summarization, since it is closely tied to the morphological analysis of documents, particularly those written in languages with rich and complex morphology, such as
Arabic. The function of a tokenizer is to split running text into tokens so that they can be fed into a morphological transducer or a Part-of-Speech (POS) tagger for further processing. The tokenizer is also responsible for determining word boundaries and for the demarcation of clitics, multiword expressions, abbreviations, and numbers [6]. In this step, sentences are divided into words on the basis of white spaces and punctuation marks, in preparation for later steps such as stemming and POS tagging.
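The following is a simple whitespace-and-punctuation tokenizer in the spirit of this step; real Arabic tokenization would also need clitic segmentation, which is not attempted in this sketch.

import re

# Split on runs of white space or punctuation (Latin and Arabic marks).
TOKEN_SPLIT = re.compile(r"[\s\.,;:!\?\(\)«»،؛؟]+")

def tokenize(sentence):
    return [t for t in TOKEN_SPLIT.split(sentence) if t]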
3.5. Stop Words Removal
Stop words are words that are excluded before the automatic language processing of texts. They occur repeatedly in texts, such as (في، من، الى، ...), and are better ignored and not indexed so that search can be improved. Stop words have two major effects on applications such as text summarization and information retrieval. First, they can influence retrieval effectiveness, because such frequent words tend to diminish the differences in frequency among less common words, which affects the weighting process. Second, their removal shortens the document and consequently also affects the weighting process. Efficiency benefits from such removal because stop words are neither meaningful nor functional [1]. In this study, we used the Abu El-khair (2006) stop list, which contains 1,377 words. The list consists of the following word categories: adverbs, conditional pronouns, interrogative pronouns, prepositions, pronouns, referral names/determiners, relative pronouns, transformers (verbs, letters), verbal pronouns, and others.
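A sketch of stop-word removal, assuming the Abu El-khair stop list has been saved locally as one word per line (the file name below is illustrative only).

def load_stop_words(path="arabic_stop_words.txt"):
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def remove_stop_words(tokens, stop_words):
    return [t for t in tokens if t not in stop_words]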
3.6. Stemming
Stemming is the process of reducing words to their roots by removing any affixes attached to them. For example, stemming the Arabic word "الكتابة" produces the root "كتب"; the same root is also generated from the word "كاتب". After words are reduced to their roots, the generated roots can be used in many applications such as compression, spell checking, and text searching. Here we adopt the Khoja stemmer [18]. In the proposed approach, the stem of the word is employed as one of the basic units for calculating the similarity measure, in addition to its use in POS tagging, which is performed in a later step.
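The approach uses the Khoja root stemmer; as a simplified stand-in only, the sketch below strips a few common Arabic prefixes and suffixes (light stemming) and does not reach the root form the way the Khoja stemmer does.

PREFIXES = ("ال", "وال", "بال", "كال", "فال", "لل", "و")
SUFFIXES = ("ات", "ون", "ين", "ان", "ها", "ة", "ه")

def light_stem(word):
    # Strip at most one prefix and one suffix, keeping at least three letters.
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= 3:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= 3:
            word = word[:-len(s)]
            break
    return word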
3.7. Part-of-Speech Tagging
The Arabic language includes three major parts of speech: nouns, verbs, and particles. The POS tagger assigns the appropriate POS tag to each word. In this study, we adopt the tagger proposed by Al-Taani et al. [2] with some enhancements. The tagging system consists of three main levels: morphological, lexical, and syntactic analysis.
3.8. Noun Extraction from Sentences
Not all words in a document are good indicators of key phrases, which reflect the significance of the sentences and the likelihood of their presence in the summary. Therefore, a syntactic filter is applied in this phase and only nouns are extracted [26]. These extracted nouns are the basis of the sentence weight calculations.
3.9. Building the Graph
The input Arabic text is represented by a graph G. The graph G of a document D is a directed graph G = (V, E), where V is a set of nodes and E is a set of edges [17]. In other words, G is a weighted directed graph whose nodes represent the sentences of D and whose edge weights represent the similarity between sentences [24].
3.10. Representation of Sentences by Nodes
After the preprocessing phase, each sentence is assigned an ID, and each ID is represented by a node, since dealing with IDs in some steps is much easier than dealing with the entire sentence.
3.11. Calculating Edge Weights
The connections between sentences (directed edges) are formed on the basis of the similarity between the corresponding sentences. It should be pointed out that the constructed graph is a directed acyclic one: if, for example, the first sentence is related to the sixth sentence, there is a directed edge from node one to node six and not vice versa. This maintains the context of the extracted summary, so that it reads fluently. An acyclic graph is chosen because the most important sentences should not be extracted more than once, as an efficient summary should not include redundant sentences. The similarity measure can be calculated in several ways, such as content overlap and cosine similarity. In this study, the cosine similarity measure is chosen, based on the TF-IDF (Term Frequency-Inverse Document Frequency) term weighting scheme. TF-IDF is a commonly used information retrieval technique for assigning weights to the individual terms appearing in a document; it aims at balancing local and global term occurrences in the documents [3][4].
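A hedged sketch of the TF-IDF weighting and cosine similarity between two sentences, where the terms can be the extracted noun words, noun stems, or character n-grams, depending on the chosen basic unit. The IDF here is computed over the sentences of the single document (each sentence treated as a "document"), which is one common interpretation; the exact weighting variant used in the paper may differ.

import math
from collections import Counter

def char_ngrams(text, n=3):
    # Optional basic unit: overlapping character n-grams of a sentence.
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def tfidf_vectors(sentence_units):
    # sentence_units: one list of terms (nouns, noun stems, or n-grams) per sentence.
    n_sents = len(sentence_units)
    df = Counter()
    for units in sentence_units:
        df.update(set(units))                      # document frequency over sentences
    vectors = []
    for units in sentence_units:
        tf = Counter(units)
        vectors.append({t: tf[t] * math.log(n_sents / df[t]) for t in tf})
    return vectors

def cosine_similarity(v1, v2):
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0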
3.12. Summary Extraction
The summary is extracted on the basis of the Compression Ratio (CR), which determines the number of sentences the summary consists of: it is the proportion of the sentences of the original text that the user chooses to keep. After the graph is constructed and the edge weights are calculated, a weighted directed acyclic graph is established. However, this graph may be disconnected, because the original text can contain subtopics, as well as sentences that share no noun words, noun stems, or n-grams with other sentences; consequently, no edges would link the resulting subgraphs to each other. The connectivity of the graph must therefore be verified before summary extraction starts. This phase consists of two steps: verifying that the graph is connected, and extracting the summary. To verify that the graph is connected, the following conditions must hold (a sketch of this check is given after the list):
1. Each sentence node, except the first and the last, should be related to at least one sentence node before it and one sentence node after it.
2. The first sentence node should be related to at least one sentence after it.
3. The last sentence node should be related to at least one sentence before it.
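The sketch below checks the three conditions on a networkx DiGraph whose nodes are the sentence IDs 0..n-1 and whose edges all run forward. Where a node violates a condition, it is linked to its neighbouring sentence; the fixed fallback weight is an assumption made for this sketch, since the paper does not state how the added edges are weighted.

def ensure_connected(g, n_sentences, fallback_weight=1.0):
    # Enforce the three connectivity conditions listed above.
    last = n_sentences - 1
    for i in range(n_sentences):
        has_pred = any(True for _ in g.predecessors(i))   # edges only run forward, so
        has_succ = any(True for _ in g.successors(i))     # predecessors always precede i
        if i > 0 and not has_pred:                        # conditions 1 and 3
            g.add_edge(i - 1, i, weight=fallback_weight)
        if i < last and not has_succ:                     # conditions 1 and 2
            g.add_edge(i, i + 1, weight=fallback_weight)
    return g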
4. Experimental Results

4.1 Demo Example
The following example shows the representation of an Arabic single document by a directed acyclic graph. The document "العود" (The Lute, the Arabic musical instrument) is written in Arabic about art and music and is presented in Table 1.

Table 1: Example of an Arabic single document (العود); the document consists of the following sentences, numbered with IDs 1-11 in document order:

العود هي آلة موسيقية شرقية وترية تاريخها موغل بالقدم يرجعه البعض إلى نوح. و تعني كلمة العود في اللغة العربية الخشب. فالعود من الآلات الوترية العربية له خمسة أوتار ثنائية أو و يمكن ربط وتر سادس إلى العود و يغطي مجاله الصوتي حوالي الأوكتافين و نصف الأوكتاف. الأدبيات الشرقية الموسيقية كلها تظهر وتؤكد على استخدام نوع أو أكثر من الأعواد. يعتبر العود آلة رئيسية في التخت الموسيقي الشرقي، في التاريخ الحديث هناك أكثر من دولة عربية تدعي تفوقها في صناعة الأعواد ولكن بغداد لها السبق في ذلك، ففي بغداد هناك أكثر من حرفي يقوم بصنعها، وبذلك غدا العود العراقي ذا سمعة عالمية جعلت كبار الملحنين والمطربين في العالم العربي يفضلونه على غيره، ومن أحد صناع العود المحدثين في العراق سمير رشيد العواد الذي يتسابق الفنانون العرب للحصول على أحد الأعواد من صنعه، مثل المطرب الكويتي نبيل شعيل والمطرب اللبناني الكبير وديع الصافي والمطرب العراقي المشهور كاظم الساهر والمطرب مهند محسن والمطرب ياس خضر والموسيقار منير بشير.

The sentences along the shortest path constitute the extracted summary of the document "العود" (الآلة الموسيقية العود), which is shown in Table 2.

Table 2: Example of the extracted summary
ID 1: العود هي آلة موسيقية شرقية وترية تاريخها موغل بالقدم يرجعه البعض إلى نوح.
ID 8: ومن أحد صناع العود المحدثين في العراق سمير رشيد العواد الذي يتسابق الفنانون العرب للحصول على أحد الأعواد من صنعه،
ID 11: والمطرب اللبناني الكبير وديع الصافي والمطرب العراقي المشهور كاظم الساهر والمطرب مهند محسن والمطرب ياس خضر والموسيقار منير بشير

4.2 Results
To evaluate the proposed approach, we used the Essex Arabic Summaries Corpus (EASC) as a gold standard. The corpus consists of 153 documents, each with 5 summaries, for a total of 765 Arabic human-made summaries generated using Mechanical Turk (MTurk) [11]. The EASC corpus covers 10 subjects: art and music, environment, politics, sports, health, finance, science and technology, tourism, religion, and education. Each system extracts three summaries for each document, one for each of the chosen basic units; accordingly, each document in the corpus has six summaries to be evaluated, taking into consideration that we evaluate these systems with two different compression ratios, 25% and 40%, so that the tested sample consists of 1,836 summaries. The intrinsic technique, which evaluates the quality of the generated summary itself, is chosen to evaluate the proposed systems; it includes the precision and recall measures. We tested the system with the two compression ratios, 25% and 40%, using the three basic units (word, stem, and n-gram) in the summarization process, and then evaluated the summaries using precision, recall, and f-measure.
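As an illustration of the intrinsic measures, the following sketch computes sentence-level precision, recall, and f-measure of an extracted summary against one reference summary; how sentences are matched (exact match versus partial overlap) is not stated in the paper, so exact matching is assumed here.

def precision_recall_f1(system_summary, reference_summary):
    # Both arguments are lists of sentences; exact sentence matching is assumed.
    system, reference = set(system_summary), set(reference_summary)
    common = len(system & reference)
    precision = common / len(system) if system else 0.0
    recall = common / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1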
Figure 1 shows that the graph-based approach achieved the highest precision when using the n-gram, followed by the stem and then, with a slight difference, the word; this shows that using the n-gram is the most effective choice.

Figure 1: The precision of the graph-based approach with 25% compression ratio

Figure 2 shows that the graph-based approach using the stem is more effective than using the word, where the difference in recall was very slight (about 0.10), but the n-gram was again the most effective unit, as it exceeded the word and stem units with a clear difference of 0.0265.

Figure 2: The recall of the graph-based approach with 25% compression ratio

Figure 3 shows that the graph-based approach achieved the highest rates when using the n-gram, followed by the stem and then the word.

Figure 3: The f-measure of the graph-based approach with 25% compression ratio

Figure 4 shows that the graph-based approach achieved the highest rates when using the n-gram as the basic unit in the summarization process. It is also notable that, in comparison with the results at a CR of 25%, the higher compression ratio led to a lower precision value.
Figure 4: The precision of the graph-based approach with 40% compression ratio

Figure 5 shows that the graph-based approach achieved the highest rates when using the n-gram as the basic unit in the summarization process. It is also notable that, in comparison with the results at a CR of 25%, the higher compression ratio led to a higher recall value.

Figure 5: The recall of the graph-based approach with 40% compression ratio

Figure 6 shows that the graph-based approach achieved the highest rates when using the n-gram as the basic unit in the summarization process, followed by the stem and then the word. It is also notable that, in comparison with the results at a CR of 25%, the higher compression ratio led to a higher f-measure value.

Figure 6: The f-measure of the graph-based approach with 40% compression ratio

Overall, the experimental results show that the proposed approach achieved a precision of 0.502 with a 25% compression ratio and 0.492 with a 40% compression ratio, which outperforms some previous approaches evaluated on the same corpus, such as Alkhatib [5] and El-Haj [11].

5. Conclusions
Automatic text summarization is becoming increasingly common on the web and in digital library environments. Graph-based approaches primarily focus on the semantic relationships between sentences, so the summaries extracted with these approaches are expected to have a better flow. At the same time, graph-based approaches ignore sentence features, such as those used in statistical approaches, which also contribute greatly to the weight of a sentence. Using stems as the basic unit in the summarization process is better than using words, since all derivations of a word are treated as the same word; this increases the scores of the corresponding sentences and makes it more likely that they appear in the summary. Using n-grams as the basic unit is better than using both stems and words, since many electronic texts contain spelling mistakes that stemming and POS tagging systems cannot handle; the n-gram representation overcomes such mistakes, which improves the efficiency of summarization approaches in general.

6. Future Work
As future work on the graph-based system, we plan to improve the stemmer used in the preprocessing phase, since we observed that it has an impact on the subsequent steps, and to improve the algorithm for finding all paths between the first and last sentences, in order to obtain a more efficient algorithm that saves time and space.

References
[1] Abu El-khair, I. (2006). Effects of Stop Words Elimination for Arabic Information Retrieval: A Comparative Study, International Journal of Computing and Information Sciences, 4(3), pp. 119-133.
[2] Al-Taani, A. and Abu Al-Rub, S. (2009). A Rule-Based Approach for Tagging Non-Vocalized Arabic Words, The International Arab Journal of Information Technology (IAJIT), 6(3), pp. 320-328.
[3] Alguliev, R. and Aliguliyev, R. (2007). Experimental Investigating the F-measure as Similarity Measure for Automatic Text Summarization, Applied and Computational Mathematics, 6(2), pp. 278-287.
[4] Alguliev, R., Aliguliyev, R., Hajirahimova, M. and Mehdiyev, C. (2011). MCMR: Maximum Coverage and Minimum Redundant Text Summarization Model, Expert Systems with Applications, 38, pp. 14514-14522.
[5] Alkhatib, K. (2010). Arabic Text Summarization using Single and Multiword Term, MSc thesis, Department of Computer Science, Jordan University of Science and Technology, Irbid, Jordan.
[6] Attia, M. (2007). Arabic Tokenization System, Proceedings of the Association for Computational Linguistics Workshop on Computational Approaches to Semitic Languages, Prague, Czech Republic.
[7] Attia, M. (2008). Handling Arabic Morphological and Syntactic Ambiguity within the LFG Framework with a View to Machine Translation, Ph.D. dissertation, School of Languages, Linguistics and Cultures, Faculty of Humanities, University of Manchester, UK.
[8] Ben Abdallah, M., Aloulou, C. and Belguith, L. (2008). Toward a Platform for Arabic Automatic Summarization, Proceedings of the International Arab Conference on Information Technology (ACIT'08), December 16-18, Hammamet, Tunisia.
[9] Boudabous, M., Maaloul, M. and Belguith, L. (2010). Digital Learning for Summarizing Arabic Documents, Proceedings of the 7th International Conference on NLP (IceTAL 2010), August 16-18, Iceland.
[10] El-Haj, M., Kruschwitz, U. and Fox, C. (2009). Experimenting with Automatic Text Summarization for Arabic, Proceedings of the 4th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC'09), Poznan, Poland, pp. 365-369.
[11] El-Haj, M., Kruschwitz, U. and Fox, C. (2010). Using Mechanical Turk to Create a Corpus of Arabic Summaries, in the Language Resources (LRs) and Human Language Technologies (HLT) for Semitic Languages workshop held in conjunction with the 7th International Language Resources and Evaluation Conference (LREC), May 17-23, Malta, pp. 36-39.
[12] Erkan, G. and Radev, D. (2004). LexRank: Graph-based Lexical Centrality as Salience in Text Summarization, Journal of Artificial Intelligence Research, 22, pp. 457-479.
[13] Govindan, K., Wang, X., Khan, M., Zeng, K., Powell, G., Brown, T., Abdelzaher, T. and Mohapatra, P. (2011). PRONET: Network Trust Assessment Based on Incomplete Provenance, Proceedings of the IEEE Military Communications Conference, Nov 7-10, Baltimore, MD, USA, pp. 1213-1218.
[14] Hariharan, S. (2010). Extraction Based Multi-Document Summarization using Single Document Summary Cluster, International Journal of Advances in Soft Computing and Its Applications, 2(1), pp. 1-16.
[15] Hovy, E. H. (2005). Automated Text Summarization, in: The Oxford Handbook of Computational Linguistics (Mitkov, R., Ed.), Oxford: Oxford University Press, pp. 583-598.
[16] Jing, H. and McKeown, K. (2000). Cut and Paste Based Text Summarization, Proceedings of the 1st Conference of the North American Chapter of the Association for Computational Linguistics (NAACL'00), April 29-May 4, Seattle, Washington, pp. 178-185.
[17] Joyner, D., Nguyen, M. and Cohen, N. (2011). Algorithmic Graph Theory, http://graph-theory-algorithms-book.googlecode.com, Accessed: 1/6/2012.
[18] Khoja, S. and Garside, R. (1999). Stemming Arabic Text, Computing Department, Lancaster University, Lancaster, UK, http://www.comp.lancs.ac.uk/computing/users/khoja/stemmer.ps, Accessed: 1/4/2012.
[19] Li, Y. and Cheng, K. (2011). Single Document Summarization based on Clustering Coefficient and Transitivity Analysis, Proceedings of the 10th International Conference on Accomplishments in Electrical and Mechanical Engineering and Information Technology, May 26-28, Banja Luka, Republika Srpska.
[20] Lloret, E. (2011). Text Summarization based on Human Language Technologies and its Applications, Journal of Natural Language Processing, 48, pp. 119-122.
[21] Lloret, E. and Palomar, M. (2012). Text Summarization in Progress: A Literature Review, Artificial Intelligence Review, 37(1), pp. 1-41.
[22] Mihalcea, R. (2004). Graph-based Ranking Algorithms for Sentence Extraction, Applied to Text Summarization, Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Companion Volume (ACL 2004), Barcelona, Spain.
[23] Perumal, K. and Chaudhuri, B. (2011). Language Independent Sentence Extraction Based Text Summarization, Proceedings of the 9th International Conference on NLP, Macmillan Publishers, India, pp. 213-217.
[24] Ruohonen, K. (2008). Graph Theory, http://math.tut.fi, Accessed: 1/6/2012.
[25] Thakkar, K., Dharaskar, R. and Chandak, M. (2010). Graph-Based Algorithms for Text Summarization, Proceedings of the 3rd International Conference on Emerging Trends in Engineering and Technology (ICETET), Nov 19-21, Nagpur, India, pp. 516-519.
[26] Wan, X. and Xiao, J. (2008). Single Document Keyphrase Extraction Using Neighborhood Knowledge, Proceedings of the 23rd AAAI Conference on Artificial Intelligence, 2, pp. 855-860.