Text Compression by Syntactic Pruning

Michel Gagnon (1) and Lyne Da Sylva (2)

(1) Département de génie informatique, École Polytechnique de Montréal
[email protected]
(2) École de bibliothéconomie et des sciences de l'information, Université de Montréal
[email protected]
Abstract. We present a method for text compression which relies on the pruning of syntactic trees. The syntactic pruning applies to a complete analysis of sentences, produced by a French dependency grammar. Sub-trees of the syntactic analysis are pruned when they are labelled with targeted relations. Evaluation is performed on a corpus of sentences which were manually compressed. The reduction ratio of extracted sentences averages around 70%, while grammaticality or readability is retained in over 74% of cases. Given that these results were obtained with a limited set of syntactic relations, the approach shows promise for any application which requires compression of texts, including text summarization.
1 Introduction

This paper is a contribution to work in text summarization, whose goal is to produce a shorter version of a source text while still retaining its main semantic content. Research in this field is flourishing (see notably [14, 15, 16]); it is motivated by the increasing size and availability of digital documents, and the need for more efficient methods of information retrieval and assimilation. Methods of automatic summarization include extracting (summarizing by using a limited number of sentences extracted from the original text) and abstracting (producing a new, shorter text). Extraction algorithms have a strong tendency to select long sentences from the text, since scores based on word frequency and distribution are often crucial and tend to be higher for long sentences, even when sentence length is factored in. Shortening the extracted sentences can be a way to further reduce the resulting summary, provided that the (essential) meaning of each sentence is preserved. Such summaries can presumably allow for shorter reading time. We have thus developed a method for sentence reduction. After presenting our objectives and previous related work, this article details the methodology, and then presents and discusses experimental results. The conclusion outlines future work.
2 Objectives

Three objectives are sought in this paper. First, we present the method for text compression based on syntactic pruning of sentences after a dependency-analysis tree has been
computed. Secondly, although we recognize the existence of numerous resources for the syntactic analysis of English texts (and the evaluation thereof), equivalent systems for French are scarce. Given the resources at our disposal, namely a broad-coverage grammar for French, we present a system for the compression of French sentences. Finally, we give evaluation results for the sentence reduction approach on a corpus of manually-reduced sentences; this aims to determine whether, after compression, the resulting reduced sentences preserve the grammaticality and essential semantics of the original sentences. Success would suggest this approach has potential as a summarization method.
3 Related Work

3.1 Text Compression in Abstracting

An abstract is "a summary at least some of whose material is not present in the input" ([14], page 129). An abstract may reduce sentences from the source text, join sentence fragments, generalize, etc. Work under this banner has occasionally involved sentence reduction, for instance by identifying linguistic reduction techniques which preserve meaning [11, 17]. Also related is work on information selection and fusion: Barzilay et al. [2] focus on fusing information from different sentences into a single representation, and Thione et al. [18] apply reduction and fusion techniques to a representation of text structure based on discourse parsing. Sentence reduction techniques vary considerably and some are much harder to implement than others; however, all require a fairly good syntactic analysis of the source text. This implies having a wide-coverage grammar, a robust parser, and generation techniques which defy most existing systems.

3.2 Text Reduction Based on Syntactic Analysis

We hypothesize that a robust syntactic analysis can be valuable as a basis for text reduction. Grefenstette [8] experiments with sentence reduction based on a syntactic analysis provided by a robust phrase-structure parser [9]. Only some of his reductions guarantee grammatical sentences. Mani et al. [10] compress sentences (extracted from a text) based on a phrase-structure syntactic analysis indirectly based on Penn Treebank data; pruning is performed (among other operations) on certain types of phrases in specific configurations, including parentheticals, sentence-initial PPs and adverbial phrases such as "In particular,", "Accordingly," "In conclusion," etc. Knight and Marcu [12] compare a noisy-channel and a decision-tree approach applied to a phrase-structure analysis; they conclude that retaining grammaticality and retaining information content can be two conflicting goals. [13] studies the effectiveness of applying the syntax-based compression developed by [12] to sentences extracted for a summary.

3.3 Potential for Dependency Analyses

With the exception of [12], previous work on summarization based on a syntactic analysis mostly reports disappointing evaluation results. Significantly, no system involves dependency-based grammars. We are interested in exploring the potential of pruning a dependency-syntax analysis; the latter is based on a representation which directly
encodes grammatical relations and not merely phrase structure. We believe that this allows a better characterization of the sub-parts of the tree that can safely be pruned while retaining essential meaning. Indeed, grammatical relations such as subject and direct object should correlate with central parts of the sentence, whereas subordinate clauses and temporal or locative adjuncts should correlate with peripheral information. Phrase structure tags such as "PP" (prepositional phrase) or "AdvP" (adverbial phrase) are ambiguous as to the grammatical contribution of the phrase to the sentence. Pruning decisions based on syntactic function criteria therefore appear to us better motivated than those based on phrase structure. (Note that this is still different from pruning a semantic representation, e.g. [5].)

We are aware of no similar work on French. An equivalent of the Penn Treebank for French is available [1], but is based on phrase structure. For our part, we were granted access to the source code for a robust, wide-coverage grammar of French, developed within a commercial grammar-checking product (Le Correcteur 101™, by Machina Sapiens and now Lingua Technologies, www.LinguaTechnologies.com). The grammar is dependency-based: syntactic trees consist of nodes corresponding to the words of the sentence, and links between nodes are labelled with grammatical relations (of the type "subject", "direct object", "subordinate clause", "noun complement", etc.).

Les médias sont-ils responsables de l’efficacité des publicités qu’ils véhiculent ?

Arbre sont/verbe
    Sujet les médias/nom
    RepriseSujet ils/pronPers
    Attrib responsables/adj
        ComplAdj de l’efficacité/nom
            ComplNom des publicités/nom
                Relat véhiculent/verbe
                    ObjetDirect qu’/pronRelat
                    Sujet ils/pronPers
    FinProp ?/ponctFinale

Fig. 1. Sample dependency tree: the main verb is labelled "Arbre". Some sub-trees are simplified.
The grammar aims to perform a complete syntactic analysis of the sentence (see Figure 1 for an indented presentation). In case of failure (due to severe writer error or to limits of the grammar), it provides a series of partial analyses of fragments of the sentence. In all cases, the parser ranks analyses using a weighting mechanism which is a function of the weights of individual sub-trees and their combination. The detailed analysis produced by the grammar can be the basis of syntactic pruning for text reduction (this is illustrated in Figure 1). This grammar has many advantages. In addition to its large coverage, it is able to provide a full analysis even with erroneous input (given its embedding within a grammar-checking software product). It is indeed wide-coverage: its 80,000 lines of C++ code represent many person-years of development; the grammar consists of over 2500
grammar rules and a dictionary containing over 88,000 entries. Note that other recent work [4] also uses this grammar in a non-correcting context, pertaining to controlled languages. The grammar does, however, have peculiarities which we discuss below. In brief, certain linguistic phenomena are ignored when they have no effect on correction.

[Dans le monde en pleine effervescence d’Internet, ]locAdj l’arrivée de HotWired marque le début de la cybermédiatisation [, le premier véritable média sur Internet]app .
→ L’arrivée de HotWired marque le début de la cybermédiatisation.

Fig. 2. Sample reduction: locative adjunct (locAdj) and apposition (app)
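To make this representation concrete, the following minimal sketch shows one way such a dependency tree could be encoded, using the words and relation labels of Figure 1. The Node class is our own illustration, not Le Correcteur 101's actual data structure.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    word: str       # surface form, e.g. "sont"
    pos: str        # part of speech, e.g. "verbe"
    relation: str   # label of the incoming dependency link
    children: list["Node"] = field(default_factory=list)

# Partial encoding of the tree in Figure 1.
tree = Node("sont", "verbe", "Arbre", [
    Node("les médias", "nom", "Sujet"),
    Node("ils", "pronPers", "RepriseSujet"),
    Node("responsables", "adj", "Attrib", [
        Node("de l’efficacité", "nom", "ComplAdj", [
            Node("des publicités", "nom", "ComplNom", [
                Node("véhiculent", "verbe", "Relat", [
                    Node("qu’", "pronRelat", "ObjetDirect"),
                    Node("ils", "pronPers", "Sujet"),
                ]),
            ]),
        ]),
    ]),
    Node("?", "ponctFinale", "FinProp"),
])
```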
4 Methodology

We developed an algorithm which performs sentence reduction using syntactic pruning of the sentences. It proceeds by analyzing the sentence, then filtering the targeted relations, while applying anti-filters which prevent certain unwanted pruning by the filter. We should be able to maintain the sentence's grammaticality, insofar as we prune only subordinate material, and never the main verb of the sentence.

4.1 Analysis

The grammar of Le Correcteur 101 is used in its entirety. Extracted sentences are submitted one by one and a complete syntactic analysis of each is performed. Although the parser usually supplies all plausible analyses (more than one, in the case of ambiguous syntactic structures), our algorithm uses only the top-ranking one. This has some limitations: sometimes the correct analysis (as determined by a human judge) is not the highest-ranking one; in other instances, it shares the top rank with another analysis which appears first in the list, given the arbitrary ranking of equal-weight analyses. Our algorithm systematically chooses the first one, regardless. The impact of an incorrect analysis is great, as it radically changes the result: complements may be related by a different relation in the different analyses, and thus the reduction performed may not be the one intended. This fact (which has no bearing on the appropriateness of the syntactic pruning) has an impact on the evaluation of results.

4.2 Filtering

A filtering operation follows, which removes all sub-trees in the dependency graph that are labelled with relations from a predefined list. The entire sub-tree is removed, thus reducing the sentence (for example giving the result in Figure 2). The list of syntactic relations that trigger reduction is kept in an external file, allowing for easy testing of various sets of relations. We adapted the output of Le Correcteur 101 to produce parses corresponding to the full tree and to the pruned tree. The simplest version of a filter consists of a single mother-daughter pair related by relation R, where either node may be specified with a part-of-speech tag or by a lexical item. A more complex version involves multi-daughter structures, or grandmother-granddaughter pairs.
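As a rough illustration of the filtering step (not the authors' actual implementation), the sketch below removes every sub-tree whose incoming relation belongs to the target set, unless one of the anti-filters described in Section 4.3 vetoes the removal. It reuses the Node class from the earlier sketch; the labels in the example set are taken from the figures, but the selection itself is illustrative.

```python
# In the system, the list of targeted relations is kept in an external
# file; here it is a hard-coded set of labels seen in Figures 1 and 3.
TARGET_RELATIONS = {"Relat", "Subord", "ComplNom"}

def prune(node, targets, anti_filters=()):
    """Return a copy of the tree with targeted sub-trees removed."""
    kept = []
    for child in node.children:
        if child.relation in targets and not any(
            veto(node, child) for veto in anti_filters
        ):
            continue  # the entire sub-tree is dropped
        kept.append(prune(child, targets, anti_filters))
    return Node(node.word, node.pos, node.relation, kept)

# Pruning only relative clauses in the Figure 1 tree removes the
# sub-tree headed by "véhiculent":
reduced = prune(tree, {"Relat"})
```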
A preliminary test was performed using a wide number of relations. Only obligatory complements and phrasal specifiers (such as determiners) were kept. This resulted in a large reduction, producing much shorter sentences which, however, tended to be ungrammatical. It was determined that a much more focused approach would have a better chance of reducing the text while still preserving important elements and grammaticality. A second run, reported in [6], involved only the following relations: optional prepositional complements of the verb, subordinate clauses, noun appositions and interpolated clauses ("incises", in French). These are encoded with 6 relations, out of the 246 relations used by 101. For the current experiment, we used evaluation results from the previous one to fine-tune our filtering algorithm. The present system targets 33 different relations (out of a possible 246), including the 6 mentioned above; their distribution is given in Table 1.

Table 1. Distribution of pruned relations

Relation category                       Number of relations
Relative clauses                        9
Subordinate clauses                     5
Appositions and interpolated clauses    7
Noun modifiers                          2
Emphasis markers                        4
Adverbial modifiers                     6
Total                                   33
The list was determined through introspection, on the grounds that these relations typically introduce peripheral syntactic information (see the conclusion for planned work on this aspect).

4.3 Anti-filters

The purpose of anti-filters is to prevent the removal of a sub-tree despite the relation it is labelled with. Two aspects of the anti-filters must be described. First, anti-filters can be lexically-based. For example, special processing must be applied to verb complements, to avoid incomplete and ungrammatical verb phrases. The grammar attaches all (obligatory) prepositional complements of the verb (for example, the obligatory locative "à Montréal" in "Il habite à Montréal") with the same relation as optional adjuncts such as time, place, manner, etc. This was done during the development of the grammar to reduce the number of possible analyses of sentences with prepositional phrases (given that this type of ambiguity is rampant, yet never relevant for correction purposes). To circumvent this problem, lexically-specified anti-filters were added for obligatory complements of the verb (e.g. the pair "habiter" and "à"). Since this information is not encoded in the lexical entries used by the grammar, it had to be added; for our tests, we hand-coded only a number of such prepositional complements, for the verbs identified in our corpus (while pursuing a related research project of automatically identifying the most interesting verb-preposition pairs through corpus analysis). Secondly, anti-filters may examine more than a local sub-tree (i.e. one level). Indeed, anti-filters are expressed using the same "syntax" as the filters and may involve
"se demander" Subord
Subordinated verb ConjSubord
si
...
Fig. 3. Sample pattern for 3-level anti-filters
three-level trees. This is useful for sentences containing specific types of complement clauses. Verbs which require an if-clause ("complétive en si"), such as "se demander" ("to wonder if/whether"), have their complement labelled with a subordinate clause relation (again, to reduce the number of unnecessary ambiguous analyses by the grammar) and the complementizer "si" is a dependent of the embedded verb. Yet this clause is an obligatory complement and should not be pruned (just as direct objects and predicative adjectives are not), but would be subject to pruning due to the "subordinate clause" label it receives and the distance between the main verb and the complementizer "si", which allows its identification as an obligatory clause. This requires a more complex pattern, since two relations are involved (see Figure 3).
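The following sketch illustrates both anti-filter flavours under stated assumptions: the verb/preposition pairs and verb list are tiny hand-made samples, node words are assumed to be lemmatized, and the helper signatures match the prune() sketch above rather than the system's actual encoding.

```python
# (1) Lexically-specified anti-filter: obligatory prepositional
# complements of the verb that must not be pruned (e.g. "habiter" + "à").
OBLIGATORY_PREP = {("habiter", "à")}  # hand-coded sample pair

def lexical_anti_filter(head, child):
    # Assumes head.word is a lemma and child.word starts with the preposition.
    preposition = child.word.split()[0].lower()
    return (head.word.lower(), preposition) in OBLIGATORY_PREP

# (2) Three-level anti-filter: verbs such as "se demander" whose
# subordinate clause is headed by a verb carrying the complementizer
# "si" as a dependent (the pattern of Figure 3).
IF_CLAUSE_VERBS = {"se demander"}

def if_clause_anti_filter(head, child):
    return (
        head.word.lower() in IF_CLAUSE_VERBS
        and child.relation == "Subord"
        and any(gc.relation == "ConjSubord" and gc.word.lower() == "si"
                for gc in child.children)
    )

# Both can be passed to prune():
#   prune(tree, TARGET_RELATIONS,
#         anti_filters=(lexical_anti_filter, if_clause_anti_filter))
```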
5 Evaluation

In order to evaluate the performance of the pruning algorithm, we first built a corpus of sentences which were compressed by a human judge, and then compared the automatically-reduced sentences to the manually-reduced ones. The test corpus consisted of 219 sentences of various lengths, extracted from the Corfrans corpus (http://www.u-grenoble3.fr/idl/cursus/enseignants/tutin/corpus.htm), which contains approximately 3000 sentences. We extracted random sentences, equally distributed among the different sentence lengths. We wished to have different lengths so that we could evaluate how reduction performance correlates with sentence length. Intuitively, short sentences should be hard to reduce, since the information they contain tends to be minimal. Longer sentences are expected to contain more unessential information, which makes them easier to reduce, but they are also more difficult to analyse, which may cause errors in the reduction. For our tests, sentences of 5 or fewer words were discarded altogether. Among the rest, about 25% of sentences have between 6 and 15 words, about 25% have between 16 and 24 words, about 25% have between 25 and 35 words, and the remaining 25% have between 36 and 131 words (with a large spread in sentence length for that last interval). The final distribution is shown in Table 2.

The manual reduction was done using the following guidelines: the only operation allowed is the removal of words (no addition, rewording nor reordering); truth values and logical inferences must be preserved; only optional complements can be removed; resulting sentences must not be syntactically or semantically ill-formed; the only "errors"
tolerated are unadjusted punctuation, affixation (agreement) or capitalization. An example is shown in Figure 2 above. The methodology used explains the small size of the corpus: evaluation necessitated a careful, manual reduction of all sentences. No evaluation corpus was at our disposal for this collection of dependency analyses of French texts and their reduced equivalents. According to [3], there is only one genuine annotated corpus for French, developed by Abeillé and her team [1]. The same lack of resources in French is noted in [7].

Table 2. Details of the evaluation corpus

Sentence length                             # sentences        Total
6-15                                        5 sentences each   50
16-24                                       6 sentences each   54
25-35                                       5 sentences each   55
36-80, 82-86, 88, 89, 91, 94, 95, 99,
100, 104, 113, 131                          1 sentence each    60
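As a sketch of how such a length-stratified sample could be drawn (the quota function mirrors Table 2; the code is illustrative, not the procedure actually used):

```python
import random

def stratified_sample(sentences, quota):
    """Draw a fixed number of sentences per sentence length."""
    by_length = {}
    for s in sentences:
        by_length.setdefault(len(s.split()), []).append(s)
    sample = []
    for length, bucket in sorted(by_length.items()):
        if length <= 5:
            continue  # sentences of 5 or fewer words are discarded
        k = quota(length)
        sample.extend(random.sample(bucket, min(k, len(bucket))))
    return sample

# Quotas matching Table 2: 5 per length for 6-15 words, 6 for 16-24,
# 5 for 25-35, and 1 per length above 35.
quota = lambda n: 5 if n <= 15 else 6 if n <= 24 else 5 if n <= 35 else 1
```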
6 Results

Each sentence was examined to determine (i) if it had been pruned, (ii) whether the result preserved essential content and (iii) how it compared to a manual reduction by a human expert. Of 219 sentences, 181 were reduced by our algorithm (82.6%). We partitioned the corpus into two halves, containing the shorter and longer sentences, respectively. In the short sentence partition (sentences with at most 28 words), 82 of 110 (74.5%) were reduced, and in the second partition (sentences with more than 28 words), 99 of 109 (90.8%). This gives more evidence that short sentences are harder to reduce than long sentences.

Good (slightly ungrammatical)
Je désire que la vérité éclate et que si vraiment, comme tout semble le faire croire, c’est cet épicier qui était le diable, il est convenablement châtié.
(I want the truth to shine and the grocer to be appropriately punished, if really, as everything makes us believe, he was the devil.)
→ Je désire que la vérité éclate et que si vraiment, c’est cet épicier qui était le diable, il est convenablement châtié.
(I want the truth to shine and the grocer to be appropriately punished, if really, he was the devil.)

Acceptable
Le Soleil lui-même a de nombreuses répliques dans le ciel ;
(The Sun itself has many replicas in the sky;)
→ Le Soleil lui-même a de nombreuses répliques ;
(The Sun itself has many replicas;)

Bad
Je n’entretiens aucun doute sur le caractère national qui doit être donné au Bas-Canada ;
(I have no doubt about the national character that must be given to Lower Canada;)
→ Je n’entretiens aucun doute ;
(I have no doubt;)

Fig. 4. Sample of good, acceptable and bad reductions
Table 3. Quality of reductions

Sentences         Good          Acceptable    Bad           Total
Overall           106 (58.6%)   29 (16.0%)    46 (25.4%)    181
Short sentences   56 (68.3%)    13 (15.9%)    13 (15.9%)    82
Long sentences    50 (50.5%)    16 (16.2%)    33 (33.3%)    99
6.1 Semantic Content and Grammaticality

Correct reductions are those which are either good (i.e. the main semantic content of the original sentence is retained in the reduction, as in Figure 2) or acceptable (i.e. part of the semantic content is lost, but the meaning of the reduced sentence is compatible with that of the original). Bad reductions are those where crucial semantic content was lost, or which are strongly ungrammatical. Some cases were ungrammatical but still considered acceptable (e.g. punctuation errors). See Figure 4 for sample reductions.

Table 3 shows that about 58% of the reductions were judged, by a human judge, to be "good"; 16% were "acceptable" and 25% "bad". If we consider separately the short sentences of the test corpus (with sentence length not exceeding 28 words), we see that the number of good reductions is greater in the short sentence partition (68% compared to 50%); there are more bad reductions among long sentences (33% compared to about 16%). The system thus appears weaker in the task of reducing longer sentences.

Table 4 shows the proportion of correctly reduced sentences (good or acceptable) among the 181 sentences that were pruned. We see that most of the reduced sentences are correctly reduced and, as expected, the proportion of long sentences that are correctly reduced is significantly lower than that of short sentences.

Table 4. Ratio of reduced sentences

Sentences         Proportion of correctly reduced sentences (out of 181)
Overall           74.6%
Short sentences   84.2%
Long sentences    66.7%
6.2 Compression

We calculated the compression rate for each reduced sentence (i.e. the size of the pruned sentence, in words, compared to the number of words of the original sentence). We compared this to the "ideal" compression rate, corresponding to the manual reduction (which we refer to as the "Gold standard"). To estimate the performance of the reduction for sentences of different lengths, results are shown in Table 5 for the corpus overall, and for both the long and short sentence partitions. The first two columns give the average reduction rate for the reduction realized by the human judge (the Gold standard) and by our system, respectively. The next three columns give some evaluation of the reduction. Classic precision and recall are calculated in terms of the number of words retained by the system, based on the Gold standard. Finally, agreement is a measure of the number of
shared words between the reduced sentence and the Gold standard (divided by the total number of words in both). These simple measures (based on an ordered "bag of words") appear warranted, since word order is preserved by the reduction method.

We note that, considering the overall reduction rate, 70.4% of words have been retained by the system. This shows potential for useful reduction of an extract. This result is not far from the ideal reduction rate (65.9%), but looking at the figures in Table 3, we see that 25.4% of the reduced sentences are incorrectly reduced. Correct pruning in some sentences is offset by incorrect pruning in others. Also, agreement is not very high, but precision and recall show that much of the content is preserved in our reductions, and that the reduced sentences do not contain too much noise. The reason for the low agreement values is that a reduced sentence usually differs from its Gold standard by the presence or absence of a whole phrase, which usually contains many words.

Table 5. Reduction rate in terms of words (%)

Sentences                    Gold standard vs   System reduction vs   Precision   Recall   Agreement
                             original (%)       original (%)          (%)         (%)      (%)
Overall                      65.9               70.4                  79.1        83.8     67.6
Short sentences              78.4               80.7                  86.9        89.2     77.4
Long sentences               53.6               60.3                  71.3        78.4     57.9
Among “correct” reductions   67.3               73.9                  80.5        87.9     71.4
Among “bad” reductions       60.0               57.1                  73.4        68.1     52.8
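A minimal sketch of these bag-of-words measures, assuming that "total number of words in both" means the union of the two word multisets (an interpretation consistent with the figures in Table 5, though not spelled out above):

```python
from collections import Counter

def overlap_measures(system_sentence, gold_sentence):
    sys_bag = Counter(system_sentence.split())
    gold_bag = Counter(gold_sentence.split())
    shared = sum((sys_bag & gold_bag).values())  # multiset intersection
    precision = shared / sum(sys_bag.values())   # kept words also in the Gold standard
    recall = shared / sum(gold_bag.values())     # Gold-standard words that were kept
    agreement = shared / sum((sys_bag | gold_bag).values())  # shared / union
    return precision, recall, agreement
```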
The reduction rates in the last two lines of Table 5 separate correct (good or acceptable) reductions from bad ones. Thus, if only the correctly reduced sentences are considered, our system achieved 73.9% compression, while the human expert compressed to a ratio of 67.3% for the same sentences. Hence our correct reductions were still not reduced enough by human standards. Considering only the bad reductions, our compression rate is lower (57.1%) than that of the human judge (60%), but only slightly. Where the compression rate is lower, reductions are typically worse (57.1% of words retained for bad reductions compared to 73.9% for correct ones). We can make the same observation for sentence length (60.3% of words retained for long sentences compared to 80.7% for short ones). By analyzing the bad sentences, we see that most of them are long sentences that were incorrectly analyzed by the parser. Thus, long sentences, which would usually benefit most from the reduction process, unfortunately suffer from being more difficult to analyse correctly.

6.3 Some Problems with the Grammar or the Corpus

As expected, some incorrectly reduced sentences (28, or 12.8%) are due to the fact that no correct analysis was returned by the parser. When the grammar had trouble finding the right analysis among several, it sometimes suggested corrections that were inappropriate (in 2 cases). In 6 cases, severe errors in the original sentence prevented the parser from finding a good analysis (for example, the sentence contained both a short
title and the actual sentence, not separated by a dash or other punctuation; or it lacked the proper punctuation). In 6 cases, the parser was unable to give a complete analysis, but provided analyses of fragments instead. Finally, in 14 cases (6.4%), adjuncts were removed but were actually obligatory and should not have been.
7 Discussion

7.1 Shortcomings of the System

Closer inspection of incorrectly reduced sentences reveals that, in some cases, a good reduction would necessitate major changes in our model, or semantic information that is beyond the scope of the parser. In the example in Figure 5, the prepositional modifier "dans une opposition rigide et unilatérale" (in a rigid and unilateral opposition) cannot be argued to be required (syntactically) by "peuvent" (can), "être" (to be) or "pensés" (thought). Yet, semantically, removing the phrase produces an ungrammatical or semantically incoherent sentence. Although our goal is to test the hypothesis that syntactic pruning can yield useful sentence compression, we must recognize that the system suffers from its lack of semantics. Indeed, for each relation, the semantics of the pruned element should be taken into account before deciding whether pruning is appropriate.

L’histoire des sciences est venue quant à elle montrer que le vrai et le faux ne peuvent être pensés dans une opposition rigide et unilatérale.
(The history of science has shown, for its part, that the true and the false cannot be thought of in a rigid and unilateral opposition.)
→ L’histoire des sciences est venue montrer que le vrai et le faux ne peuvent être pensés.
(The history of science has shown that the true and the false cannot be thought of.)
Fig. 5. Sample unrecoverable reduction
The system is limited by the fact that it only takes the first analysis from the parser. It also does not adapt its processing according to sentence length, even though we have observed that sentence length is not independent of the correctness of the pruning. We suspect that certain relations have a greater impact than others (a systematic comparison of the relations pruned by the human judge and by the system has yet to be performed).

7.2 Possible Improvements

Two methods could be explored to control the pruning. Since the sentences judged to be incorrectly reduced are those which undergo more reduction, a compression threshold could be determined, below which pruning should not be applied (this threshold could take the relation name into account). Also, we could exploit the fact that the system uses a grammar which can detect errors in sentences: submitting reduced sentences to parsing and correction could provide a way of rejecting a reduction when the result is ungrammatical.

We can also suggest a way of circumventing the problem of our system only using the first analysis (and hence sometimes going astray): the first N analyses could be used
and pruned, comparing outputs; these outputs could be ranked according to compression ratio, rank of the original tree, and frequency (as two original trees may be pruned to the same one; we thank an anonymous reviewer of this paper for this idea). Finally, an experiment could be devised to compare the efficiency of different sets of pruning rules. A sketch of such a control loop is given below.
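In this sketch, parse(), realize() and is_grammatical() are assumed interfaces standing in for the parser, a tree-to-text step and 101's error detection respectively; the threshold value is arbitrary, and prune() is the function from the earlier sketch.

```python
def safe_reduce(sentence, targets, threshold=0.5, n_best=3):
    """Prune the first N analyses, rejecting over-aggressive or
    ungrammatical reductions; fall back to the original sentence."""
    original_length = len(sentence.split())
    candidates = []
    for analysis in parse(sentence)[:n_best]:   # first N analyses
        reduced = realize(prune(analysis, targets))
        ratio = len(reduced.split()) / original_length
        # Reject reductions below the compression threshold, and
        # reductions the grammar checker flags as ungrammatical.
        if ratio >= threshold and is_grammatical(reduced):
            candidates.append((ratio, reduced))
    # Prefer the most compressed surviving candidate.
    return min(candidates)[1] if candidates else sentence
```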
8 Conclusion and Future Work

We have proposed a method for text reduction which uses pruning of syntactic relations. The reduction rate achieved by our system (about 70% of words retained) shows great promise for inclusion in a text reduction system such as a summarizer. Compared to previous work [6], we obtained a higher proportion of reduced sentences (82.6% vs 70.0%) and a better reduction rate (70.4% of words retained vs 74%). At the same time, the ratio of correctly reduced sentences has been significantly increased (74.6% vs 64%). Precision and recall values are about 80%, which is encouraging, but should be improved to make our system suitable for use in a real application. The loss of performance is due principally to the lack of semantic analysis and to the limitations of the parser, which are beyond the scope of our model.

Future work will mainly focus on comparing our results with similar syntactic pruning based on phrase structure, and on other corpora and parsers (for French texts, initially). The Corfrans corpus at our disposal is tagged with phrase-structure analyses, so a comparison of both approaches should prove interesting. We also have access to a similar dependency grammar for English (also developed as part of a grammar checker), in addition to other well-known freely-available English parsers. Its coverage is not as wide as that of Le Correcteur 101, but it has a lexicon of verbs completely specified as to obligatory prepositional complements. For this reason, we intend to pursue experiments on English texts. The limits of the parser seem to strongly influence the results, so further experiments should be carried out with very good parsers.
Acknowledgements

This research was funded by a grant from the Natural Sciences and Engineering Research Council of Canada awarded to Lyne Da Sylva. Many thanks to Claude Coulombe, president of Lingua Technologies, for the use of the source code of Le Correcteur 101. We also wish to thank our research assistants, Frédéric Doll and Myriam Beauchemin.
References

1. A. Abeillé, L. Clément, and F. Toussenel. Building a treebank for French. In A. Abeillé, editor, Treebanks: Building and Using Parsed Corpora, pages 165–188. Kluwer Academic Publishers, 2003.
2. R. Barzilay, K. McKeown, and M. Elhadad. Information fusion in the context of multi-document summarization. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 550–557, New Brunswick, NJ, 1999. Association for Computational Linguistics.
3. P. Blache and J.-Y. Morin. Une grille d'évaluation pour les analyseurs syntaxiques. In TALN 2003, pages 77–86, Batz-sur-mer, 11-14 juin 2003.
4. C. Coulombe, F. Doll, and P. Drouin. Intégration d'un analyseur syntaxique à large couverture dans un outil de langage contrôlé en français. Lingvisticae Investigationes (Machine Translation, Controlled Languages and Specialised Languages), 28(1), 2005.
5. M. Fiszman, T.C. Rindflesch, and H. Kilicoglu. Abstraction summarization for managing the biomedical research literature. In Proceedings of the Computational Lexical Semantics Workshop, HLT-NAACL 2004, pages 76–83, Boston, Massachusetts, USA, May 2-7, 2004.
6. M. Gagnon and L. Da Sylva. Text summarization by sentence extraction and syntactic pruning. In Computational Linguistics in the North-East (CLiNE), Gatineau, 2005.
7. M.-J. Goulet and J. Bourgeoys. Le projet GÉRAF : guide pour l'évaluation des résumés automatiques français. In TALN 2004, Fès, 19-21 avril 2004.
8. G. Grefenstette. Producing intelligent telegraphic text reduction to provide an audio scanning service for the blind. In Working Notes of the Workshop on Intelligent Text Summarization, pages 111–117, Menlo Park, CA, 1998. American Association for Artificial Intelligence Spring Symposium Series.
9. G. Grefenstette. Light parsing as finite-state filtering. In Proceedings of the ECAI '96 Workshop on Extended Finite State Models of Language, Budapest, August 11-12, 1996.
10. I. Mani and M.T. Maybury. Advances in Automatic Text Summarization. MIT Press, 1999.
11. H. Jing and K. McKeown. The decomposition of human-written summary sentences. In Proceedings of the 22nd International Conference on Research and Development in Information Retrieval (SIGIR'99), pages 129–136, New York, 1999. Association for Computing Machinery.
12. K. Knight and D. Marcu. Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence, 139:91–107, 2002.
13. C.-Y. Lin. Improving summarization performance by sentence compression - a pilot study. In Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages (IRAL 2003), Sapporo, Japan, July 7, 2003.
14. I. Mani. Automatic Summarization. John Benjamins, Amsterdam; Philadelphia, 2001.
15. J.-L. Minel. Résumé automatique de textes. Traitement automatique des langues, 45(1), 2004.
16. NIST. Document understanding conference - introduction. Technical report, http://www-nlpir.nist.gov/projects/duc/intro.html.
17. H. Saggion and G. Lapalme. Selective analysis for the automatic generation of summaries. In C. Beghtol, L. Howarth, and N. Williamson, editors, Dynamism and Stability in Knowledge Organization – Proceedings of the Sixth International ISKO Conference, pages 176–181, Toronto, July 2000. International Society for Knowledge Organization, Ergon Verlag.
18. G.L. Thione, M.H. van den Berg, L. Polanyi, and C. Culy. Hybrid text summarization: Combining external relevance measures with structural analysis. In Proceedings of the ACL-2004 Workshop Text Summarization Branches Out, Barcelona, Spain, July 25-26, 2004.