Robust Parsing

MARTIN RUCKERT
Ludwig-Maximilians-Universität München

Author's address: Martin Ruckert, SUNY New Paltz, 75 South Manheim Blvd., New Paltz, NY 12561, [email protected]
Robust parsing is introduced as a form of bounded context parsing. As such, the process of parsing defines a continuous mapping of symbol strings to parse trees, a mapping where small changes (errors) in the symbol string will cause only small changes in the resulting trees. The differences between robust parsing and LR(k) parsing are explored. A method of generating robust parsers is presented and complemented by the description of an implementation of a parser generator and its output. Included are some examples that illustrate the results of robust parsing.

Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors - Parsing; Translator writing systems and compiler generators

General Terms: Languages, Theory, Algorithms

Additional Key Words and Phrases: Bounded context parsing, Error correction, LR(k) parsing, Parser generator, Robust parsing
1. INTRODUCTION

The method of robust parsing presented in this paper was developed as part of ongoing research in automatic error diagnosis and correction. Most parsers will detect an error as soon as the current input string is no longer a prefix of any correct sentence. At this point various error recovery schemes can be used: panic mode, forward repair [Irons 1963; Graham and Rhodes 1975; Mauney and Fisher 1982], combinations of both [Pai and Kieburtz 1980], local repair [Leinius 1970; Sippu and Soisalon-Soininen 1983; Tai 1978; Levy 1975], and global least cost repair [Aho and Peterson 1972]. A more detailed discussion of error recovery methods can be found in [Hornig 1974] and [Rohrich 1980]. Recent approaches to error diagnosis and recovery use combinations of simple local repairs with regional minimum distance correction and elaborate global recovery schemes which have little to do with the primitive panic mode (e.g. [Charles 1991]). The theoretically best methods use O(n^3) parsing algorithms [Graham et al. 1980; Earley 1970] and are considered too slow for practical use [Wilcox et al. 1976]. None of these methods produces an error diagnosis good enough to attempt true error correction.
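To make the opening observation concrete, the following minimal C sketch (an illustration added here, not code from this paper) recognizes the toy language of balanced parentheses. Like a conventional LL or LR parser it reports an error at the first position where the input read so far is no longer a prefix of any correct sentence; everything after that position is never examined.

#include <stdio.h>

/* Illustrative sketch: a recognizer for the toy language of balanced
   parentheses.  It stops at the first position where the input read
   so far can no longer be extended to a correct sentence.           */
static long first_error(const char *s)
{
    long depth = 0, i;
    for (i = 0; s[i] != '\0'; i++) {
        if (s[i] == '(') depth++;
        else if (s[i] == ')') depth--;
        else return i;              /* symbol outside the alphabet    */
        if (depth < 0) return i;    /* ')' without a matching '('     */
    }
    if (depth > 0) return i;        /* missing ')' seen only at the end */
    return -1;                      /* the sentence is correct        */
}

int main(void)
{
    const char *samples[] = { "(())()", "(()))((", "((()" };
    int k;
    for (k = 0; k < 3; k++) {
        long e = first_error(samples[k]);
        if (e < 0) printf("%-8s correct\n", samples[k]);
        else       printf("%-8s error detected at position %ld\n", samples[k], e);
    }
    return 0;
}

For the second sample the error is reported at position 4 and the remaining tokens are never inspected, although they still carry structure; for the third sample the missing ')' becomes visible only at the end of the input.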
A different approach, proposed by Richter [1985], is non-correcting syntax error recovery, which was made a practical alternative through the work of Bates and Lavie [1994].

Using a theoretical investigation of parsing and error correction, Wetherell [1977] concludes: "But there is simply no hope of building efficient (i.e. linear time bounded) correctors which can handle any reasonably large class of errors perfectly for practical programming languages. But there are ways around this problem." Wetherell proposes a heuristic approach to language understanding and error diagnosis. While the author is not aware of any research using heuristic methods to understand errors in computer programs, heuristic parsing is used frequently for natural languages. Statistical information, as in [Jones and Eisner 1992], or more often semantic constraints, as in [Konig 1991; Young 1991; Jacobs et al. 1991], are used successfully to improve the quality and speed [Moore and Dowding 1991] of parsing.

Traditional LL or LR parsers offer, after an error has been detected, only the current parse state and the still unprocessed stream of input tokens as a basis for error diagnosis. Input tokens are certainly not the right level for knowledge based error diagnosis. Instead, we would like to extract as much structure from the token stream as the presence of errors permits. The parse states, contained in the parse stack, can be interpreted as an encoding of the program structure up to the point of error. Usually, however, a single parse state corresponds to several different productions of the grammar, which are equivalent only in the absence of errors. Further, a parser can advance using empty productions and defaults, and can finally reach a parse state which is no longer justified by the input. So, the information given by the parse state is often ambiguous or even misleading. Small errors in the input can derail the parser completely.

A parser suitable to prepare the input for a knowledge based error diagnosis should be able to combine all input tokens into larger structures as long as the context of the tokens permits this without ambiguities. To allow error correction based on the parser output, small changes (errors) to the input should cause only small changes in the parser output (see the sketch below). The parser should run efficiently, in linear time, and a parser generator should facilitate its production. The robust parser described here is such a parser.

This paper presents, after giving some definitions, a comparison of LR(k) parsing, Bounded Context (BC) parsing, and Weak Bounded Context (WBC) parsing. Then, robust parsing is introduced as a form of WBC parsing. Next, the implementation of the parser and parser generator is described. Two examples are given to explain the results of parsing incorrect input. The last section deals with the use and handling of ambiguous grammars. Here and throughout this paper, the programming language C [American National Standards Institute 1990; Kernighan and Ritchie 1988] is used to investigate the more practical aspects of robust parsing.
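One informal way to picture the continuity requirement stated above (a reading added here for illustration, not a formal definition taken from this paper): write d_tok for an edit distance on token strings and d_tree for an edit distance on parser outputs. A robust parser P should then satisfy, for some modest constant c,

    d_tree(P(s), P(s')) ≤ c · d_tok(s, s'),

so that an input s' obtained from a correct input s by k isolated errors is mapped to an output that differs from P(s) in no more than on the order of c·k places, rather than collapsing entirely.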
2. LR(k), BC, AND WBC PARSING

2.1 Definitions

2.2 Symbols and Strings. In describing a grammar, the typewriter font is used for terminal symbols. Nonterminal symbols are written enclosed in pointed brackets. Uppercase italics are used for symbols, e.g. S, T, U, L, and R; lowercase italics are used for strings of symbols, e.g. l, r, p, q, u, v, w, s, and t. The symbol ε is used to denote the empty string.

2.3 Grammar, Deduction, Phrase, and Handle. Grammars are described using production rules of the form T → s. In the usual way, these rules are used to derive a string from another string by repeatedly replacing an occurrence of the left hand side of a rule by the right hand side. A string that can be derived from a single symbol is called a "phrase". A "handle" of a phrase s is a substring u of s together with a production T → u such that the replacement of T by u is one step in the derivation of s. If the production is clear from the context, we speak of the substring u as a handle.

2.4 LR(k), BC and WBC Parsing. Parsing is the reconstruction of the derivation from a given phrase. All the parsing algorithms considered here use bottom-up parsing by handle pruning: the derivation is reconstructed by iteratively finding a handle and replacing it by the left hand side of the handle's production (a small illustration is given below). The definitions given now are informal, but sufficient for the present discussion.

A grammar is called a BC grammar (parseable with Bounded Context) if, given any phrase, it is possible to identify all handles of the phrase looking at m symbols to the left and n symbols to the right of the handle (for some finite m and n) [Floyd 1964].

A grammar is called an LR(k) grammar (parseable from Left to Right with k symbols of look-ahead) if, given a phrase derived from a special start symbol, it is possible to identify the leftmost handle of the phrase looking at all the symbols to the left of the handle and at k symbols to the right of the handle (for some finite k) [Knuth 1965].

A grammar is called a WBC grammar (Weakly parseable with Bounded Context) if, given any phrase, it is possible to identify at least one handle of the phrase looking at m symbols to the left and n symbols to the right of the handle (for some finite m and n).

2.5 Comparison

It is immediately clear from the definitions that BC grammars are a subset of both LR(k) grammars and WBC grammars. BC grammars never gained much importance: shortly after Floyd presented them in 1964, Knuth published his landmark paper "On the Translation of Languages from Left to Right" in 1965, where he introduced LR(k) grammars and showed how very efficient parsers can be constructed for this larger class of grammars. Looking at simple examples, one can see that WBC grammars are not a subset of LR(k) grammars, nor are all LR(k) grammars WBC grammars. In a certain sense, however, WBC grammars allow for the stronger parsing algorithms.
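The notion of parsing by handle pruning can be made concrete with a small sketch. The program below (an illustration added here; the grammar and the code are not taken from this paper) parses the toy grammar S → ( S ) | a bottom-up by repeatedly locating a handle, either the terminal a or the string (S), and replacing it by the left hand side S. For this particular grammar both kinds of handle can be recognized without looking at any left or right context, so the grammar is parseable with bounded context in a trivial way.

#include <stdio.h>
#include <string.h>

/* Toy grammar (illustrative only):  S -> ( S ) | a
   reduce() finds one handle in the current sentential form and
   replaces it by S; it returns 0 when no handle is left.        */
static int reduce(char *s)
{
    size_t i, n = strlen(s);
    for (i = 0; i < n; i++) {
        if (s[i] == 'a') {                       /* handle:  a  -> S */
            s[i] = 'S';
            return 1;
        }
        if (i + 2 < n && s[i] == '(' && s[i+1] == 'S' && s[i+2] == ')') {
            s[i] = 'S';                          /* handle: (S) -> S */
            memmove(s + i + 1, s + i + 3, n - i - 2);  /* close the gap */
            return 1;
        }
    }
    return 0;
}

int main(void)
{
    char phrase[] = "((a))";
    printf("%s\n", phrase);
    while (reduce(phrase))                       /* ((a)) ((S)) (S) S */
        printf("%s\n", phrase);
    printf(strcmp(phrase, "S") == 0 ? "accepted\n" : "rejected\n");
    return 0;
}

Running the program prints the successive sentential forms ((a)), ((S)), (S), S and reports that the phrase is accepted. A real bounded context parser works from tables derived from the grammar rather than from hard-coded patterns; the sketch only illustrates the reduction step.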