Introduction to Parsing Ambiguity and Syntax Errors
Recommend Documents
propose that verbs of the unergative class (such as race) have a more complex transitive structure than other .... (10a) The ice storm froze the pipes. (10b) The ...
5. C H A P T E R 2. Common Syntax and. Semantic Errors. 2.1 CHAPTER
OBJECTIVES. • To understand the fundamental characteristics of syntax and
semantic.
Summary of the RRG linking system. 214. 6.43. Active and ... these Russian
examples the order of the words is not the key to their interpretation, as it is in the
...
have been wonderful, motivational grad school compatriots, despite the hundreds of ... She helped me to see how wonderful life can be outside academia and ...
Keywords. Top-down parsing, left-recursion, memoization, backtrack- ing, parser ... of an implementation in programming language Haskell are presented, more ...
May 27, 2008 ... on the textbook “Syntax. A generative introduction”, 2nd edition 2007 from
Andrew Carnie. 2. One group of students will give us a 30min ...
preprocessor-based syntax errors and quantify to what extent they occur in ...
considers a syntax error as an incorrect output of the preprocessing task [9], i.e., it
...
the MS, thus outside the whenever-clause and not c-commanded by she. ... parser builds both alternatives and maintains them in parallel. ..... there seems to be some principle of the parser that prefers to attach incoming material ...... adverbial su
language without considering syntax. We describe ... therefore a complete syntactic parsing of the sentence. ..... ellipses and the interruptions, Our parser is thus.
Syntax-Semantics Interaction Parsing Strategies. Inside SYNTAGMA. Daniel Christen. (www.lector.ch). Abstract. This paper discusses SYNTAGMA, a rule based ...
syntactic and semantic information, leading the parser to a coherent and ..... The next example shows a NP pattern corresponding to an input like "this ..... 3085.01 (above) requires a "PERSON" as its object ("per qualcuno": "for somebody"), the.
Email : [email protected]. ABSTRACT. Parsing spontaneous speech is a difficult task because of the ungrammatical nature of most spoken utterances.
1. Asudeh. Carleton University · January 20, 2005. 2 .... Richard seems like the assertion by Ida that Thora suspects the motives be- hind the gift has offended his ...
A syntax error is a violation of the notational rules of the programming language. (
synonym: compile-time errors). Programming errors (faults) occur in two basic ...
Oct 21, 2007 - Th multiplication sign. ⢠The number 60. This process of grouping individual characters is called lexical analysis or lexical scanning, the result.
Mar 2, 2007 ... 3 Syntactic Forms, Grammatical Functions, and Semantic Roles .... start with the
basic properties of English words, and then rules for combining ...
Sep 30, 2010 - J. Nathan Foster, Alexandre Pilkiewicz, and Benjamin C. Pierce. Quotient lenses. In Proceeding of the Int
Sep 30, 2010 - our programming technique towards an algebra of partial isomor- phisms. Categories and Subject Descriptor
How to Avoid Common Syntax Errors When Using WebAssign. In General. Don't
be afraid of consulting the “Student Guide”. It is your syntax friend. Pay attention ...
Oct 25, 2008 - Cacciaguidi-Fahy 2006), but also in poetry (Su 1994) and in humor (Raskin 1985). But even in ... of the Penn Treebank (Marcus et al. 1993) but ...
College of Computer Science, Northeastern University, Boston, Massachusetts 02115. yResearch supported by the ... {Sanders, a Lockheed Martin Company, Nashua, New Hampshire. 1 ...... community accepted representation. Although not ...
This paper describes error handling and ambiguity in a class of applications organized around drawing and .... The Recognizer [4] is a fast, simple and compact ... set of filters either to identify shapes or to filter out ... guided to infer the most
[NP coh-a-ha-nun TV phu-lo-ku-laim-ul]. [VP noh-chye + pe-li-ko + mal-ass-ta.] In intra-chunk dependency parsing, we make the. VP chunk a single node and ...
Textbook: Modern Syntax, 2011, by Andrew Carnie, Cambridge University Press.
Additional Readings. • Martin Haspelmath (to appear). How to compare major ...
Introduction to Parsing Ambiguity and Syntax Errors
Syntax errors. Compiler Design 1 (2011). 3. Languages and Automata. • Formal
languages are very important in CS. – Especially in programming languages.
Outline
Introduction to Parsing Ambiguity and Syntax Errors
Intuition: A finite automaton that runs long enough must repeat states • A finite automaton cannot remember # of times it has visited a particular state • because a finite automaton has finite memory
– Especially in programming languages
• Regular languages – The weakest formal languages widely used – Many applications
– Only enough to store in which state it is – Cannot count, except up to a finite limit
• Many languages are not regular • E.g., language of balanced parentheses is not regular: { (i )i | i ≥ 0}
• We will also study context-free languages
Compiler Design 1 (2011)
2
3
Compiler Design 1 (2011)
4
The Functionality of the Parser
Example
• Input: sequence of tokens from lexer
• If-then-else statement if (x == y) then z =1; else z = 2; • Parser input IF (ID == ID) THEN ID = INT; ELSE ID = INT; • Possible parser output
• Output: parse tree of the program
IF-THEN-ELSE == ID Compiler Design 1 (2011)
5
Comparison with Lexical Analysis
= ID
ID
= INT
ID
INT
Compiler Design 1 (2011)
The Role of the Parser
Phase
Input
Output
• Not all sequences of tokens are programs . . . • . . . Parser must distinguish between valid and invalid sequences of tokens
Lexer
Sequence of characters
Sequence of tokens
• We need
Parser
Sequence of tokens
Parse tree
Compiler Design 1 (2011)
6
– A language for describing valid sequences of tokens – A method for distinguishing valid from invalid sequences of tokens
7
Compiler Design 1 (2011)
8
Context-Free Grammars
CFGs (Cont.)
• Many programming language constructs have a recursive structure
• A CFG consists of – – – –
• A STMT is of the form if COND then STMT else STMT while COND do STMT …
, or , or
Assuming X ∈ N the productions are of the form X→ε , or X → Y1 Y2 ... Yn where Yi ∈ N ∪T
• Context-free grammars are a natural notation for this recursive structure
Compiler Design 1 (2011)
A set of terminals T A set of non-terminals N A start symbol S (a non-terminal) A set of productions
9
Compiler Design 1 (2011)
Notational Conventions
Examples of CFGs
• In these lecture notes
A fragment of our example language (simplified):
– Non-terminals are written upper-case – Terminals are written lower-case – The start symbol is the left-hand side of the first production
Compiler Design 1 (2011)
10
STMT → if COND then STMT else STMT ⏐ while COND do STMT ⏐ id = int
11
Compiler Design 1 (2011)
12
Examples of CFGs (cont.)
The Language of a CFG
Grammar for simple arithmetic expressions:
Read productions as replacement rules:
E→ ⏐ ⏐ ⏐
X → Y1 ... Yn
E*E E+E (E) id
Compiler Design 1 (2011)
Means X can be replaced by Y1 ... Yn
X→ε
Means X can be erased (replaced with empty string)
13
Compiler Design 1 (2011)
14
Key Idea
The Language of a CFG (Cont.)
(1) Begin with a string consisting of the start symbol “S”
More formally, we write
X1 L X i L X n → X1 L X i−1Y1 LYm X i+1 L X n
(2) Replace any non-terminal X in the string by a right-hand side of some production
if there is a production
X → Y1 LYn
X i → Y1 LYm
(3) Repeat (2) until there are no non-terminals in the string
Compiler Design 1 (2011)
15
Compiler Design 1 (2011)
16
The Language of a CFG (Cont.) Write
The Language of a CFG
∗
X 1 L X n → Y1 LYm
if
{
Let G be a context-free grammar with start symbol S. Then the language of G is:
}
∗ a1 K an | S → a1 K an and every ai is a terminal
X 1 L X n → L → L → Y1 LYm in 0 or more steps
Compiler Design 1 (2011)
17
Compiler Design 1 (2011)
18
Terminals
Examples
• Terminals are called so because there are no rules for replacing them
L(G) is the language of the CFG G Strings of balanced parentheses
• Once generated, terminals are permanent
i i
Two grammars:
• Terminals ought to be tokens of the language
Compiler Design 1 (2011)
{( ) | i ≥ 0}
S S
19
→ (S ) → ε
Compiler Design 1 (2011)
OR
S
→ (S ) | ε
20
Example
Example (Cont.)
A fragment of our example language (simplified):
Some elements of the our language
STMT → if COND then STMT ⏐ if COND then STMT else STMT ⏐ while COND do STMT ⏐ id = int COND → (id == id) ⏐ (id != id)
Compiler Design 1 (2011)
id = int if (id == id) then id = int else id = int while (id != id) do id = int while (id == id) do while (id != id) do id = int if (id != id) then if (id == id) then id = int else id = int
21
Compiler Design 1 (2011)
Arithmetic Example
Notes
Simple arithmetic expressions:
The idea of a CFG is a big step. But:
E → E+E | E ∗ E | (E) | id Some elements of the language:
id
• Membership in a language is just “yes” or “no”; we also need the parse tree of the input
id + id
(id) id ∗ id (id) ∗ id id ∗ (id) Compiler Design 1 (2011)
22
• Must handle errors gracefully • Need an implementation of CFG’s (e.g., yacc) 23
Compiler Design 1 (2011)
24
More Notes
Derivations and Parse Trees
• Form of the grammar is important
A derivation is a sequence of productions
– Many grammars generate the same language – Parsing tools are sensitive to the grammar
S →L →L →L A derivation can be drawn as a tree
Note: Tools for regular languages (e.g., lex/ML-Lex) are also sensitive to the form of the regular expression, but this is rarely a problem in practice
Compiler Design 1 (2011)
– Start symbol is the tree’s root – For a production X → Y1 LYn add children to node X
25
Derivation Example
Y1 LYn
Compiler Design 1 (2011)
26
Derivation Example (Cont.)
• Grammar
E → E+E | E ∗ E | (E) | id • String
id ∗ id + id
Compiler Design 1 (2011)
E
E → E+E → E ∗ E+E → id ∗ E + E → id ∗ id + E → id ∗ id + id 27
Compiler Design 1 (2011)
+
E E id
*
E
E id
id
28
Derivation in Detail (1)
Derivation in Detail (2)
E
E E
+
E → E+E
E
Compiler Design 1 (2011)
29
Derivation in Detail (3)
Compiler Design 1 (2011)
30
Derivation in Detail (4)
E
E → E+E → E ∗ E+E
Compiler Design 1 (2011)
E
+
E E
*
E
E → E+E
E
→ E ∗ E+E → id ∗ E + E
E
31
Compiler Design 1 (2011)
+
E E
*
E
E
id
32
Derivation in Detail (5)
Derivation in Detail (6)
E
E → E+E → E ∗ E+E → id ∗ E + E → id ∗ id + E
+
E E id
*
→ → → → →
E
E id
Compiler Design 1 (2011)
E
E
33
E+E E ∗ E+E id ∗ E + E id ∗ id + E id ∗ id + id
+
E E id
*
E
id
id
Compiler Design 1 (2011)
34
Notes on Derivations
Left-most and Right-most Derivations
• A parse tree has
• The example is a left-most
derivation
– Terminals at the leaves – Non-terminals at the interior nodes
– At each step, replace the left-most non-terminal
• An in-order traversal of the leaves is the original input
• There is an equivalent notion of a right-most
derivation
• The parse tree shows the association of operations, the input string does not
Compiler Design 1 (2011)
E
E → → → →
E+E E+id E ∗ E + id E ∗ id + id
→ id ∗ id + id 35
Compiler Design 1 (2011)
36
Right-most Derivation in Detail (1)
Right-most Derivation in Detail (2)
E
E E
+
E → E+E
E
Compiler Design 1 (2011)
37
Right-most Derivation in Detail (3)
Compiler Design 1 (2011)
38
Right-most Derivation in Detail (4)
E
E → E+E → E+id
Compiler Design 1 (2011)
E
E
+
E
E
E
→ E+E → E+id → E ∗ E + id
id
39
Compiler Design 1 (2011)
+
E E
*
E
E id
40
Right-most Derivation in Detail (5)
Right-most Derivation in Detail (6)
E
E → E+E → E+id → E ∗ E + id
+
E E
*
→ E ∗ id + id
E
E
E → → → → →
E id
id
Compiler Design 1 (2011)
41
E+E E+id E ∗ E + id E ∗ id + id id ∗ id + id
+
E E id
*
E
E id
id
Compiler Design 1 (2011)
Derivations and Parse Trees
Summary of Derivations
• Note that right-most and left-most derivations have the same parse tree
• We are not just interested in whether s ∈ L(G)
42
– We need a parse tree for s
• The difference is just in the order in which branches are added
• A derivation defines a parse tree – But one parse tree may have many derivations
• Left-most and right-most derivations are important in parser implementation
Compiler Design 1 (2011)
43
Compiler Design 1 (2011)
44
Ambiguity
Ambiguity (Cont.)
• Grammar
This string has two parse trees E → E + E | E * E | ( E ) | int E
• String int * int + int E
E +
E
E
*
E
* E
int
int
E
+
int
Compiler Design 1 (2011)
45
E
int
int
E int
Compiler Design 1 (2011)
Ambiguity (Cont.)
Dealing with Ambiguity
• A grammar is ambiguous if it has more than one parse tree for some string
• There are several ways to handle ambiguity
– Equivalently, there is more than one right-most or left-most derivation for some string
• Most direct method is to rewrite grammar unambiguously
• Ambiguity is bad – Leaves meaning of some programs ill-defined
E→T+E|T T → int * T | int | ( E )
• Ambiguity is common in programming languages – Arithmetic expressions – IF-THEN-ELSE
Compiler Design 1 (2011)
46
• Enforces precedence of * over +
47
Compiler Design 1 (2011)
48
Ambiguity: The Dangling Else
The Dangling Else: Example
• Consider the following grammar
• The expression if C1 then if C2 then S3 else S4
has two parse trees
S → if C then S | if C then S else S | OTHER
if C1
• This grammar is also ambiguous
if
C2
if
C1
S4
if S3
C2
S3
S4
• Typically we want the second form Compiler Design 1 (2011)
49
Compiler Design 1 (2011)
50
The Dangling Else: A Fix
The Dangling Else: Example Revisited
• else matches the closest unmatched then • We can describe this in the grammar
• The expression if C1 then if C2 then S3 else S4
S → MIF | UIF
C1
MIF → if C then MIF else MIF | OTHER UIF → if C then S | if C then MIF else UIF
if C2
S3
C1 S4
• A valid parse tree (for a UIF)
• Describes the same set of strings Compiler Design 1 (2011)
if
if
/* all then are matched */ /* some then are unmatched */
51
Compiler Design 1 (2011)
S4
if C2
S3
• Not valid because the then expression is not a MIF 52
Ambiguity
Precedence and Associativity Declarations
• No general techniques for handling ambiguity
• Instead of rewriting the grammar – Use the more natural (ambiguous) grammar – Along with disambiguating declarations
• Impossible to convert automatically an ambiguous grammar to an unambiguous one
• Most tools allow precedence and associativity declarations to disambiguate grammars
• Used with care, ambiguity can simplify the grammar
• Examples …
– Sometimes allows more natural definitions – We need disambiguation mechanisms
Compiler Design 1 (2011)
53
Compiler Design 1 (2011)
54
Associativity Declarations
Precedence Declarations
• Consider the grammar E → E + E | int • Ambiguous: two parse trees of int + int + int
• Consider the grammar E → E + E | E * E | int
E E E
+
+ E
int
int
– And the string int + int * int E
E E
E
int
int E
+
int
E
E E
+ E
int
E
E
int
int E
+
int
• Precedence declarations: %left + %left *
• Left associativity declaration: %left + Compiler Design 1 (2011)
+ E
int
int
*
E
55
Compiler Design 1 (2011)
E *
E int
56
Error Handling
Syntax Error Handling
• Purpose of the compiler is
• Error handler should
– To detect non-valid programs – To translate the valid ones
– Report errors accurately and clearly – Recover from an error quickly – Not slow down compilation of valid code
• Many kinds of possible errors (e.g. in C) Error kind Lexical Syntax Semantic Correctness
Example
Detected by …
…$… … x *% … … int x; y = x(3); … your favorite program
• Good error handling is not easy to achieve
Lexer Parser Type checker Tester/User
Compiler Design 1 (2011)
57
Compiler Design 1 (2011)
Approaches to Syntax Error Recovery
Error Recovery: Panic Mode
• From simple to complex
• Simplest, most popular method
– Panic mode – Error productions – Automatic local or global correction
58
• When an error is detected: – Discard tokens until one with a clear role is found – Continue from there
• Not all are supported by all parser generators
• Such tokens are called synchronizing tokens – Typically the statement or expression terminators
Compiler Design 1 (2011)
59
Compiler Design 1 (2011)
60
Syntax Error Recovery: Panic Mode (Cont.)
Syntax Error Recovery: Error Productions
• Consider the erroneous expression
• Idea: specify in the grammar known common mistakes • Essentially promotes common errors to alternative syntax • Example:
(1 + + 2) + 3
• Panic-mode recovery: – Skip ahead to next integer and then continue
– Write 5 x instead of 5 * x – Add the production E → … | E E
• (ML)-Yacc: use the special terminal error to describe how much input to skip
• Disadvantage
E → int | E + E | ( E ) | error int | ( error )
– Complicates the grammar
Compiler Design 1 (2011)
61
Syntax Error Recovery: Past and Present • Past – Slow recompilation cycle (even once a day) – Find as many errors in one cycle as possible – Researchers could not let go of the topic
• Present – – – –
Quick recompilation cycle Users tend to correct one error/cycle Complex error recovery is needed less Panic-mode seems enough