The Usability of Ambiguity Detection Methods for Context-Free Grammars LDTA 2008
Bas Basten CWI, Amsterdam
Motivation
Problems:
Writing LL/LR grammars is hard
LL/LR grammars cannot be composed modularly
Solution:
Use unconstrained context-free grammars
Problem:
Context-free grammars can be ambiguous
Potential solution:
Use ambiguity detection?
Overview
1. Ambiguity in Context-Free Grammars
2. Investigated Ambiguity Detection Methods
3. Comparison
4. Results
5. Conclusion
6. Future Directions
Background: Ambiguity in CFGs
A grammar G is ambiguous iff L(G) contains a string with more than one parse tree (equivalently, more than one leftmost derivation)
Simple expression grammar:
E → E + E
E → E * E
E → 0 | 1 | 2 ...
The string 1 + 2 * 3 has two parse trees:
1 + (2 * 3) = 7
(1 + 2) * 3 = 9
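The two readings of 1 + 2 * 3 can be reproduced with a small sketch that enumerates every parse tree of a token string under the expression grammar above. It assumes single-character tokens and digit literals only; the function names `parse_trees` and `evaluate` are illustrative and not part of any tool discussed in this talk.

```python
from functools import lru_cache


def parse_trees(tokens):
    """Enumerate all parse trees of `tokens` under E -> E+E | E*E | digit."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):
        # All parse trees that derive tokens[i:j] from E.
        result = []
        if j - i == 1 and tokens[i].isdigit():
            result.append(tokens[i])              # E -> digit
        for k in range(i + 1, j - 1):             # position of the top-level operator
            if tokens[k] in ("+", "*"):
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        result.append((left, tokens[k], right))
        return tuple(result)

    return list(trees(0, len(tokens)))


def evaluate(tree):
    """Compute the value of a parse tree produced by parse_trees."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a * b


trees = parse_trees("1+2*3")
values = sorted(evaluate(t) for t in trees)
print(len(trees), values)
```

Each distinct tree corresponds to one way of bracketing the expression, so a string with more than one tree is a witness of ambiguity.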
Background: Ambiguity in CFGs
Ambiguity problem: is grammar G ambiguous?
Undecidable in general
Semidecidable: generating all strings of L(G) and checking each for ambiguity terminates only if G is ambiguous
LR(k) is a subclass of the unambiguous grammars
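The semidecision procedure mentioned above can be sketched as follows. This is a minimal illustration, not the algorithm of any tool in this talk: it assumes a grammar without empty productions and without unit cycles, enumerates terminal strings by increasing length, and counts parse trees for each one. The `max_length` bound is an assumption added so that the example terminates; the real procedure has no bound and therefore runs forever on an unambiguous grammar. All names are hypothetical.

```python
from functools import lru_cache
from itertools import product


def count_trees(grammar, start, tokens):
    """Count parse trees of `tokens` from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples of
    symbols). Assumes no empty productions and no unit cycles.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count_symbol(sym, i, j):
        if sym not in grammar:                    # terminal symbol
            return 1 if j - i == 1 and tokens[i] == sym else 0
        return sum(count_seq(rhs, i, j) for rhs in grammar[sym])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        if len(rhs) == 1:
            return count_symbol(rhs[0], i, j)
        total = 0
        # The first symbol covers tokens[i:k]; every remaining symbol
        # must still receive at least one token.
        for k in range(i + 1, j - (len(rhs) - 1) + 1):
            left = count_symbol(rhs[0], i, k)
            if left:
                total += left * count_seq(rhs[1:], k, j)
        return total

    return count_symbol(start, 0, len(tokens))


def find_ambiguous_string(grammar, start, max_length):
    """Return the first string (by length) with more than one parse tree, or None."""
    terminals = sorted({s for rhss in grammar.values()
                        for rhs in rhss for s in rhs if s not in grammar})
    for n in range(1, max_length + 1):
        for candidate in product(terminals, repeat=n):
            if count_trees(grammar, start, candidate) > 1:
                return candidate
    return None


ambiguous = {"E": [("E", "+", "E"), ("1",)]}
left_assoc = {"E": [("E", "+", "1"), ("1",)]}
witness = find_ambiguous_string(ambiguous, "E", max_length=5)
print(witness)
print(find_ambiguous_string(left_assoc, "E", max_length=5))
```

A `None` result only means that no ambiguous string exists up to the chosen length; it says nothing about longer strings, which is exactly why the unbounded search is a semidecision procedure rather than a decision procedure.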
Ambiguity Detection
An ambiguity detection method (ADM) should:
Correctly answer “ambiguous” or “unambiguous”
Terminate in acceptable time
Give information for disambiguation
Perfect ADM cannot exist
Correctness/termination trade-off
Practical value?
Investigated ADMs
LR(k) test
D.E. Knuth, 1965
LR(k) parse table generation for increasing k
Noncanonical Unambiguity (NU) test
S. Schmitz, 2006
Conservative approximation of parse automaton
AMBER
F.W. Schröer, 2001
Derivation generator
Which one is the most practically usable?
Noncanonical Unambiguity test
Builds NFA that approximates parse automaton of G
Searches NFA for ambiguous strings
Finite search space
Conservative approximation
L(NFA) superset of L(G)
No ambiguities left out
Reports “unambiguous” and “potentially ambiguous”
Different precisions: LR(0), SLR(1), LR(1)
Larger automaton → higher accuracy
Empirical Comparison
2 grammar collections:
84 toy grammars
3-17 productions
48 ambiguous, 36 unambiguous
5 real-world grammars:
HTML: 29 productions
SQL: 79 productions
Pascal: 176 productions
C: 212 productions
Java: 349 productions
Of each: 1 unambiguous + 5 ambiguous versions
Empirical Comparison
Accuracy
Percentage of correct reports
Toy grammars
Performance
Computation time, memory consumption
Real-world grammars
Termination
Ability to terminate in a given amount of time
Real-world grammars
Time limits: 5 min, 15 hrs
Results LR(k) test
Advantages:
100% accurate
Disadvantages:
Only reports “unambiguous” for LR(k) grammars
Nontermination on non-LR(k) grammars
Conflicts in parse tables are incomprehensible
Exponential performance
Max values of k testable in 15 hrs: 2-6 (ambiguous real-world grammars)
Hardly usable for unconstrained CFGs, only for LR(k)
Results NU test
Accuracy (unambiguous toy grammars):
LR(0): 61%
SLR(1): 69%
LR(1): 86%
Performance:
LR(0), SLR(1): very fast, all tests < 3 sec.
LR(1) also, but too much memory on C and Java grammars: swapping or crashing
Incomprehensible reports
LR(1) precision pretty useful for |G| < 200 productions
Results AMBER
Advantages:
100% accurate
Ambiguous example strings are very useful
High termination scores (ambiguous real-world grammars): 70% in 5 min, 90% in 15 hrs
Disadvantages:
Only reports “ambiguous”
Runs forever on unambiguous grammars
Very useful for ambiguous grammars
Conclusion
Usability ranking on grammar collections:
1. AMBER
Very useful for ambiguous grammars
2. Noncanonical Unambiguity test
LR(1) precision pretty useful for |G| < 200 productions
3. LR(k) test
Hardly usable for unconstrained CFGs, only for LR(k)
Future Directions
Compare other ADMs, for instance:
Grambiguity
C. Brabrand, R. Giegerich and A. Møller, 2006
Regular (conservative) approximation
CFGAnalyzer
R. Axelsson, K. Heljanko and M. Lange, 2007
Incremental SAT solving
Future Directions
Iterative approach
Multiple checks with increasing detail
Filter out unambiguous grammar parts
Coarse grained, fast → ... → fine grained, slow
For example:
Run NU test first:
“unambiguous” → done
“potentially ambiguous” → try filtering unambiguous parts
Run derivation generator (AMBER) on remainder
Smaller search space, better performance
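The control flow of the iterative approach can be sketched as below. This is only an illustration of the proposed ordering of checks: the three callables (`coarse_test`, `filter_parts`, `derivation_search`) are hypothetical stand-ins for a conservative test such as the NU test, a grammar filtering step, and a bounded derivation generator such as AMBER. None of them corresponds to an actual tool interface.

```python
def iterative_check(grammar, coarse_test, filter_parts, derivation_search):
    """Run a fast conservative check first, then a slow exact search on the rest.

    coarse_test(grammar)       -> "unambiguous" or "potentially ambiguous"
    filter_parts(grammar)      -> the part of the grammar that may still be ambiguous
    derivation_search(grammar) -> an ambiguous example string, or None if the
                                  (bounded) search found nothing
    All three are hypothetical callables supplied by the caller.
    """
    if coarse_test(grammar) == "unambiguous":
        return "unambiguous", None                # coarse check suffices: done
    remainder = filter_parts(grammar)             # smaller search space
    witness = derivation_search(remainder)
    if witness is not None:
        return "ambiguous", witness
    return "potentially ambiguous", None          # search gave up without a witness


# Stub components, for illustration only.
demo_grammar = {"E": [("E", "+", "E"), ("1",)]}
result = iterative_check(
    demo_grammar,
    coarse_test=lambda g: "potentially ambiguous",
    filter_parts=lambda g: g,
    derivation_search=lambda g: "1+1+1",
)
print(result)
```

Passing the components as parameters keeps the sketch independent of any particular detection method, so a coarser or finer check can be swapped in at each stage.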
References
F. W. Schröer. AMBER, an ambiguity checker for context-free grammars. Technical report, Fraunhofer Institute for Computer Architecture and Software Technology, 2001. http://accent.compilertools.net/Amber.html
D. E. Knuth. On the translation of languages from left to right. Information and Control, 8(6):607–639, 1965.
V. Makarov. MSTA (syntax description translator). May 1995. http://cocom.sourceforge.net/msta.html
S. Schmitz. An experimental ambiguity detection tool. In A. Sloane and A. Johnstone, editors, Seventh Workshop on Language Descriptions, Tools, and Applications (LDTA '07), Braga, Portugal, March 2007.
C. Brabrand, R. Giegerich, and A. Møller. Analyzing ambiguity of context-free grammars. In M. Balík and J. Holub, editors, 12th International Conference on Implementation and Application of Automata (CIAA '07), July 2007.
R. Axelsson, K. Heljanko, and M. Lange. CFGAnalyzer. 2007. http://www.tcs.ifi.lmu.de/~mlange/cfganalyzer/