The Usability of Ambiguity Detection Methods for Context-Free Grammars LDTA 2008

Bas Basten CWI, Amsterdam

Motivation 

Problems:

- Writing LL/LR grammars is hard
- LL/LR grammars cannot be composed modularly

Solution:

- Use unconstrained context-free grammars

Problem:

- Context-free grammars can be ambiguous

Potential solution:

- Use ambiguity detection?

Overview

1. Ambiguity in Context-Free Grammars
2. Investigated Ambiguity Detection Methods
3. Comparison
4. Results
5. Conclusion
6. Future Directions

Background: Ambiguity in CFGs 



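A grammar G is ambiguous iff L(G) contains a string with multiple parse trees (equivalently, multiple leftmost derivations)

Simple expression grammar:

- E → E + E
- E → E * E
- E → 0 | 1 | 2 ...

The string 1 + 2 * 3 has two parse trees:

- 1 + (2 * 3) = 7 (root E → E + E, with 2 * 3 as the right subtree)
- (1 + 2) * 3 = 9 (root E → E * E, with 1 + 2 as the left subtree)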
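The two parse trees of 1 + 2 * 3 can also be found by counting. The following is a minimal sketch, not part of the original talk: it hard-codes the expression grammar above (with single-digit operands) and counts parse trees by trying every operator as the root of the tree. The function name `count_parse_trees` is an assumption made for illustration.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` for E -> E + E | E * E | digit."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees with which E derives tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i].isdigit() else 0
        total = 0
        # Try every operator position k as the root production.
        for k in range(i + 1, j - 1):
            if tokens[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees("1+2*3"))  # 2
```

A count above 1 for any string means the grammar is ambiguous; a count of 1 for a single string proves nothing about the grammar as a whole.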

Background: Ambiguity in CFGs 

Ambiguity problem: is grammar G ambiguous?

- undecidable in general
- semidecidable
  - Generating all strings of L(G) + checking for ambiguity only terminates if G is ambiguous

LR(k) is a subclass of the unambiguous grammars
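The semidecision procedure (generate strings, check each for ambiguity) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of any tool discussed in the talk: the grammar is a dict from nonterminals to lists of right-hand-side tuples, it has no empty and no unit productions (so the set of sentential forms up to a given length is finite), and the names `leftmost_derivations`, `find_ambiguous_string` and `limit` are hypothetical.

```python
from collections import Counter
from itertools import count


def leftmost_derivations(grammar, start, max_len):
    """Count the leftmost derivations of every terminal string up to max_len."""
    counts = Counter()
    stack = [(start,)]
    while stack:
        form = stack.pop()
        # Position of the leftmost nonterminal, or None for a terminal string.
        pos = next((i for i, s in enumerate(form) if s in grammar), None)
        if pos is None:
            counts[form] += 1
            continue
        for rhs in grammar[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            if len(new_form) <= max_len:  # prune forms that are already too long
                stack.append(new_form)
    return counts


def find_ambiguous_string(grammar, start, limit=None):
    """Return a string with two leftmost derivations, searching by length.

    With limit=None the search only stops if the grammar is ambiguous.
    """
    lengths = count(1) if limit is None else range(1, limit + 1)
    for n in lengths:
        for string, k in leftmost_derivations(grammar, start, n).items():
            if k > 1:
                return "".join(string)
    return None


expr = {"E": [("E", "+", "E"), ("E", "*", "E"), ("1",)]}
print(find_ambiguous_string(expr, "E"))  # an ambiguous string of length 5
```

On an unambiguous grammar the unbounded search never returns, which is exactly the termination problem described above; passing a `limit` turns it into a bounded check that can only answer "ambiguous" or "nothing found up to this length".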

Ambiguity Detection 

An ambiguity detection method (ADM) should:

- Correctly answer “ambiguous” or “unambiguous”
- Terminate in acceptable time
- Give information for disambiguation

Perfect ADM cannot exist

- Correctness/termination trade-off
- Practical value?

Investigated ADMs 

LR(k) test

- D.E. Knuth, 1965
- LR(k) parse table generation for increasing k

Noncanonical Unambiguity (NU) test

- S. Schmitz, 2006
- Conservative approximation of parse automaton

AMBER

- F.W. Schröer, 2001
- Derivation generator

Which one is the most practically usable?

Noncanonical Unambiguity test 

Builds NFA that approximates parse automaton of G

Searches NFA for ambiguous strings

- Finite search space

Conservative approximation

- L(NFA) superset of L(G)
- No ambiguities left out
- Reports “unambiguous” and “potentially ambiguous”

Different precisions: LR(0), SLR(1), LR(1)

- Larger automaton → higher accuracy

Empirical Comparison 

2 grammar collections:

84 Toy grammars

- 3-17 productions
- 48 ambiguous
- 36 unambiguous

5 Real world grammars:

- HTML: 29 productions
- SQL: 79 productions
- Pascal: 176 productions
- C: 212 productions
- Java: 349 productions
- Of each: 1 unambiguous + 5 ambiguous versions

Empirical Comparison 

Accuracy

- Percentage of correct reports
- Toy grammars

Performance

- Computation time, memory consumption
- Real world grammars

Termination

- Ability to terminate in given amount of time
- Real world grammars
- Time limits: 5 min, 15 hrs.

Results LR(k) test 

Advantages:

- 100% accurate

Disadvantages:

- Only reports “unambiguous” for LR(k) grammars
- Nontermination on non LR(k) grammars
- Conflicts in parse tables are incomprehensible
- Exponential performance
  - Max values of k testable in 15 hrs: 2-6 (ambig. real world gr.)

Hardly usable for unconstrained CFGs, only for LR(k)
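To make the idea of the LR(k) test concrete, here is a minimal sketch for the single case k = 0. It is not the tool used in the comparison: it builds the LR(0) item sets of a grammar and returns the states that contain a shift/reduce or reduce/reduce conflict. The grammar encoding (a dict from nonterminals to right-hand-side tuples), the function name `lr0_conflicts` and the fresh start symbol `S'` are assumptions made for illustration.

```python
def lr0_conflicts(grammar, start):
    """Build the LR(0) automaton and return the item sets with conflicts.

    An item is a triple (lhs, rhs, dot position).
    """
    aug = "S'"  # assumed not to occur in the grammar already
    rules = dict(grammar)
    rules[aug] = [(start,)]

    def closure(items):
        items = set(items)
        work = list(items)
        while work:
            _, rhs, dot = work.pop()
            if dot < len(rhs) and rhs[dot] in rules:
                # Dot is in front of a nonterminal: add its productions.
                for prod in rules[rhs[dot]]:
                    item = (rhs[dot], prod, 0)
                    if item not in items:
                        items.add(item)
                        work.append(item)
        return frozenset(items)

    def goto(state, symbol):
        moved = {(l, r, d + 1) for l, r, d in state
                 if d < len(r) and r[d] == symbol}
        return closure(moved)

    start_state = closure({(aug, (start,), 0)})
    states, work = {start_state}, [start_state]
    while work:
        state = work.pop()
        for symbol in {r[d] for _, r, d in state if d < len(r)}:
            nxt = goto(state, symbol)
            if nxt not in states:
                states.add(nxt)
                work.append(nxt)

    conflicts = []
    for state in states:
        # Completed items, ignoring the accept item S' -> start .
        reduces = [(l, r) for l, r, d in state if d == len(r) and l != aug]
        # A shift is possible if some item has the dot before a terminal.
        shifts = any(d < len(r) and r[d] not in rules for _, r, d in state)
        if len(reduces) > 1 or (reduces and shifts):
            conflicts.append(state)
    return conflicts


expr = {"E": [("E", "+", "E"), ("E", "*", "E"), ("1",)]}
parens = {"S": [("(", "S", ")"), ("x",)]}
print(len(lr0_conflicts(expr, "E")))    # 2
print(len(lr0_conflicts(parens, "S")))  # 0
```

An empty result proves the grammar is LR(0) and therefore unambiguous. A conflict only shows that the grammar is not LR(0); the full test would retry with k = 1, 2, ... and, as noted above, never terminates on a grammar that is not LR(k) for any k.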

Results NU test 



Accuracy: (unambiguous toy grammars)

Precision   Accuracy
LR(0)       61%
SLR(1)      69%
LR(1)       86%

Performance:

- LR(0), SLR(1): Very fast, all tests < 3 sec.
- LR(1) also, but too much memory on C and Java grammars: swapping or crashing

Incomprehensible reports

LR(1) precision pretty useful for |G| < 200 productions

Results AMBER 

Advantages:

- 100% accurate
- Ambiguous example strings are very useful
- High termination scores: (ambiguous real world gr.)
  - 70% in 5 min, 90% in 15 hrs.

Disadvantages:

- Only reports “ambiguous”
- Runs forever on unambiguous grammars

Very useful for ambiguous grammars

Conclusion 

Usability ranking on grammar collections:

1. AMBER
   - Very useful for ambiguous grammars
2. Noncanonical Unambiguity test
   - LR(1) precision pretty useful for |G| < 200 productions
3. LR(k) test
   - Hardly usable for unconstrained CFGs, only for LR(k)

Future Directions 

Compare other ADMs, for instance:

Grambiguity

- C. Brabrand, R. Giegerich and A. Møller, 2006
- Regular (conservative) approximation

CFGAnalyzer

- R. Axelsson, K. Heljanko and M. Lange, 2007
- Incremental SAT solving

Future Directions 

Iterative approach

- Multiple checks with increasing detail
- Filter out unambiguous grammar parts
- Coarse grained, fast → ... → fine grained, slow

For example:

- Run NU test first:
  - “unambiguous” → done
  - “potentially ambiguous” → try filtering unambiguous parts
- Run derivation generator (AMBER) on remainder
  - Smaller search space, better performance
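The control flow of such an iterative approach could look roughly like the sketch below. It is only an outline of the proposal: the three callables are hypothetical placeholders for a conservative test (such as the NU test), a filtering step and a derivation generator (such as AMBER), and none of them corresponds to an existing API.

```python
def iterative_check(grammar, conservative_test, filter_parts, derivation_search):
    """Coarse-to-fine ambiguity check.

    All three callables are hypothetical placeholders:
      conservative_test(g)  -> True if g is proven unambiguous, else False
      filter_parts(g)       -> g without the parts proven unambiguous
      derivation_search(g)  -> an ambiguous string, or None if none was found
    """
    # Step 1: fast, conservative check on the whole grammar.
    if conservative_test(grammar):
        return ("unambiguous", None)
    # Step 2: shrink the search space before the expensive step.
    remainder = filter_parts(grammar)
    # Step 3: slow, exact search on what is left.
    witness = derivation_search(remainder)
    if witness is not None:
        return ("ambiguous", witness)
    return ("potentially ambiguous", None)


# Usage with trivial stand-ins for the three steps.
verdict = iterative_check(
    {"E": [("E", "+", "E"), ("1",)]},
    conservative_test=lambda g: False,
    filter_parts=lambda g: g,
    derivation_search=lambda g: "1+1+1",
)
print(verdict)  # ('ambiguous', '1+1+1')
```

In practice the derivation search would need a time or length bound, since it does not terminate on an unambiguous remainder.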

References 

  





F. W. Schröer. AMBER, an ambiguity checker for context-free grammars. Technical report, Fraunhofer Institute for Computer Architecture and Software Technology, 2001. http://accent.compilertools.net/Amber.html

D. E. Knuth. On the translation of languages from left to right. Information and Control, 8(6):607–639, 1965.

V. Makarov. MSTA (syntax description translator). May 1995. http://cocom.sourceforge.net/msta.html

S. Schmitz. An experimental ambiguity detection tool. In A. Sloane and A. Johnstone, editors, Seventh Workshop on Language Descriptions, Tools, and Applications (LDTA '07), Braga, Portugal, March 2007.

C. Brabrand, R. Giegerich, and A. Møller. Analyzing ambiguity of context-free grammars. In M. Balík and J. Holub, editors, 12th International Conference on Implementation and Application of Automata (CIAA '07), July 2007.

R. Axelsson, K. Heljanko, and M. Lange. CFGAnalyzer. 2007. http://www.tcs.ifi.lmu.de/~mlange/cfganalyzer/
