Ambiguity Detection: Scaling to Scannerless

Bas Basten, Paul Klint, Jurgen Vinju
Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

Motivation: Scannerless Generalized Parsing

● No separate scanner/tokenizer
● Modular grammar definitions
● Enables parsing of:
  – Legacy languages
  – Language embeddings

Problem: possible ambiguity!

Parsing (Legacy) Languages

● PL/I:

    IF IF = THEN THEN IF = ENDIF; ENDIF;

● Pascal:

    a : array [1..10] of Integer

● C++:

    List setList;

Language Embeddings

● Embedding AspectJ into Java*
● Problem: different reserved keywords

Java:

    class Screen {
      private float width, height;
      public float aspect() { return width / height; }
    }

AspectJ:

    aspect MyAspect {
      pointcut aspectCall(): target(Screen) && call(float aspect());
      ...
    }

* Bravenboer, Tanter, Visser – OOPSLA 2006

Character-level grammars

● EBNF
● Include lexical definitions (no tokens)
● Terminals are character classes ([a-z], [0-9])
● Disambiguation filters:
  – Follow restrictions (longest match)
    ● Identifier -/- [a-z]
  – Rejects (keyword reservation)
    ● Identifier → ”else” {reject}
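The effect of the two filters can be sketched in a few lines of Python. This is only an illustration and not SDF or AmbiDexter code: the function names, the `KEYWORDS` set and the assumed rule `Identifier ::= [a-z]+` are hypothetical. Without filters, every non-empty prefix of a run of letters is a valid Identifier; the follow restriction keeps only the longest match, and the reject removes reserved words.

```python
import re

# Assumed reserved words, standing in for: Identifier → "else" {reject}
KEYWORDS = {"else"}

def identifier_candidates(text, start):
    """All unfiltered Identifier matches at `start`, assuming Identifier ::= [a-z]+."""
    run = re.match(r"[a-z]+", text[start:])
    if run is None:
        return []
    word = run.group(0)
    # Every non-empty prefix of the letter run is a candidate.
    return [word[:n] for n in range(1, len(word) + 1)]

def apply_follow_restriction(text, start, candidates):
    """Identifier -/- [a-z]: drop candidates directly followed by a letter."""
    kept = []
    for candidate in candidates:
        end = start + len(candidate)
        followed_by_letter = end < len(text) and "a" <= text[end] <= "z"
        if not followed_by_letter:
            kept.append(candidate)
    return kept

def apply_reject(candidates):
    """Keyword reservation: drop candidates that spell a reserved word."""
    return [c for c in candidates if c not in KEYWORDS]

def identifiers(text, start=0):
    """Candidates that survive both filters."""
    candidates = identifier_candidates(text, start)
    return apply_reject(apply_follow_restriction(text, start, candidates))
```

For the input "else x" the unfiltered candidates are "e", "el", "els" and "else"; the follow restriction leaves only "else", and the reject then removes it, so no Identifier remains at that position.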

Ambiguity Detection

● Undecidable in general
● Trade-off: performance/termination ↔ accuracy
● Ambiguity detection methods:
  – Approximative
  – Exhaustive
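To make the exhaustive end of the trade-off concrete, here is a minimal Python sketch of bounded sentence enumeration. It is not the AmbiDexter implementation; the toy grammar, the names `GRAMMAR`, `derivations` and `ambiguous_sentences`, and the length bound are all assumptions. It also assumes a grammar without empty productions and without cyclic unit productions, otherwise the recursion would not terminate. The bound illustrates the trade-off: a larger bound finds more ambiguities but takes longer, and no bound proves unambiguity.

```python
from collections import defaultdict
from functools import lru_cache

# Toy grammar (an assumption for this sketch): Exp ::= Exp + Exp | Exp * Exp | a
GRAMMAR = {
    "Exp": (("Exp", "+", "Exp"), ("Exp", "*", "Exp"), ("a",)),
}

def splits(total, parts):
    """Yield all ways to write `total` as an ordered sum of `parts` positive ints."""
    if parts == 1:
        if total >= 1:
            yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in splits(total - first, parts - 1):
            yield (first,) + rest

@lru_cache(maxsize=None)
def derivations(symbol, length):
    """All (sentence, tree) pairs in which `symbol` derives exactly `length` terminals."""
    if symbol not in GRAMMAR:  # terminal symbol
        return (((symbol,), symbol),) if length == 1 else ()
    results = []
    for rhs in GRAMMAR[symbol]:
        for sizes in splits(length, len(rhs)):
            partial = [((), ())]  # (sentence so far, child trees so far)
            for child, size in zip(rhs, sizes):
                partial = [(s + cs, t + (ct,))
                           for s, t in partial
                           for cs, ct in derivations(child, size)]
            results.extend((s, (symbol,) + t) for s, t in partial)
    return tuple(results)

def ambiguous_sentences(start, max_length):
    """Map each sentence of at most `max_length` terminals with >1 parse tree to its tree count."""
    found = {}
    for n in range(1, max_length + 1):
        trees_by_sentence = defaultdict(list)
        for sentence, tree in derivations(start, n):
            trees_by_sentence[sentence].append(tree)
        for sentence, trees in trees_by_sentence.items():
            if len(trees) > 1:
                found[" ".join(sentence)] = len(trees)
    return found
```

With a bound of 3 the sketch reports nothing; with a bound of 5 it reports the four sentences of the form a op a op a, each with two parse trees.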

Research Question

● Previous work: AmbiDexter
  – Harmless production filtering
  – Significant speed-ups (LDTA 2010)
  – Proved correct (ICTAC 2010)
● Applicable to character-level grammars?
  – More complex: full definition of lexical syntax
  – Less deterministic: no heuristics of scanner
  – Disambiguation filters

AmbiDexter

[Diagram of the AmbiDexter pipeline. Labels: Grammar, Non-deterministic Finite Automaton, Filter & Reduce, Sentence Generator. Outcomes: ”Unambiguous”, ”Ambiguous”, Time-out (?)]

● NFA describes overapproximation of parse trees
● Smaller NFA = fewer sentences
● Goal: find ambiguous strings faster

NFA Describes Parse Trees

● Parse trees:

    [Figure: two parse trees for Exp + Exp * Exp, one with * at the root and one with + at the root]

● Bracketed strings:

    (2 (1 Exp + Exp )1 * Exp )2

    (1 Exp + (2 Exp * Exp )2 )1

Example NFA

(2 (1 Exp + Exp )1 * Exp )2

(1 Exp + (2 Exp * Exp )2 )1
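The relation between parse trees and bracketed strings can be sketched in Python. The tree encoding and the names `bracketed` and `strip_brackets` are assumptions for this illustration, as is the numbering of the productions (1 for Exp → Exp + Exp, 2 for Exp → Exp * Exp), which is inferred from the bracketed strings on the slide.

```python
# Production numbers inferred from the slide (an assumption):
PLUS = 1   # Exp -> Exp + Exp
TIMES = 2  # Exp -> Exp * Exp

def bracketed(tree):
    """Render a parse tree as a bracketed string.

    A tree is either a leaf (a plain string) or a pair
    (production number, list of children).
    """
    if isinstance(tree, str):
        return tree
    number, children = tree
    inner = " ".join(bracketed(child) for child in children)
    return f"({number} {inner} ){number}"

def strip_brackets(text):
    """Remove the numbered brackets, leaving the sentential form."""
    return " ".join(tok for tok in text.split() if tok[0] not in "()")

# The two parse trees from the slide.
times_at_root = (TIMES, [(PLUS, ["Exp", "+", "Exp"]), "*", "Exp"])
plus_at_root = (PLUS, ["Exp", "+", (TIMES, ["Exp", "*", "Exp"])])

for tree in (times_at_root, plus_at_root):
    print(bracketed(tree))
```

Both bracketed strings reduce to the same string once the brackets are removed, which is exactly what makes the string ambiguous: two different bracketed strings, one unbracketed string.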

Extensions to baseline algorithm

● Modifications to NFA for:
  – Character classes (replace tokens)
  – Follow restrictions (propagation)
  – Rejects (language difference)
  – Priority/Associativity (derivation restriction)
● General improvement:
  – Grammar unfolding
    ● Often used non-terminals (whitespace)

Experiment setup

● Grammar test set:

    Grammar      Productions   Disambiguation annotations
    Oberon0      189           190
    C            324           374
    ECMAScript   403           53
    SQL-92       419           58
    Java 1.5     698           431
    C++          807           162

● Measurements:
  – NFA Filtering (time, memory, edges filtered)
  – Sentence generation time before & after filtering

Measurement results (small grammar)

Oberon0: Filtering time: 14s, Edges filtered: 53%

[Plot for Oberon0: Unfiltered vs. Filtered (including filtering time)]

Measurement results (medium sized grammars)

[Plots for C, ECMAScript and SQL-92]

Measurement results (large grammars)

Java 1.5: Filtering time: 28m, Memory: 16GB

C++: Filtering time: >2h40m, Memory: >17GB. Too ambiguous!

[Plot for Java 1.5]

Summary

    Grammar      Break-even time   Maximum speedup   Ambiguous non-terminals found faster*
    Oberon0      15s               3399x             0
    C            6m                1.7x              2
    ECMAScript   2m                2.7x              4
    SQL-92       1m                15x               3
    Java 1.5     35m               5.3x              0

* Average time limit: 10hrs

Conclusions

● Ambiguity detection for character-level grammars
● Staged approach:
  – NFA filtering
  – Sentence generation
● Experimental evaluation
● Significant speedups
● Next step: integration with Rascal
