
Using Linear Positional Grammars for the LR Parsing of 2-D Symbolic Languages

GENNARO COSTAGLIOLA∗ AND SHI-KUO CHANG∗∗

∗ Dipartimento di Matematica ed Informatica, Università di Salerno, 84081 Baronissi (SA), ITALY. Phone: +39089965273, Fax: +39089965438

∗∗ Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260, U.S.A. Phone: 412 6248423, Fax: 412 8465

Abstract. In this paper we present a grammar formalism for the generation and parsing of two-dimensional symbolic languages. Linear Positional Grammars (LPGs for short) are an immediate generalization of context-free string grammars. Through the use of general spatial relations, they allow the definition of pictures whose symbols span a two-dimensional space. Owing to their analogy with context-free string grammars, LPGs can be used to construct an LR-based parser that uses the spatial relations to navigate the input. We study ambiguous grammars and present several ways to resolve their ambiguities. Moreover, we provide an algorithm to translate a linear positional grammar into a context-free grammar with actions, and suggest a general methodology for parsing two-dimensional symbolic languages by means of the well-known tool YACC (Yet Another Compiler-Compiler [25]). As an example, we construct a parser for a subset of the two-dimensional arithmetical expression language.

Keywords: Arithmetic expressions, LR parsing, positional grammars, spatial relations, two-dimensional languages

1. Introduction

In recent years much research has been devoted to the description and parsing of multidimensional languages, i.e., languages whose sentences may be represented as graphs, box-and-arrow diagrams, sets of icons on a 2-D plane, puzzles, symbolic matrices, etc. Pfaltz and Rosenfeld extended the concept of string grammar to grammars for labeled graphs called webs [21, 22, 32, 33, 36]. These grammars were originally suggested as a syntactic formalism for data structures useful in image analysis. Graph languages are frequently used to describe scenes in the image processing literature, whereas graph grammars were initially rarely used for pattern recognition (tree grammars were applied for this purpose instead [3, 22, 23, 27, 37, 39]). Recently, however, parsing methods for particular classes of graph grammars have been proposed, achieving an efficiency close to that of tree language parsing [20, 26, 40, 45, 19, 34]. Building on an idea of Narasimhan [30], Feder [18] formalized the "plex" grammar, which generates languages whose terminals have an arbitrary number of attaching points for connecting to other primitives or sub-patterns. The primitives of a plex grammar are called N-Attaching Point Entities (NAPEs). Plex structures defined by a plex grammar may be viewed as hypergraphs, with each NAPE corresponding to a hyperedge. Among the parsing methods developed for plex languages, [31] presents a parsing scheme for general plex grammars that adapts the Earley parsing algorithm [17], while [13] devises a parser for deterministic plex languages based on the LR parsing methodology and on an extension of the positional grammars presented in this paper. The latter work shows that a significant set of plex languages, such as flowcharts, can be parsed deterministically in almost linear time.
Other important recent contributions to the description and parsing of complex multidimensional languages, not necessarily viewed as graph or plex languages, include the work on Relational Grammars [47], Picture Layout Grammars [24], and Constraint Multiset Grammars [11]. A good survey of these and other approaches can be found in [28]. In this paper we restrict our study of multidimensional languages to symbolic two-dimensional languages. Although all the previous approaches can easily be adapted to handle this type of language, neither their grammar models nor their parsing models take into account, and hence take advantage of, the restrictions that symbolic two-dimensional languages impose with respect to general multidimensional languages. In fact, the nature of these languages suggests the use of traditional string grammars enhanced with positional information for the symbols. One of the first approaches of this kind is a traditional string grammar in which relations more general than concatenation (HOR, VER, ABOVE, LEFT, etc.) are allowed among the primitives of the pattern [2, 12, 21]. Shaw attached a "head" and a "tail" to each primitive and used four binary operators to define concatenation relations between primitives; a context-free string grammar is then used to generate the resulting Picture Description Language (PDL) [21, 38]. Another interesting approach based on a string grammar is given in [5], where spatial attributes are associated with each primitive.


A simple two-dimensional generalization of string grammars extends grammars for one-dimensional strings to two-dimensional arrays [29, 35, 41, 42, 46]. The primitives are the array elements, and the relation between primitives is two-dimensional concatenation. In this case the parsing problem is also somewhat simplified, since traditional techniques for string languages can be adapted more easily. One of the most recent approaches is based on what we call syntax-directed scanning, where the scanning of the pattern during parsing is driven directly by the syntax: the position of the next input symbol to parse depends on the positional specifications built into the grammar and passed to the parser. The work by Tomita [44] is largely based on these ideas: it presents a 2-D Chomsky Normal Form grammar, together with extensions of Earley's and the LR parsing algorithms to the two-dimensional case. In this paper, we propose a grammar formalism for the representation of two-dimensional symbolic languages. This grammar model, named Linear Positional Grammar (LPG for short), extends the context-free grammar model by explicitly inserting positional relations other than string concatenation between the elements (terminals and non-terminals) of a production rule. Since these relations can be very general, the resulting grammar can be seen as a generalization of Tomita's 2-D Chomsky Normal Form grammar, where only horizontal and vertical relations are allowed. We characterize the class of linear pictorial languages and define LR-based parsers by allowing a traditional LR parser to choose the position of the next input symbol to parse. We show how to construct a Simple positional LR(1) parser (SpLR(1) parser for short) and then indicate how to build positional LR(1) and LALR(1) parsers (pLR(1) and pLALR(1) parsers for short, respectively).
The parsers are constructed by adding a new column to the conventional LR parsing table. For each state, this column contains the positional information needed to retrieve the next symbol to be analyzed. Unlike the 2-D LR parsing algorithms given in [44], our parsers only slightly modify the original LR parsing methodology, so that the tool YACC [25] can easily be adapted to construct a parser for a two-dimensional language described by an LPG. To show this, we prove that it is always possible to implement an SpLR(1) linear positional grammar by a context-free grammar with actions. Finally, we analyze cases of ambiguity, give some ways to avoid them, present a general methodology to parse symbolic pictures generated by a linear positional grammar, and apply it to a subset of the two-dimensional arithmetical expression language.

The paper is organized as follows. In Section 2 we define the LPG formalism and give some examples. Section 3 characterizes the linear pictorial languages and shows how to construct a grammar for a linear pictorial language. In Section 4 we present the pLR(1) parsing driver algorithm, while in Section 5 we give algorithms to construct the Simple pLR(1) parsing table and indicate how to build the pLR(1) and pLALR(1) parsing tables. In Section 6 we analyze some non-pLR(1) grammars and construct an SpLR(1) parsing table for a two-dimensional arithmetical expression grammar. In Section 7 we prove that it is always possible to give a YACC specification for an SpLR(1) linear positional grammar and present a general methodology for the implementation of such grammars. Conclusions and notes on further research are given in Section 8.

2. Positional Grammars

Definition 1. (Positional Grammar) A context-free positional grammar PG can be represented by a six-tuple (N, T, S, P, POS, PE) where:
- N is a finite non-empty set of non-terminal symbols;
- T is a finite non-empty set of terminal symbols, with N ∩ T = ∅;
- S ∈ N is the starting symbol;
- P is a finite set of productions;
- POS is a finite set of positional relation identifiers, with POS ∩ (N ∪ T) = ∅;
- PE is a positional evaluator.

Each production in P has the following form:
A → x1 REL1 x2 REL2 ... RELm-1 xm, with m ≥ 1,
where A ∈ N, each xi is in N ∪ T and each RELi is in POS. □
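As an illustration only, the six-tuple of Definition 1 can be written down as a Python data structure. This is a sketch under our own assumptions: the class name PositionalGrammar, the tuple encoding of right-hand sides (grammar symbols and relation identifiers alternating) and the check() method are hypothetical, and the positional evaluator PE is left out because it depends on the chosen geometry. The usage lines encode the upper-right corner grammar of Example 2.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# A right-hand side alternates grammar symbols and relation identifiers,
# x1 REL1 x2 ... RELm-1 xm, so it always has odd length 2m - 1 (m >= 1).
Production = Tuple[str, Tuple[str, ...]]

@dataclass
class PositionalGrammar:
    N: Set[str]           # non-terminals
    T: Set[str]           # terminals
    S: str                # starting symbol
    P: List[Production]   # productions (lhs, rhs)
    POS: Set[str]         # positional relation identifiers
    # PE is not modelled here: it depends on the geometry of the pictures.

    def check(self) -> None:
        """Raise ValueError if the tuple violates Definition 1."""
        if not self.N or not self.T:
            raise ValueError("N and T must be non-empty")
        if self.N & self.T or self.POS & (self.N | self.T):
            raise ValueError("N, T and POS must be pairwise disjoint")
        if self.S not in self.N:
            raise ValueError("S must belong to N")
        for lhs, rhs in self.P:
            if lhs not in self.N or len(rhs) % 2 == 0:
                raise ValueError(f"malformed production {lhs} -> {rhs}")
            for k, item in enumerate(rhs):
                # even slots hold grammar symbols, odd slots hold relations
                allowed = self.N | self.T if k % 2 == 0 else self.POS
                if item not in allowed:
                    raise ValueError(f"{item!r} not allowed at slot {k} of {lhs} -> {rhs}")

corner = PositionalGrammar(
    N={"Corner", "HLine", "VLine"}, T={"dot"}, S="Corner",
    P=[("Corner", ("HLine", "UHOR", "VLine")),
       ("HLine", ("HLine", "AHOR", "dot")), ("HLine", ("dot",)),
       ("VLine", ("VLine", "AVER", "dot")), ("VLine", ("dot",))],
    POS={"UHOR", "AHOR", "AVER"},
)
corner.check()  # no exception: Example 2 satisfies Definition 1
```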

Each positional relation RELi gives information about the position of xi+1 relative to xi. In the following, the words “positional grammar” will always refer to a context-free positional grammar. While in a string grammar the only possible positional relation is string concatenation, in a positional grammar other positional relations can be defined and used to describe higher-dimensional languages. During parsing, this positional information is used by the scanner to select the next symbol to parse. We will show that each “word” generated by a positional grammar is a symbolic picture on some vocabulary. The following is an extension of the definition of symbolic picture given in [9].

Definition 2. (Symbolic Picture) Let V be a set of symbols, n > 0 an integer and P a set of positions in the n-dimensional space. A symbolic picture p of dimension n on V and P is a mapping p : P → V. When |P| = 1 the symbolic picture reduces to one symbol. □


Informally, a symbolic picture is a spatial arrangement of one or more symbols from a given vocabulary. For the definition of the positional relations we will often make use of the concept of location of a symbolic picture.

Definition 3. (Location of a Picture) Let S be the set of all the symbolic pictures of dimension n on a vocabulary V and a set of positions P. The location of a symbolic picture p ∈ S is given by the mapping location : S → P. □

Informally, given a symbolic picture p, location(p) is a function returning the position held by a symbol of p. (Note that a symbol is always associated with one position.) For example, the centroid of a picture is a good candidate for location. In the rest of the paper we will consider two-dimensional symbolic pictures on some vocabulary and on the set of Cartesian coordinates. Some simple examples of positional relations on a Cartesian plane follow:
- String concatenation or adjacent horizontal concatenation: AHOR = {(p1, p2) : p1 and p2 are pictures horizontally concatenated with alignment of their centroids}
- Adjacent vertical concatenation: AVER = {(p1, p2) : p1 and p2 are pictures vertically concatenated with alignment of their centroids}
- Upper horizontal concatenation: UHOR = {(p1, p2) : p1 and p2 are pictures horizontally concatenated with alignment of the centroid of p1 and the uppermost element of p2}
- Overlapping: OVER = {(p1, p2) : p1 and p2 are pictures whose centroids overlap}
- Adjacency at distance δ: ADJδ = {(p1, p2) : p1 and p2 are pictures, location(p1) = (x, y), location(p2) = (x′, y′), and (x′ − x)² + (y′ − y)² = δ²}

Other examples of positional relations for symbolic pictures can be found in [7] and [8].

Definition 4. (Positional Evaluator) A positional evaluator PE is a function whose input is a sequence p1 REL1 p2 REL2 ... RELm-1 pm, with m ≥ 1, where each pi is a symbolic picture and each RELi is a positional relation; its output is a new picture made up of the pictures p1, p2, ..., pm, arranged in space so that (pi, pi+1) ∈ RELi for 1 ≤ i ≤ m − 1. The positional relations are evaluated sequentially from left to right. □

For practical purposes, the application of PE will be represented by enclosing its input string between a left and a right curly bracket. Some examples of simple positional evaluator applications follow:

PE(“a”) = {a} = a

PE(“a . b . c . d”) = {a . b . c . d} = a b c d

PE(“a AVER b AHOR c”) = {a AVER b AHOR c} =

a
b c

PE(“a OVER _”) = {a OVER _} = a̲

where the positional relation ‘.’ is the string concatenation spatial relation, equivalent to AHOR, and AVER and OVER are defined as above. The following definitions are understood to be with respect to a particular positional grammar PG.

Definition 5. We write Π ⇒ Σ if there exist ∆, Γ, A, ψ such that Π = ΓA∆, A → ψ is a production, and

$$\Sigma = \begin{cases} \Gamma\{\psi\}\Delta & \text{if } |\psi| > 1 \\ \Gamma\psi\Delta & \text{if } |\psi| = 1 \end{cases}$$

Here ‘{ψ}’ is considered as a string and not as its value; Γ and ∆ need not be balanced, i.e., the number of left brackets may differ from the number of right brackets. □

Definition 6. We write Π ⇒* Σ (Σ is derived from Π) if there exist strings Π0, Π1, ..., Πm (m ≥ 0) such that Π = Π0 ⇒ Π1 ⇒ ... ⇒ Πm = Σ. The sequence Π0, ..., Πm is called a derivation of Σ from Π. A positional sentential form is a string Π such that S ⇒* Π. A positional sentence is a positional sentential form not containing non-terminal symbols. A pictorial form is the evaluation of a positional sentential form. A picture is a pictorial form with only terminal symbols. The pictorial language L(PG) defined by a positional grammar is the set of its pictures. □
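Definitions 5 and 6 can be made concrete with a small sketch, assuming that sentential forms are lists of tokens in which the brackets ‘{’ and ‘}’ are ordinary tokens. The names step() and leftmost_derivation() are illustrative, and always rewriting the leftmost non-terminal is our own choice (Definition 5 allows any occurrence). Following Definition 5, the right-hand side is wrapped in brackets only when it contains more than one grammar symbol; applied to the grammar of Example 1, the sketch reproduces the positional sentence given for that grammar.

```python
from typing import List, Sequence, Set

def step(form: List[str], k: int, rhs: Sequence[str], nonterminals: Set[str]) -> List[str]:
    """One step Π ⇒ Σ (Definition 5): rewrite the non-terminal form[k] by rhs.

    rhs alternates symbols and relations, so |rhs| (the number of grammar
    symbols) is (len(rhs) + 1) // 2; brackets are added only if |rhs| > 1.
    """
    if form[k] not in nonterminals:
        raise ValueError(f"{form[k]!r} is not a non-terminal")
    n_symbols = (len(rhs) + 1) // 2
    body = ["{", *rhs, "}"] if n_symbols > 1 else list(rhs)
    return form[:k] + body + form[k + 1:]

def leftmost_derivation(start: str, rhss: Sequence[Sequence[str]],
                        nonterminals: Set[str]) -> List[str]:
    """Apply each right-hand side in turn to the leftmost non-terminal."""
    form = [start]
    for rhs in rhss:
        k = next(i for i, tok in enumerate(form) if tok in nonterminals)
        form = step(form, k, rhs, nonterminals)
    return form

# Example 1: S -> a . S . b | a . b
rec, base = ["a", ".", "S", ".", "b"], ["a", ".", "b"]
sentence = leftmost_derivation("S", [rec, rec, base], {"S"})
print(" ".join(sentence))  # { a . { a . { a . b } . b } . b }
```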


Note that the pictorial forms and the pictures of a language L(PG) are symbolic pictures on the vocabularies N ∪ T and T, respectively. As for notation, unless otherwise stated, we use capitals for non-terminals: A, B, C; lower-case letters and words for terminals: a, b, c; lower-case letters close to the end of the alphabet for symbols that are either terminals or non-terminals: x, y; and capital identifiers for positional relations: AHOR, AVER, OVER. Strings of terminals and non-terminals alternated with positional relation identifiers are represented by lower-case Greek letters α, β, ψ. Positional sentential forms are represented by Greek capitals Π, ∆, Γ.

Example 1. The following grammar generates the strings of the form a...ab...b with an equal number of a's and b's.
N = {S}
T = {a, b}
POS = {.}
P = { S → a . S . b | a . b }

The positional relation ‘.’ is the string concatenation spatial relation. A positional sentence of this grammar is {a . {a . {a . b} . b} . b} and the corresponding picture is aaabbb. This example shows that every context-free string language can be represented by a positional grammar.

Example 2. The following grammar generates an upper-right corner with edges of variable length.
N = {Corner, HLine, VLine}
T = {dot}
S = Corner
POS = {UHOR, AHOR, AVER}
P = { Corner → HLine UHOR VLine
      HLine → HLine AHOR dot | dot
      VLine → VLine AVER dot | dot }

where UHOR, AHOR and AVER are defined as above. A positional sentence of this grammar is:
{{{{dot AHOR dot} AHOR dot} AHOR dot} UHOR {{{dot AVER dot} AVER dot} AVER dot}}
By evaluating it and replacing dot with the graphical sign ‘.’, we obtain the image in Figure 1.

. . . . .
        .
        .
        .

Figure 1. An upper-right corner

Example 3. The following grammar generates a generic line.
N = {Line}
T = {dot}
S = Line
POS = {ADJδ}
P = { Line → Line ADJδ dot | dot }

where ADJδ is defined as above, with location(Line) returning the coordinates of the last dot drawn in Line seen as a picture. Replacing dot with the graphical sign ‘.’ and choosing a very small δ, this grammar can be used to generate the images in Figure 2.

Figure 2. Four examples of lines.

New positional relations can be defined by restricting the choice of the position on the circumference described by ADJδ. Such definitions make it possible for a positional grammar to generate many kinds of curves with given properties on a Cartesian plane.
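The derivation Line ⇒ Line ADJδ dot can be simulated numerically. In the following sketch, which is only an illustration, location(Line) is the last dot drawn, as stated for Example 3, and the hypothetical parameter direction plays the role of the restriction on the choice of the position on the circumference described by ADJδ: it returns the angle, in radians, at which the k-th new dot is placed. The names derive_line() and adj() are our own.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]

def derive_line(n_dots: int, delta: float, direction: Callable[[int], float],
                start: Point = (0.0, 0.0)) -> List[Point]:
    """Evaluate Line => Line ADJδ dot => ... => dot ADJδ ... ADJδ dot (n_dots dots).

    location(Line) is the last dot drawn; the k-th new dot is put on the circle
    of radius delta around it, at the angle direction(k) in radians.
    """
    dots = [start]                       # Line -> dot
    for k in range(1, n_dots):           # Line -> Line ADJδ dot
        x, y = dots[-1]
        theta = direction(k)
        dots.append((x + delta * math.cos(theta), y + delta * math.sin(theta)))
    return dots

def adj(p: Point, q: Point, delta: float, tol: float = 1e-9) -> bool:
    """ADJδ on two locations: (x' - x)^2 + (y' - y)^2 = δ^2, up to rounding."""
    return abs(math.dist(p, q) - delta) <= tol

straight = derive_line(5, 0.1, lambda k: 0.0)             # a horizontal segment
arc = derive_line(20, 0.1, lambda k: k * math.pi / 20)    # a curve bending upwards
print(all(adj(p, q, 0.1) for p, q in zip(arc, arc[1:])))  # True
```

Restricting direction to a constant yields a straight line, while a slowly varying angle yields a smooth curve; other restrictions give other families of curves.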


Example 4. The following grammar generates two-dimensional arithmetical expressions using the binary operations addition and division:
N = {E, T, F}
S = E
T = {+, hbar, (, ), id}
POS = {AHOR, AVER}
P = { E → E AHOR + AHOR T | T
      T → T AVER hbar AVER F | F
      F → ( AHOR E AHOR ) | id }

A positional sentence of this grammar is:
{{id AHOR + AHOR {{( AHOR {id AHOR + AHOR id} AHOR )} AVER hbar AVER id}} AHOR + AHOR id}
Replacing hbar with a horizontal bar, the corresponding picture is:

     (id + id)
id + --------- + id
        id

3. Linear Positional Grammars

Before defining a linear positional grammar, we need to define the HEAD and TAIL of a picture ps obtained by evaluating a positional sentence s.

Definition 7. (HEAD and TAIL) Let ps be the picture resulting from the evaluation of a positional sentence s. We define HEAD(ps) as the symbol in ps that begins s, and TAIL(ps) as the symbol in ps that ends s. □

For example, if s is the positional sentence {a AVER b AHOR c}, where a and c begin and end s, respectively, then ps is

a
b c

with HEAD(ps) = a and TAIL(ps) = c. The following definition characterizes the class of linear pictorial languages.


Definition 8. (Linear Positional Grammar) Let ps be the picture resulting from the evaluation of the positional sentence s. A positional grammar LPG is linear iff the occurrence of “x REL y” in a production of LPG, with x, y ∈ N ∪ T, implies that for each positional sentence s derivable from x and for each positional sentence t derivable from y, (TAIL(ps), HEAD(pt)) ∈ REL in the picture obtained from the evaluation of {ps REL pt}. □

We say that a language is a linear pictorial language iff it is generated by a linear positional grammar. A direct consequence of this definition is that, for each positional sentence s of a linear positional grammar, the evaluation of s is equal to the evaluation of {v}, where v is obtained from s by omitting all the brackets. This explains the adjective linear.

Example 5. The grammar of Example 1 is linear. In particular, the evaluation of the positional sentence {a . {a . {a . b} . b} . b} is equal to the evaluation of {a . a . a . b . b . b}. Since the condition of Definition 8 holds for each context-free string grammar (when the positional relation ‘.’ is made explicit), the class of context-free string languages is a subclass of the class of linear pictorial languages.

Example 6. The grammars of Examples 2 and 3 are both linear. (Note that the positional relation UHOR is equal to AHOR when applied to terminals.)

Example 7. The arithmetical expression grammar of Example 4 is not linear. To show this, consider the following positional sentence:
{{id1 AHOR +1 AHOR {id2 AVER hbar AVER id3}} AHOR +2 AHOR id4}
and its corresponding picture:

        id2
id1 +1 ----- +2 id4
        id3

Consider “+ AHOR T” occurring in the first production and the positional sentence s = {id2 AVER hbar AVER id3} derivable from T. It is easy to see that (TAIL(+1), HEAD(ps)) = (+1, id2) does not belong to AHOR in the picture

    id2
+1 -----
    id3

The same situation holds when considering “E AHOR +” and t = {id1 AHOR +1 AHOR {id2 AVER hbar AVER id3}} derivable from E. By replacing the positional relations AHOR and AVER, it is possible to make the grammar of Example 4 linear; the price to pay is an alteration of the generated pictorial language. Consider the following two positional relations:
HOR = {(p1, p2) : p1 and p2 are pictures, location(p1) = (x, y), location′(p2) = (x′, y′), and x′ > x}
VER = {(p1, p2) : p1 and p2 are pictures, location(p1) = (x, y), location′(p2) = (x′, y′), y′ < y, and x′ ≤ x}
The new grammar for arithmetical expressions is the following:
N = {E, T, F}
S = E
T = {+, hbar, (, ), id}
POS = {HOR, VER}
P = { E → E HOR + HOR T | T
      T → T VER hbar VER F | F
      F → ( HOR E HOR ) | id }
Here, the definitions of HOR and VER are completed by setting location(p) and location′(p) to the positions of TAIL(p) and HEAD(p), respectively, for any sub-picture p derivable from a symbol of the grammar. It is easy to see that this makes the grammar linear. A positional sentence is:
{{id HOR + HOR {id VER hbar VER id}} HOR + HOR id}
Due to the linearity of the grammar, this is equivalent to
{id HOR + HOR id VER hbar VER id HOR + HOR id}
According to the definitions of HOR, VER and PE, many pictures correspond to the evaluation of this positional sentence, but all of them can be mapped into the following one:

     id
id + -- + id
     id

which is still a picture of the language.


Note that this grammar generates the two-dimensional arithmetical expressions generated by the grammar of Example 4 and, for each of them, many other positionally imperfect versions.
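Because the grammar is linear, checking a layout against a bracket-free positional sentence reduces to checking each relation between consecutive terminals, since the TAIL and HEAD of a single token are the token itself. The sketch below encodes HOR and VER as predicates on integer (x, y) coordinates with y growing upwards; the function names and the sample coordinates are our own assumptions, chosen to match the picture of id + id/id + id given above.

```python
from typing import List, Tuple

Pos = Tuple[int, int]  # (x, y), with y growing upwards

def hor(loc: Pos, loc2: Pos) -> bool:
    """HOR: location'(p2) = (x', y') lies to the right of location(p1) = (x, y)."""
    return loc2[0] > loc[0]

def ver(loc: Pos, loc2: Pos) -> bool:
    """VER: y' < y and x' <= x."""
    return loc2[1] < loc[1] and loc2[0] <= loc[0]

RELATIONS = {"HOR": hor, "VER": ver}

def satisfies(tokens: List[Tuple[str, Pos]], rels: List[str]) -> bool:
    """Check the bracket-free sentence t1 REL1 t2 ... RELm-1 tm against a layout.

    For single tokens, TAIL and HEAD are the token itself, so location and
    location' are just the token's position.
    """
    return all(RELATIONS[r](tokens[i][1], tokens[i + 1][1])
               for i, r in enumerate(rels))

# {id HOR + HOR id VER hbar VER id HOR + HOR id} laid out as
#      id
# id + -- + id
#      id
layout = [("id", (0, 1)), ("+", (1, 1)), ("id", (2, 2)), ("hbar", (2, 1)),
          ("id", (2, 0)), ("+", (3, 1)), ("id", (4, 1))]
rels = ["HOR", "HOR", "VER", "VER", "HOR", "HOR"]
print(satisfies(layout, rels))  # True
```

Moving the second + to the left of the denominator id would violate HOR and make the check fail.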

4. Positional LR Parsers

Positional LR parsers (pLR parsers) are simply a generalization of LR parsers [1]. As shown in Figure 3, the model of a pLR parser consists of:
1) Input
2) Positional operators
3) pLR Parsing Table
4) pLR Parsing Program
5) Stack
6) Output

Figure 3. The model of a pLR parser (input $ a0 a1 ... an-1 an; positional operators HOR, AVER, ...; stack s0 ... Xm-1 sm-1 Xm sm; pLR parsing program; output; parsing table with action, goto and position sections)

The input


The input to a pLR parser is a spatial arrangement of tokens or, in other words, a symbolic picture in which each symbol is a token. Such an input is represented by an array w of tokens, a list Q of pairs (pos, i), where pos is the position in the picture of the token w[i], and a starting index that points to the first token to parse. Since the association between positions and tokens allows us to reach a token in w whenever its position in the picture is given, and vice versa, the order of the tokens in the input array is irrelevant. The input array is thus no longer required to be accessed sequentially, but rather according to the positional requirements built into the parser. Note that if the array w is the original matrix representing the input symbolic picture, the list Q will contain the positions of each matrix element containing a token; in this case, i is equal to pos.

In this context, the definition of the sequential end-of-string marker must be extended. In fact, the end-of-string marker hides an operational aspect: when parsed, it signals that no symbols are left to parse. While in sequential scanning nothing needs to be done other than recognizing the ‘$’ character, in non-sequential scanning this operational aspect must be made explicit: before returning an end-of-input symbol, the scanner has to check whether all the symbols have been parsed. In a pLR parser, end-of-input marking is implemented by storing the symbol ‘$’ in location 0 of the input array and by defining the end-of-input operator ANY.

Definition 9. (ANY Operator) The end-of-input operator ANY is a function whose return value is 0 if all the symbols in the input array have been parsed, and ‘error’ otherwise. □

The operator ANY is easily implemented by letting the parser mark all the visited symbols: when invoked, ANY only needs to check whether there are unmarked symbols in the input.
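A minimal sketch of the input representation just described, assuming tokens are supplied as a dictionary from integer positions to token names; PictureInput, its fields and any_op() are hypothetical names. The symbol ‘$’ is stored at index 0, Q pairs each position with an index, and ANY returns 0 only when every token has been marked; None stands in for ‘error’.

```python
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]

class PictureInput:
    """Input of a pLR parser: token array w ('$' in w[0]), list Q of
    (pos, i) pairs, a starting index, and one 'parsed' mark per token."""

    def __init__(self, cells: Dict[Pos, str], start: Pos):
        self.w: List[str] = ["$"]
        self.Q: List[Tuple[Pos, int]] = []
        for pos, token in cells.items():   # the order of w is irrelevant
            self.Q.append((pos, len(self.w)))
            self.w.append(token)
        self.index = dict(self.Q)          # position -> index in w
        self.start = self.index[start]
        self.marked = [False] * len(self.w)

    def mark(self, i: int) -> None:
        self.marked[i] = True

    def any_op(self) -> Optional[int]:
        """ANY (Definition 9): 0, the index of '$', if every token has been
        parsed; None stands for 'error'."""
        return 0 if all(self.marked[1:]) else None

pic = PictureInput({(0, 0): "char", (0, 1): "eol"}, start=(0, 0))
print(pic.w, pic.any_op())   # ['$', 'char', 'eol'] None
pic.mark(1)
pic.mark(2)
print(pic.any_op())          # 0
```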
The positional operators

For each positional relation we define a positional operator with the same name. Such an operator is a function that takes as input the index in the input array of the last token parsed, retrieves from Q the associated position, computes a new position from it, and returns the associated index: this is the index in the input array of the next token to parse. In the rest of the paper the positional operators will be denoted by the same names as the corresponding relations, but in Arial characters.

Definition 10. (Positional Operator) Given a linear positional grammar LPG = (N, T, S, P, POS, PE) and a relation REL ∈ POS, for all x, y ∈ N ∪ T such that “x REL y” occurs on the right-hand side of a production rule in LPG, the corresponding positional operator REL is defined as follows:
REL(i) = j if w[i] ∈ LAST(x) and w[j] ∈ FIRST(y)
where LAST(x) is the set of terminals that may end the positional sentences derived from x, FIRST(y) is the set of terminals that may begin the positional sentences derived from y, and w is the input array. □

Since the linearity of LPGs allows the generated two-dimensional patterns to be treated as strings, the main purpose of the positional operators is to make the parser scan the tokens of a pattern in the same order in which they were generated. The construction of the positional operators is a critical point: given a linear positional grammar, it may not be possible to construct the positional operators corresponding to its positional relations.

Definition 11. (pLR Parsable LPG) A linear positional grammar (N, T, S, P, POS, PE) is pLR parsable iff it is possible to construct the corresponding positional operator for each relation in POS. □

Example 8. The grammars of Examples 1 and 2 are pLR parsable; indeed, their positional operators can be constructed as follows:
UHOR(i) = AHOR(i) = j iff location(w[i]) = (x, y) and location(w[j]) = (x + δ, y)
AVER(i) = j iff location(w[i]) = (x, y) and location(w[j]) = (x, y − δ)
where δ is the distance between each pair of dots.

Example 9. For the arithmetical expression grammar given in Example 7, possible definitions of the operators HOR and VER are as follows:
HOR(i) = j iff location(w[j]) is the highest spatial position in the first non-empty column to the right of location(w[i]).
VER(i) = j iff location(w[j]) is the spatial position to the left of location(w[i]) that is the leftmost position in the first non-empty row below location(w[i]).
It is not difficult to see that, with these definitions, it is not always possible to parse arithmetical expressions in the same order in which they are generated by our grammar.
In fact, consider the pattern

        id
id + ---------
     (id + id)

with w = [id1, +2, id3, hbar4, (5, id6, +7, id8, )9], and the fragment “+ HOR T” from the production E → E HOR + HOR T: we note that HOR(2) = 4, but w[4] = hbar can never begin a positional sentence derivable from T. To avoid patterns of this kind, the grammar has to be changed again; in particular, the definition of the positional evaluator PE has to be changed when applied to the spatial relations HOR and VER (see Figure 4). It is assumed that all positions on the plane are marked as feasible before PE is applied.


{p1 HOR p2}: The evaluation of HOR sets location(p1) = (x, y) and location′(p2) = (x′, y′) such that (p1, p2) ∈ HOR and the position (x′, y′) is feasible. Moreover, it marks as unfeasible each position belonging to any of the following sets:
{(x, y1) : y ≤ y1 ≤ m}
{(x1, y2) : x < x1 < x′ and 0 ≤ y2 ≤ m}
{(x′, y3) : y′ ≤ y3 ≤ m}
where m ≥ 1 is an upper bound on the y-coordinate in the two-dimensional space.

{p1 VER p2}: The evaluation of VER sets location(p1) = (x, y) and location′(p2) = (x′, y′) such that (p1, p2) ∈ VER and the position (x′, y′) is feasible. Moreover, it marks as unfeasible each position belonging to any of the following sets:
{(x1, y) : 0 ≤ x1 ≤ x}
{(x2, y1) : 0 ≤ x2 ≤ x and y′ < y1 < y}
{(x3, y′) : 0 ≤ x3 ≤ x′}

Figure 4. The executions of {p1 HOR p2} and {p1 VER p2}, with the resulting unfeasible positions

It can be verified that, under such a definition of PE, the pattern

        id
id + ---------
     (id + id)

cannot be generated, while it is possible to generate the following pattern:

     id
id + ---------
     (id + id)
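The HOR operator of Example 9 and the FIRST/LAST sets used in Definition 10 can be checked mechanically. The sketch below is illustrative only: edge_sets() computes FIRST (end=0) or LAST (end=-1) by fixpoint iteration, hor_op() implements Example 9's HOR on hypothetical integer coordinates (one cell per token, y growing upwards), and the two layouts encode the patterns discussed above, with the numerator id either centred over the fraction bar or aligned with its start. With w = [id1, +2, id3, ...] as in Example 9, the centred layout reproduces HOR(2) = 4 with w[4] = hbar outside FIRST(T), while the left-aligned one reaches id.

```python
from typing import Dict, List, Optional, Set

# Linear arithmetical expression grammar of Example 7; relations sit at odd slots.
GRAMMAR: Dict[str, List[List[str]]] = {
    "E": [["E", "HOR", "+", "HOR", "T"], ["T"]],
    "T": [["T", "VER", "hbar", "VER", "F"], ["F"]],
    "F": [["(", "HOR", "E", "HOR", ")"], ["id"]],
}

def edge_sets(grammar: Dict[str, List[List[str]]], end: int) -> Dict[str, Set[str]]:
    """FIRST (end=0) or LAST (end=-1) sets by fixpoint iteration.

    Relations never occur at either end of a right-hand side and there are
    no empty productions, so only rhs[end] matters.
    """
    sets: Dict[str, Set[str]] = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                x = rhs[end]
                new = sets[x] if x in grammar else {x}
                if not new <= sets[a]:
                    sets[a] |= new
                    changed = True
    return sets

def hor_op(pos: list, i: int) -> Optional[int]:
    """Example 9's HOR(i): highest position in the first non-empty column
    to the right of pos[i]; None if there is no such column."""
    x = pos[i][0]
    right = [p for p in pos[1:] if p[0] > x]
    if not right:
        return None
    col = min(p[0] for p in right)
    best = max((p for p in right if p[0] == col), key=lambda p: p[1])
    return pos.index(best)

# w[0] is '$'; w[1..9] = id1 +2 id3 hbar4 (5 id6 +7 id8 )9
w = ["$", "id", "+", "id", "hbar", "(", "id", "+", "id", ")"]
centred = [None, (0, 1), (1, 1), (3, 2), (2, 1), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0)]
left_aligned = [None, (0, 1), (1, 1), (2, 2), (2, 1), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0)]

first = edge_sets(GRAMMAR, 0)
for name, pos in [("centred", centred), ("left-aligned", left_aligned)]:
    j = hor_op(pos, 2)          # fragment "+ HOR T" with w[2] = '+'
    print(name, j, w[j], w[j] in first["T"])
# centred 4 hbar False
# left-aligned 3 id True
```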


The pLR Parsing Table

Besides the “action” and “goto” sections of an LR parsing table, the pLR parsing table contains an additional column, called “position”, whose entries contain the names of the positional operators corresponding to the grammar relations, and two special operator names, SP and ANY. The operator SP returns the starting index given in input together with a picture, and ANY is the operator defined in Definition 9. All the names in the “position” column can be considered as pointers to the code implementing the operators. Since the construction of the “position” column does not affect the other entries of the original LR parsing table, we can use the traditional techniques (with some variations) to obtain pLR(0), Simple pLR(1) (SpLR(1) for short), canonical pLR(1) and LookAhead pLR(1) parsers [1]. In Section 5 we will show how to construct SpLR(1), canonical pLR(1), and pLALR(1) parsing tables.

The pLR Parsing Algorithm

The following algorithm is a simple extension of Algorithm 4.7 in [1]; the main difference lies in the setting of the pointer to the next symbol.

Algorithm 1. (pLR Parsing algorithm)
Input: A picture p (represented by an array of tokens w, a starting index in w, and a list Q of pairs (pos, i)), a specification of a set of positional operators, and a pLR parsing table for a pLR parsable linear positional grammar LPG, as specified above.
Output: If p is in L(LPG), a bottom-up parse for p; otherwise, an error indication.
Method: Initially, the parser has s0 on its stack, where s0 is the initial state, and w is in the input buffer. The parser executes the following program until an “accept” or “error” action is encountered.
begin
  set la to the starting index returned by the operator SP();
  repeat forever begin
    let s be the state on top of the stack and a the symbol indexed by la;
    if action[s, a] = shift s′ then begin
      push a then s′ on top of the stack;
      set ip = la and mark w[ip];                  // commit move
      let OP be the operator in position[s′];
      set la = OP(ip) such that w[la] is not marked;
    end
    else if action[s, a] = reduce A → x then begin
      pop 2*|x| symbols off the stack;
      let s′ be the state now on top of the stack;
      push A then goto[s′, A] on top of the stack;
      output the production A → x;
    end
    else if action[s, a] = accept then return
    else error()
  end
end.

Note that marking the symbols prevents the same symbol from being considered more than once. □

Example 10. Figure 5 shows a canonical pLR(1) parsing table for the following linear positional grammar for the vertical concatenation of two text lines:
(1) TEXT → LINE VER LINE
(2) LINE → char AHOR LINE
(3) LINE → eol
where the functions location() and location′() in the definition of VER are defined as in Example 7. Using the parsing table in Figure 5 and applying Algorithm 1, it can be verified that the following picture is in the described language:

char eol
char char eol

For this language, the starting position in each sentence is naturally set to the upper-left element.

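action columns: char, eol, $; goto columns: TEXT, LINE

state | char | eol | $   | TEXT | LINE | position
0     | s3   | s4  |     | 1    | 2    | SP
1     |      |     | acc |      |      | ANY
2     | s6   | s7  |     |      | 5    | VER
3     | s3   | s4  |     |      | 8    | AHOR
4     | r3   | r3  |     |      |      | VER
5     |      |     | r1  |      |      | ANY
6     | s6   | s7  |     |      | 9    | AHOR
7     |      |     | r3  |      |      | ANY
8     | r2   | r2  |     |      |      | VER
9     |      |     | r2  |      |      | ANY

Figure 5. A canonical pLR(1) parsing table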
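To make Algorithm 1 concrete, here is a small Python sketch that drives the table of Figure 5 over the two-line picture of Example 10. It is a sketch under stated assumptions, not the authors' implementation: positions are (row, column) pairs with rows growing downwards, SP picks the upper-left token, AHOR picks the adjacent token to the right, VER follows our reading of Example 9 (leftmost unmarked token, not to the right of the current one, in the first non-empty row below), and None plays the role of ‘error’. All function and variable names are hypothetical. The output lists the productions in reduction order, i.e., a rightmost derivation in reverse.

```python
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]  # (row, col): row 0 is the top line, rows grow downwards

class Picture:
    """pLR input: w[0] = '$', one position and one mark per token."""

    def __init__(self, cells: Dict[Pos, str]):
        order = sorted(cells)
        self.w = ["$"] + [cells[p] for p in order]
        self.pos: List[Optional[Pos]] = [None] + order
        self.index = {p: i for i, p in enumerate(self.pos) if p is not None}
        self.marked = [False] * len(self.w)

    def free(self, p: Pos) -> bool:
        return p in self.index and not self.marked[self.index[p]]

# Positional operators: index of the last parsed token -> index of the next
# token, or None ('error') when no suitable unmarked token exists.
def op_sp(pic: Picture, _ip: Optional[int]) -> Optional[int]:
    return pic.index[min(pic.index)]             # upper-left token

def op_ahor(pic: Picture, ip: int) -> Optional[int]:
    r, c = pic.pos[ip]
    return pic.index[(r, c + 1)] if pic.free((r, c + 1)) else None

def op_ver(pic: Picture, ip: int) -> Optional[int]:
    r, c = pic.pos[ip]
    below = [p for p in pic.index if p[0] > r and pic.free(p)]
    if not below:
        return None
    row = min(p[0] for p in below)               # first non-empty row below
    cands = [p for p in below if p[0] == row and p[1] <= c]
    return pic.index[min(cands)] if cands else None  # leftmost candidate

def op_any(pic: Picture, _ip: Optional[int]) -> Optional[int]:
    return 0 if all(pic.marked[1:]) else None    # '$' iff all tokens parsed

OPS = {"SP": op_sp, "AHOR": op_ahor, "VER": op_ver, "ANY": op_any}

# Figure 5: (1) TEXT -> LINE VER LINE, (2) LINE -> char AHOR LINE, (3) LINE -> eol
ACTION = {
    (0, "char"): ("s", 3), (0, "eol"): ("s", 4), (1, "$"): ("acc", 0),
    (2, "char"): ("s", 6), (2, "eol"): ("s", 7),
    (3, "char"): ("s", 3), (3, "eol"): ("s", 4),
    (4, "char"): ("r", 3), (4, "eol"): ("r", 3), (5, "$"): ("r", 1),
    (6, "char"): ("s", 6), (6, "eol"): ("s", 7), (7, "$"): ("r", 3),
    (8, "char"): ("r", 2), (8, "eol"): ("r", 2), (9, "$"): ("r", 2),
}
GOTO = {(0, "TEXT"): 1, (0, "LINE"): 2, (2, "LINE"): 5, (3, "LINE"): 8, (6, "LINE"): 9}
POSITION = ["SP", "ANY", "VER", "AHOR", "VER", "ANY", "AHOR", "ANY", "VER", "ANY"]
PRODS = {1: ("TEXT", 2), 2: ("LINE", 2), 3: ("LINE", 1)}  # lhs, |x| in symbols

def plr_parse(cells: Dict[Pos, str]) -> List[int]:
    """Algorithm 1: return the numbers of the productions, in reduction order."""
    pic = Picture(cells)
    stack: list = [0]
    output: List[int] = []
    la = OPS["SP"](pic, None)
    while True:
        if la is None:
            raise SyntaxError("no token at the position required by the parser")
        s, a = stack[-1], pic.w[la]
        kind, arg = ACTION.get((s, a), ("error", 0))
        if kind == "s":
            stack += [a, arg]                    # push a then s'
            ip = la
            pic.marked[ip] = True                # commit move
            la = OPS[POSITION[arg]](pic, ip)
        elif kind == "r":
            lhs, n = PRODS[arg]
            del stack[len(stack) - 2 * n:]       # pop 2*|x| entries
            stack += [lhs, GOTO[(stack[-1], lhs)]]
            output.append(arg)
        elif kind == "acc":
            return output
        else:
            raise SyntaxError(f"unexpected {a!r} in state {s}")

picture = {(0, 0): "char", (0, 1): "eol",
           (1, 0): "char", (1, 1): "char", (1, 2): "eol"}
print(plr_parse(picture))  # [3, 2, 3, 2, 2, 1]
```

A picture whose last token is never reached, for example a third line left over after the second eol, makes ANY fail, so the parser reports an error instead of accepting.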

Initially, the parser is in state 0 and the look-ahead la is set to the return value of SP(), i.e., the only char in the first line of the sentence. Since action[0, char] = “shift 3”, the stack becomes 0 char 3, with state 3 on top, and la is set to the input token whose position is determined by executing the operator stored in position[3]. In this case, the new look-ahead is the token eol, which is in relation AHOR with the char just analyzed. By executing action[3, eol] = “shift 4”, the stack changes to 0 char 3 eol 4, with la pointing to the first char in the second line; this token position has been computed by applying the operator position[4] = VER to the token eol just analyzed. Since action[4, char] = “r3”, i.e., “reduce by production (3) LINE → eol”, and goto[3, LINE] = 8, the stack becomes 0 char 3 LINE 8, with la still pointing to the first char in the second line. Continuing the execution of the algorithm, the stack reaches the configuration 0 TEXT 1, with la pointing to the end-of-input marker $ returned by the operator ANY(). The parser then accepts the input.

Example 11. Figure 6 shows the Simple pLR(1) parsing table for the following linear positional grammar, which generates a horizontal concatenation of a block of squares, an arrow and another block of squares:

(1) S → B1 HOR ⇒ HOR B2
(2) B1 → C HOR C
(3) C → □ VER □
(4) B2 → R VER R
(5) R → □ HOR □

where the definitions of location(), location′() and PE are as in Examples 7. and 9., respectively. Using the parsing table in Figure 6. and applying Algorithm 1., it can be verified that the picture in Figure 7. is in the described language. In particular, note that the parser drives the scanning of the input such that the first block is visited by columns, and the second block by rows, according to the productions of the grammar. The remaining ways of scanning this input are not taken into consideration.

               action                goto
state     ⇒      □      $      S     B1    B2    C     R      position
  0              s4            1     2           3            SP
  1                     acc                                   ANY
  2       s5                                                  HOR
  3              s4                              6            HOR
  4              s7                                           VER
  5              s10                       8           9      HOR
  6       r2                                                  HOR
  7       r3     r3                                           HOR
  8                     r1                                    ANY
  9              s10                                   12     VER
 10              s11                                          HOR
 11              r5     r5                                    VER ANY
 12                     r4                                    ANY

Figure 6. A Simple pLR(1) parsing table

Figure 7. A two-dimensional sentence

5. pLR Parsing Tables

A pLR(0) item of a linear positional grammar LPG is a production of LPG with a dot at some position on the right-hand side. A dot, however, can never be placed between a positional relation identifier and the terminal or nonterminal that follows it. Thus, a production A → X REL1 Y REL2 Z yields the four items:
(1) [A → • X REL1 Y REL2 Z]
(2) [A → X • REL1 Y REL2 Z]
(3) [A → X REL1 Y • REL2 Z]
(4) [A → X REL1 Y REL2 Z •]
The relation identifier immediately following the dot in an item is called its driver. Initial items such as (1) have no driver; a complete item such as (4) has no explicit driver, and its driver must be calculated according to the type of pLR parser being built. Intuitively, an item indicates how much of a production has been seen at a given point of the parsing process. For example, item (2) indicates that a pattern derivable from X has been seen in the input, and that a pattern derivable from Y and Z is expected next, starting from the position specified by the operator associated with the driver REL1.
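The dot-placement rule can be made concrete with a small Python sketch. It assumes, purely for illustration, that a right-hand side is stored as a list alternating symbols and relation identifiers, and it enumerates the pLR(0) items of a production together with their drivers (None when an item has no explicit driver).

```python
from typing import List, Optional, Tuple


def plr0_items(lhs: str, rhs: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Return every pLR(0) item of `lhs -> rhs` paired with its driver.

    `rhs` alternates symbols and relations, e.g. X REL1 Y REL2 Z.
    """
    items = []
    # The dot may stand before the first symbol or right after any symbol,
    # but never between a relation and the symbol that follows it.
    for dot in [0] + list(range(1, len(rhs) + 1, 2)):
        text = " ".join(rhs[:dot] + ["•"] + rhs[dot:])
        driver = rhs[dot] if 0 < dot < len(rhs) else None
        items.append((f"[{lhs} → {text}]", driver))
    return items


for item, driver in plr0_items("A", ["X", "REL1", "Y", "REL2", "Z"]):
    print(item, driver)
```

For A → X REL1 Y REL2 Z the sketch yields exactly the four items (1) to (4), with drivers None, REL1, REL2 and None.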


If LPG is a grammar with starting symbol S, then LPG′, the augmented linear positional grammar for LPG, is LPG with a new starting symbol S′, the additional relation SP, and the additional production S′ → SP S. Analogously to the definition of pLR(0) items, the definitions of the Closure and Goto operations and the construction of the sets-of-items extend easily from LR to pLR parsing [1]. In this case, the Closure and Goto functions move the dot ignoring the presence of the relations, while the drivers are used to define the values of a new function named Position(). For each set-of-items Ii, the value of Position(Ii) is the set of the operators corresponding to the drivers of the items contained in Ii.
Example 12. The canonical collection of sets of pLR(0) items for the grammar of Example 11. follows, along with the values of Position(). The Goto() function for this collection is shown in Figure 8. through the transition diagram of a deterministic finite automaton whose states are labeled by the values of the Position() function. As usual, this automaton recognizes all the viable prefixes of the grammar [1].
I0:

    S′ → • SP S
    S → • B1 HOR ⇒ HOR B2
    B1 → • C HOR C
    C → • □ VER □
    Position(I0) = {SP}
I1:
    S′ → SP S •
I2:
    S → B1 • HOR ⇒ HOR B2
    Position(I2) = {HOR}
I3:
    B1 → C • HOR C
    C → • □ VER □
    Position(I3) = {HOR}
I4:
    C → □ • VER □
    Position(I4) = {VER}
I5:
    S → B1 HOR ⇒ • HOR B2
    B2 → • R VER R
    R → • □ HOR □
    Position(I5) = {HOR}
I6:
    B1 → C HOR C •
I7:
    C → □ VER □ •
I8:
    S → B1 HOR ⇒ HOR B2 •
I9:
    B2 → R • VER R
    R → • □ HOR □
    Position(I9) = {VER}
I10:
    R → □ • HOR □
    Position(I10) = {HOR}
I11:
    R → □ HOR □ •
I12:
    B2 → R VER R •

Note that in the whole process of constructing the pLR(0) sets-of-items, the only new rule with respect to the LR(0) sets-of-items construction algorithm is: “never place the dot between a relation identifier and a symbol”.
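The collection of Example 12. can also be checked mechanically. The Python sketch below is an illustration under stated assumptions, not the authors' code: an item is a pair (production index, number of symbols already read), each production stores the relation that precedes each of its symbols (None before the first symbol, SP for the augmented production), Closure and Goto ignore the relations, and Position() collects the drivers of the non-complete items. The square terminal is written □ as in Example 11., and states are numbered in discovery order, which need not coincide with I0, ..., I12.

```python
from typing import FrozenSet, List, Set, Tuple

SQ = "□"  # the square terminal of Example 11
# (lhs, symbols, relation preceding each symbol)
PRODS = [
    ("S'", ["S"], ["SP"]),
    ("S", ["B1", "⇒", "B2"], [None, "HOR", "HOR"]),
    ("B1", ["C", "C"], [None, "HOR"]),
    ("C", [SQ, SQ], [None, "VER"]),
    ("B2", ["R", "R"], [None, "VER"]),
    ("R", [SQ, SQ], [None, "HOR"]),
]
NONTERMS = {lhs for lhs, _, _ in PRODS}
Item = Tuple[int, int]  # (production index, symbols already read)


def closure(items: Set[Item]) -> FrozenSet[Item]:
    result, todo = set(items), list(items)
    while todo:
        p, k = todo.pop()
        syms = PRODS[p][1]
        if k < len(syms) and syms[k] in NONTERMS:  # relations are ignored here
            for q, (lhs, _, _) in enumerate(PRODS):
                if lhs == syms[k] and (q, 0) not in result:
                    result.add((q, 0))
                    todo.append((q, 0))
    return frozenset(result)


def goto(state: FrozenSet[Item], x: str) -> FrozenSet[Item]:
    return closure({(p, k + 1) for p, k in state
                    if k < len(PRODS[p][1]) and PRODS[p][1][k] == x})


def position(state: FrozenSet[Item]) -> Set[str]:
    """Operators of the drivers, i.e. of the relations right after the dot."""
    return {PRODS[p][2][k] for p, k in state
            if k < len(PRODS[p][1]) and PRODS[p][2][k] is not None}


def collection() -> List[FrozenSet[Item]]:
    states = [closure({(0, 0)})]
    for state in states:  # the list grows while it is being scanned
        for x in sorted({PRODS[p][1][k] for p, k in state if k < len(PRODS[p][1])}):
            nxt = goto(state, x)
            if nxt not in states:
                states.append(nxt)
    return states
```

The sketch finds thirteen sets; four of them have Position() = {HOR}, two have {VER}, the initial one has {SP}, and the six sets made only of complete items have no explicit driver, as in the listing above.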

Figure 8. The Goto() transition diagram annotated with the values of Position()

In order to construct a pLR parser we need to define four functions on a positional grammar: FIRST, FOLLOW, POSFOLLOW and PTFOLLOW.
1) FIRST is defined as in [1] on the underlying context-free grammar obtained by ignoring the positional relations: if Π is any positional sentential form, FIRST(Π) is the set of terminals that may begin a positional sentence derived from Π.
2) FOLLOW is defined as in [1] on the underlying context-free grammar, ignoring the positional relations:
   1. Place $ in FOLLOW(S), where S is the starting symbol and $ is the end-of-input marker;
   2. If there is a production “A → α B REL β”, then everything in FIRST(β) is placed in FOLLOW(B);
   3. If there is a production “A → α B”, then everything in FOLLOW(A) is in FOLLOW(B).
   Note that we are not considering ε-productions.
3) POSFOLLOW is defined similarly to FOLLOW, but it considers only the positional relations:
   1. Place ANY in POSFOLLOW(S), where S is the starting symbol;
   2. If there is a production “A → α B REL β”, then REL is placed in POSFOLLOW(B);
   3. If there is a production “A → α B”, then everything in POSFOLLOW(A) is also in POSFOLLOW(B).
4) PTFOLLOW is similar to FOLLOW, but it considers both positional relations and terminals:
   1. Place (ANY, $) in PTFOLLOW(S), where S is the starting symbol;
   2. If there is a production “A → α B REL β”, then (REL, a) is placed in PTFOLLOW(B) for every terminal a in FIRST(β);
   3. If there is a production “A → α B”, then everything in PTFOLLOW(A) is also in PTFOLLOW(B).
Note that both FOLLOW and POSFOLLOW can be derived from PTFOLLOW. The POSFOLLOW and PTFOLLOW functions for the grammar in Example 7. are given by:
   POSFOLLOW(E) = {HOR, ANY}
   POSFOLLOW(T) = POSFOLLOW(F) = {HOR, ANY, VER}
   PTFOLLOW(E) = {(ANY, $), (HOR, ‘+’), (HOR, ‘)’)}
   PTFOLLOW(T) = PTFOLLOW(F) = {(ANY, $), (HOR, ‘+’), (HOR, ‘)’), (VER, ‘hbar’)}
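The three PTFOLLOW rules amount to a standard fixed-point computation. The Python sketch below assumes that the grammar of Example 7. is the one recalled at the beginning of Section 6 (E → E HOR + HOR T | T, T → T VER hbar VER F | F, F → ( HOR E HOR ) | id), stores each right-hand side as a list alternating symbols and relations, and, like the text, assumes there are no ε-productions. FOLLOW and POSFOLLOW are then read off the pairs.

```python
from typing import Dict, List, Set, Tuple

Grammar = List[Tuple[str, List[str]]]  # rhs alternates symbols and relations

EXPR: Grammar = [
    ("E", ["E", "HOR", "+", "HOR", "T"]), ("E", ["T"]),
    ("T", ["T", "VER", "hbar", "VER", "F"]), ("T", ["F"]),
    ("F", ["(", "HOR", "E", "HOR", ")"]), ("F", ["id"]),
]


def first_sets(g: Grammar) -> Dict[str, Set[str]]:
    nts = {lhs for lhs, _ in g}
    first: Dict[str, Set[str]] = {a: set() for a in nts}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in g:
            x = rhs[0]  # no epsilon-productions: only the leading symbol matters
            new = first[x] if x in nts else {x}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def ptfollow(g: Grammar, start: str) -> Dict[str, Set[Tuple[str, str]]]:
    nts = {lhs for lhs, _ in g}
    first = first_sets(g)
    pt: Dict[str, Set[Tuple[str, str]]] = {a: set() for a in nts}
    pt[start].add(("ANY", "$"))  # rule 1
    changed = True
    while changed:
        changed = False
        for lhs, rhs in g:
            for i in range(0, len(rhs), 2):  # symbols sit at even positions
                b = rhs[i]
                if b not in nts:
                    continue
                if i + 1 < len(rhs):  # rule 2: A -> alpha B REL beta
                    rel, nxt = rhs[i + 1], rhs[i + 2]
                    new = {(rel, a) for a in (first[nxt] if nxt in nts else {nxt})}
                else:  # rule 3: A -> alpha B
                    new = set(pt[lhs])
                if not new <= pt[b]:
                    pt[b] |= new
                    changed = True
    return pt


PT = ptfollow(EXPR, "E")
POSFOLLOW = {a: {rel for rel, _ in pairs} for a, pairs in PT.items()}
FOLLOW = {a: {t for _, t in pairs} for a, pairs in PT.items()}
```

The resulting PT, POSFOLLOW and FOLLOW tables agree with the sets listed above; for instance, POSFOLLOW["T"] is {HOR, ANY, VER}.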

Constructing an SpLR(1) parsing table

The following is an extension of Algorithm 4.8 in [1]; the only differences regard parts 2.c and 2.d, and the addition of part 2.a.
Algorithm 2. (The SpLR(1) parsing table construction algorithm)
Input: An augmented pLR parsable linear positional grammar LPG′.
Output: The SpLR(1) parsing table for LPG′, in its action, goto and position sections.
Method:
1. Construct C = {I0, I1, ..., In}, the collection of sets of pLR(0) items for LPG′.
2. State i is constructed from Ii. The parsing actions and positions for state i are determined as follows:
   a) Each time [A → α • REL β] is in Ii, add REL to position[i].
   b) If [A → α • REL a β] (or [A → • a β]) is in Ii and Goto(Ii, a) = Ij, then set action[i, a] to “shift j” (“sj”). Here ‘a’ must be a terminal. Note that REL is ignored.
   c) If [A → α •] is in Ii, then set action[i, a] to “reduce A → α” for all ‘a’ in FOLLOW(A), and add POSFOLLOW(A) to position[i]. Here A may not be S′.
   d) If [S′ → SP S •] is in Ii, then set action[i, $] to “accept” and add ANY to position[i].


If any conflicting actions or positions (see footnote 1) are generated by the above rules, we say the grammar is not SpLR(1); the algorithm fails to produce a parser in this case.
3. The goto transitions for state i are constructed for all nonterminals A using the rule: if Goto(Ii, A) = Ij, then goto[i, A] = j.
4. All entries not defined by rules (2) and (3) are made “error”.
5. The initial state of the parser is the one constructed from the set of items containing [S′ → • SP S].
A pLR parsable linear positional grammar having an SpLR(1) parsing table is said to be SpLR(1). A pictorial language generated by an SpLR(1) grammar is said to be SpLR(1). □

Example 13. It is easy to verify that Algorithm 2. produces the table of Figure 6. if we consider the grammar in Example 11., its items in Example 12., the Goto() function in Figure 8. and the following definition of POSFOLLOW for it:
   POSFOLLOW(S) = POSFOLLOW(B2) = {ANY}
   POSFOLLOW(B1) = POSFOLLOW(C) = {HOR}
   POSFOLLOW(R) = {VER, ANY}

Constructing pLR(1) and pLALR(1) parsing tables

In the same way as the algorithm for the SpLR(1) parsing table has been derived from the algorithm for the SLR(1) parsing table in [1], it is possible to derive the algorithm for the canonical pLR(1) parsing table from the algorithm for the canonical LR parsing table in [1]. A pLR(1) item is a pair made of a pLR(0) item and a set of lookaheads of the form (REL, a). Starting from the initial pLR(1) item [S′ → • SP S, (ANY, $)], the functions Goto() and Closure() and the set-of-items construction algorithm calculate the lookaheads of the remaining items in an LR(1) fashion, using the function PTFOLLOW() instead of FOLLOW(). In this case, a complete item has a set of drivers given by the union of the relation parts of its lookaheads. The definition of the function Position() is unaltered. The pLR(1) parsing table construction algorithm can be easily derived by combining the techniques of the LR(1) and pLR(0) parsing table construction algorithms.
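Steps 2.a, 2.c and 2.d of Algorithm 2, together with the false-conflict convention of footnote 1, can be sketched in a few lines of Python. The item summary used here, a triple (lhs, driver or None, is_complete), is a hypothetical simplification chosen for the sketch, and the checks reuse POSFOLLOW values stated in the text.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical summary of an item: (lhs, driver or None, is_complete).
ItemInfo = Tuple[str, Optional[str], bool]


def position_entry(items: Iterable[ItemInfo], posfollow: Dict[str, Set[str]]) -> Set[str]:
    """Position section of one state, following steps 2.a, 2.c and 2.d."""
    ops: Set[str] = set()
    for lhs, driver, complete in items:
        if complete and lhs == "S'":
            ops.add("ANY")  # 2.d: [S' -> SP S .]
        elif complete:
            ops |= posfollow[lhs]  # 2.c: [A -> alpha .]
        elif driver is not None:
            ops.add(driver)  # 2.a: [A -> alpha . REL beta]
    return ops


def is_positional_conflict(ops: Set[str]) -> bool:
    """Footnote 1: ANY next to a single other operator is only a false conflict."""
    return len(ops - {"ANY"}) > 1
```

With the POSFOLLOW sets of Example 13, the state containing R → □ HOR □ • gets {VER, ANY}, a false conflict only. With POSFOLLOW(E) = {HOR, ANY}, a state holding both E → T • and T → T • VER hbar VER F gets {HOR, VER, ANY}, which is a genuine positional conflict.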
In the following we show the pLR(1) sets-of-items for the grammar in Example 10:
I0:
    S′ → • SP TEXT               (ANY $)
    TEXT → • LINE VER LINE       (ANY $)
    LINE → • char AHOR LINE      (VER char) (VER eol)
    LINE → • eol                 (VER char) (VER eol)
    Position(I0) = {SP}
I1:
    S′ → SP TEXT •               (ANY $)
    Position(I1) = {ANY}
I2:
    TEXT → LINE • VER LINE       (ANY $)
    LINE → • char AHOR LINE      (ANY $)
    LINE → • eol                 (ANY $)
    Position(I2) = {VER}
I3:
    LINE → char • AHOR LINE      (VER char) (VER eol)
    LINE → • char AHOR LINE      (VER char) (VER eol)
    LINE → • eol                 (VER char) (VER eol)
    Position(I3) = {AHOR}
I4:
    LINE → eol •                 (VER char) (VER eol)
    Position(I4) = {VER}
I5:
    TEXT → LINE VER LINE •       (ANY $)
    Position(I5) = {ANY}
I6:
    LINE → char • AHOR LINE      (ANY $)
    LINE → • char AHOR LINE      (ANY $)
    LINE → • eol                 (ANY $)
    Position(I6) = {AHOR}
I7:
    LINE → eol •                 (ANY $)
    Position(I7) = {ANY}
I8:
    LINE → char AHOR LINE •      (VER char) (VER eol)
    Position(I8) = {VER}
I9:
    LINE → char AHOR LINE •      (ANY $)
    Position(I9) = {ANY}

Footnote 1: The occurrence in the same entry of the operator ANY with another operator is considered a false conflict. It is solved by executing the operator ANY only if the other operator fails to return a new token.

The construction of a pLALR(1) parsing table can be easily derived either by merging the pLR(1) sets of items with the same core or by extending the efficient LALR(1) parsing table construction algorithm given in [1].
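The first route, merging sets with the same core, is sketched below in Python on the ten pLR(1) sets of Example 10. Representing a set as a dictionary from item strings to sets of (REL, terminal) lookaheads is an assumption made only for this illustration. The sketch also reads off the drivers of complete items as the relation parts of their lookaheads.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

LA = Tuple[str, str]  # a lookahead pair (REL, terminal)
State = Dict[str, Set[LA]]  # core item -> lookaheads

INNER = {("VER", "char"), ("VER", "eol")}  # lookaheads inside the first line
LAST = {("ANY", "$")}  # lookaheads inside the last line

# pLR(1) sets I0..I9 of Example 10, with items written as plain strings
STATES: List[State] = [
    {"S' → • SP TEXT": LAST, "TEXT → • LINE VER LINE": LAST,
     "LINE → • char AHOR LINE": INNER, "LINE → • eol": INNER},
    {"S' → SP TEXT •": LAST},
    {"TEXT → LINE • VER LINE": LAST, "LINE → • char AHOR LINE": LAST, "LINE → • eol": LAST},
    {"LINE → char • AHOR LINE": INNER, "LINE → • char AHOR LINE": INNER, "LINE → • eol": INNER},
    {"LINE → eol •": INNER},
    {"TEXT → LINE VER LINE •": LAST},
    {"LINE → char • AHOR LINE": LAST, "LINE → • char AHOR LINE": LAST, "LINE → • eol": LAST},
    {"LINE → eol •": LAST},
    {"LINE → char AHOR LINE •": INNER},
    {"LINE → char AHOR LINE •": LAST},
]


def merge_same_core(states: List[State]) -> List[State]:
    """Union the lookaheads of sets whose items (cores) coincide."""
    merged: Dict[FrozenSet[str], State] = {}
    for st in states:
        target = merged.setdefault(frozenset(st), {item: set() for item in st})
        for item, las in st.items():
            target[item] |= las
    return list(merged.values())


def complete_item_drivers(state: State) -> Set[str]:
    """Drivers of complete items: the relation parts of their lookaheads."""
    return {rel for item, las in state.items() if item.endswith("•") for rel, _ in las}


MERGED = merge_same_core(STATES)
```

The ten sets collapse to seven (I3 with I6, I4 with I7, I8 with I9). In the merged set for LINE → eol •, the drivers become {VER, ANY}, which, by footnote 1, is only a false conflict.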

6. Non pLR(1) grammars

In Section 3 we gave a two-dimensional version of the grammar given in [1] for a subset of the arithmetical expressions. We now show that this grammar is not pLR(1). Let us consider its augmented grammar:
   E′ → SP E
   E → E HOR + HOR T | T
   T → T VER hbar VER F | F
   F → ( HOR E HOR ) | id
and the two pLR(1) sets-of-items I0 and I2:
I0:
    E′ → • SP E                  (ANY $)
    E → • E HOR + HOR T          (HOR +) (ANY $)
    E → • T                      (HOR +) (ANY $)
    T → • T VER hbar VER F       (HOR +) (VER hbar) (ANY $)
    T → • F                      (HOR +) (VER hbar) (ANY $)
    F → • ( HOR E HOR )          (HOR +) (VER hbar) (ANY $)
    F → • id                     (HOR +) (VER hbar) (ANY $)
    Position(I0) = {SP}
I2:
    E → T •                      (HOR +) (ANY $)
    T → T • VER hbar VER F       (HOR +) (VER hbar) (ANY $)
    Position(I2) = {HOR, VER, ANY}


Since Position(I2) contains two positional operators, HOR and VER, both different from ANY, a positional conflict occurs. Although the original string grammar in [1] is LR(1), this conflict leads to a shift/reduce conflict, as can easily be seen by considering the following pictorial form T + id id, assuming that T has already been reduced. When the parser reads T in I0, the Goto() function makes it enter state I2. At this point it has to decide whether to look for ‘hbar’ with a vertical reading or for ‘+’ with a horizontal reading. Both alternatives are valid: if ‘hbar’ is chosen, then the parser has to shift, since ‘hbar’ is not in FOLLOW(E); otherwise, it has to reduce with E → T. One possible solution for avoiding this conflict is to assign a priority value to each positional operator. In this example we could assign priority values such that the vertical reading is always considered before the horizontal reading. This would respect the priority between ‘hbar’ and ‘+’ implicitly given in the grammar. However, the following pictorial form (T + id ) + id id shows that the priority solution cannot be applied to this grammar. In fact, in this case, the next reading after T should be made horizontally and not vertically; in the considered grammar this is suggested by the presence of a parenthesis before T. The problem is that, with or without a parenthesis, the parser always reaches state I2 after the reduction of T and cannot distinguish between the two cases. Another possibility for avoiding this type of conflict is to give a “smart” representation of the two-dimensional pattern, deriving it from image analysis techniques such as dominancy [4, 16]. Last but not least, we can construct an equivalent pLR(1) grammar in the same way as is normally done for solving conflicts in LR parsing. Following this last idea, a pLR(1) grammar for the arithmetical expressions has been constructed.
The pLR(1) property has been obtained by introducing the new tokens id̲, (̲ and )̲ (underlined versions of id, ( and )) to distinguish the case when an expression acts as a numerator:
(0) E′ → SP E
(1) E → E HOR + HOR T
(2) E → T
(3) T → T′ VER F
(4) T → F
(5) F → ( HOR E HOR )
(6) F → id
(7) T′ → T′ VER F′
(8) T′ → F′
(9) F′ → (̲ HOR E HOR )̲
(10) F′ → id̲

Since this grammar produces sentences of the type id̲ id + id̲ id, the analysis of a standard arithmetic expression has to undergo a simple pre-analysis phase. It can be seen that this grammar is both a pLR(1) and an SpLR(1) linear positional grammar. Figure 9. shows its SpLR(1) parsing table.
(id + id) + id id


[Figure 9 table: action, goto (E, T, F, T′, F′) and position sections for states 0 to 17]

Figure 9. The SpLR(1) parsing table for the new version of the arithmetical expression grammar

Example 14. Let us now consider the linear positional grammar which generates sequences of upper right corners with sides of variable length, as shown in Figure 10.
N = {SeqCorn, Corner, HLine, VLine}
T = {dot}
POS = {HOR, AHOR, AVER}
(1) SeqCorn → Corner HOR SeqCorn
(2) SeqCorn → Corner
(3) Corner → HLine AHOR VLine
(4) HLine → HLine AHOR dot
(5) HLine → dot
(6) VLine → VLine AVER dot
(7) VLine → dot

[Figure 10, panels (a), (b) and (c): pictures made of dots]
Figure 10. Two sequences of upper right corners

This grammar is ambiguous, as can be seen by considering the picture in Figure 10.c and noticing that it has two different leftmost derivations: the first one is obtained by applying the sequence of productions 2 3 4 4 5 6 6 7, the second one by applying the sequence 1 3 5 7 2 3 5 6 6 7. In the first case (Figure 11.i) the picture is interpreted as a single corner; in the second one (Figure 11.ii) it is interpreted as a sequence of two corners.

[Figure 11, panels (i) and (ii): pictures made of dots]
Figure 11. Two possible interpretations according to the grammar in Example 14.

Being ambiguous, this grammar is not pLR(1) either. This can be shown by considering the following set-of-items, produced by applying the pLR(1) set-of-items construction algorithm to the grammar above:
I7:
    HLine → HLine AHOR dot •     (AHOR dot)
    VLine → dot •                (AVER dot)
    VLine → dot •                (HOR dot)
    VLine → dot •                (ANY $)

This set-of-items contains a positional conflict, since Position(I7) = {AHOR, AVER, HOR, ANY}. However, in this case an appropriate assignment of priorities to the positional operators makes the grammar pLR(1), solving at the same time the conflicts due to the grammar ambiguity and the additional positional conflicts. To set the priorities, let us analyze the set-of-items I7. The first item indicates that the parser has seen part of the horizontal side of a corner; the second item that it has seen a dot of a vertical side; the third item that it has seen the last dot of a vertical side; the fourth item that it has seen the last dot of the vertical side of the last corner or, in other words, the last dot of the input picture. In each item the look-ahead indicates to the parser the direction to follow and the action to take. In particular, in order to respect the semantics of the language, each time the parser has seen a dot and has reached state I7 it should perform the following algorithm:
1. decide whether the dot just considered is the start of a vertical side; this means following the 2nd item in I7, i.e., looking for the next symbol in vertical concatenation (AVER) and, if it is a dot, reducing with production (7); otherwise
2. decide whether the dot is part of the horizontal side of a corner; this means following the 1st item in I7, i.e., looking for the next symbol in adjacent horizontal concatenation (AHOR) and, if it is a dot, reducing with production (4); otherwise
3. decide whether the dot is the last dot of the vertical side of a corner; this means following the 3rd item in I7, i.e., looking for the next symbol in horizontal concatenation (HOR) and, if it is a dot, reducing with production (7); otherwise
4. decide whether it is the last dot of the vertical side of the last corner in the sentence; this means following the 4th item in I7, i.e., checking whether any unmarked symbol is left (ANY) and, if none is, reducing with production (7), or otherwise signaling a syntax error.
From the above, we assign the priorities in the following order: AVER, AHOR, HOR, ANY. Note that applying AHOR before HOR makes the parser interpret the picture in Figure 11. as a single corner (case (i)). Figure 12. shows the corresponding pLR(1) parsing table, where the entries of the position column have been sorted according to the order defined above. Apart from the goto part, the parsing table rows containing multiple entries have been partitioned into numbered “sub-states”, and a new action gk has been introduced: on reading gk, the parser goes to sub-state k without modifying the stack or the input pointer.

                     action            goto
state          dot     $         SeqCorn  Corner  HLine  VLine    position
0              s4                1        2       3               SP
1                      accept                                     ANY
2    2₁        s4      g2        5        2       3               HOR
     2₂                r2                                         ANY
3              s7                                        6        AHOR
4              r5                                                 AHOR
5                      r1                                         ANY
6    6₁        s8      g2                                         AVER
     6₂        r3      g3                                         HOR
     6₃                r3                                         ANY
7    7₁        r7      g2                                         AVER
     7₂        r4      g3                                         AHOR
     7₃        r7      g4                                         HOR
     7₄                r7                                         ANY
8    8₁        r6      g2                                         AVER
     8₂        r6      g3                                         HOR
     8₃                r6                                         ANY

Figure 12. A pLR(1) parsing table solving positional conflicts

The following is a parsing trace for a picture made of three consecutive dots. In each triple (st, REL, pic), “st” is the stack configuration with the top as the rightmost element (sub-state k of state n is written nₖ), “REL” is the operator to apply to get the next symbol, and “pic” is the remaining part of the input picture to be parsed. Each transition is labeled with the actions to be performed:
(0, SP, dot dot dot) ⇒s4
(0 dot 4, AHOR, dot dot) ⇒r5, goto3
(0 HLine 3, AHOR, dot dot) ⇒s7
(0 HLine 3 dot 7₁, AVER, dot) ⇒g2
(0 HLine 3 dot 7₂, AHOR, dot) ⇒r4, goto3
(0 HLine 3, AHOR, dot) ⇒s7
(0 HLine 3 dot 7₁, AVER, $) ⇒g2
(0 HLine 3 dot 7₂, AHOR, $) ⇒g3
(0 HLine 3 dot 7₃, HOR, $) ⇒g4
(0 HLine 3 dot 7₄, ANY, $) ⇒r7, goto6
(0 HLine 3 VLine 6₁, AVER, $) ⇒g2
(0 HLine 3 VLine 6₂, HOR, $) ⇒g3
(0 HLine 3 VLine 6₃, ANY, $) ⇒r3, goto2
(0 Corner 2₁, HOR, $) ⇒g2
(0 Corner 2₂, ANY, $) ⇒r2, goto1
(0 SeqCorn 1, ANY, $) ⇒accept
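The sub-state mechanism of Figure 12. can be sketched by extending the driver of Algorithm 1 so that each state holds an ordered list of sub-states, each with its own operator, and gk only changes the current sub-state. This is an illustrative sketch, not the authors' code. Several choices are assumptions: dots are (column, row) positions with rows growing downwards; AHOR and AVER look at the adjacent cell to the right and below; HOR (hypothetical) returns the leftmost unmarked dot in any column to the right; and the look-ahead is recomputed with the operator of every sub-state the parser enters, including after a reduction.

```python
from typing import Dict, List, Optional, Set, Tuple

Pos = Tuple[int, int]  # (column x, row y); y grows downwards
Row = Dict[str, Tuple[str, int]]  # token -> (kind, number); kind is s, r, g or acc

# Figure 12: each state is an ordered list of sub-states (operator, actions).
TABLE: Dict[int, List[Tuple[str, Row]]] = {
    0: [("SP", {"dot": ("s", 4)})],
    1: [("ANY", {"$": ("acc", 0)})],
    2: [("HOR", {"dot": ("s", 4), "$": ("g", 2)}), ("ANY", {"$": ("r", 2)})],
    3: [("AHOR", {"dot": ("s", 7)})],
    4: [("AHOR", {"dot": ("r", 5)})],
    5: [("ANY", {"$": ("r", 1)})],
    6: [("AVER", {"dot": ("s", 8), "$": ("g", 2)}), ("HOR", {"dot": ("r", 3), "$": ("g", 3)}),
        ("ANY", {"$": ("r", 3)})],
    7: [("AVER", {"dot": ("r", 7), "$": ("g", 2)}), ("AHOR", {"dot": ("r", 4), "$": ("g", 3)}),
        ("HOR", {"dot": ("r", 7), "$": ("g", 4)}), ("ANY", {"$": ("r", 7)})],
    8: [("AVER", {"dot": ("r", 6), "$": ("g", 2)}), ("HOR", {"dot": ("r", 6), "$": ("g", 3)}),
        ("ANY", {"$": ("r", 6)})],
}
GOTO = {(0, "SeqCorn"): 1, (0, "Corner"): 2, (0, "HLine"): 3,
        (2, "SeqCorn"): 5, (2, "Corner"): 2, (2, "HLine"): 3, (3, "VLine"): 6}
PRODS = {1: ("SeqCorn", 2), 2: ("SeqCorn", 1), 3: ("Corner", 2), 4: ("HLine", 2),
         5: ("HLine", 1), 6: ("VLine", 2), 7: ("VLine", 1)}


def parse(dots: Set[Pos]) -> List[int]:
    marked: Set[Pos] = set()

    def apply(op: str, p: Optional[Pos]) -> Optional[Pos]:
        rest = sorted((q for q in dots if q not in marked), key=lambda q: (q[1], q[0]))
        if op in ("SP", "ANY"):
            return rest[0] if rest else None
        if op in ("AHOR", "AVER"):
            q = (p[0] + 1, p[1]) if op == "AHOR" else (p[0], p[1] + 1)
            return q if q in rest else None
        right = sorted(q for q in rest if q[0] > p[0])  # HOR (hypothetical)
        return right[0] if right else None

    stack: list = [0]
    sub, last, output = 0, None, []
    while True:
        state = stack[-1]
        op, row = TABLE[state][sub]
        la = apply(op, last)  # look-ahead of the current sub-state (assumption)
        tok = "$" if la is None else "dot"
        if tok not in row:
            raise SyntaxError(f"no action in sub-state {state}.{sub + 1} on {tok}")
        kind, n = row[tok]
        if kind == "s":
            marked.add(la)
            stack += [tok, n]
            sub, last = 0, la
        elif kind == "g":  # gk: stack and input pointer untouched
            sub = n - 1
        elif kind == "r":
            lhs, size = PRODS[n]
            del stack[len(stack) - 2 * size:]
            stack += [lhs, GOTO[(stack[-1], lhs)]]
            output.append(n)
            sub = 0
        else:
            return output
```

On three consecutive dots the sketch performs the moves of the trace above and returns the reductions [5, 4, 7, 3, 2]. On a single upper right corner with a three-dot horizontal side and two more dots below its right end, it returns [5, 4, 7, 6, 6, 3, 2], i.e. the interpretation of Figure 11.i.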


7. The parsing methodology for SpLR(1) grammars

In this section we give an algorithm for translating an SpLR(1) linear positional grammar into an SLR(1) grammar with actions, so that a parser for two-dimensional languages can be generated automatically with the tool YACC. Moreover, we provide a global methodology for the implementation of SpLR(1) linear positional grammars.
Algorithm 3. (Conversion from an SpLR(1) to an SLR(1) grammar with actions)
Input: An SpLR(1) grammar LPG = (N, T, S, P, POS, PE) and the implementations of the operators corresponding to the relations in POS.
Output: An SLR(1) grammar with actions G = (N′, T′, S′, P′), such that p ∈ L(LPG) iff (w, Q, SP) is accepted by a YACC implementation of G. Here (w, Q, SP) is the input representation of p as defined in Section 4.
Method:
set N′ = N, T′ = T, S′ = S, P′ = Ø
for each production p in P do begin
   let p′ be the production obtained by removing all the spatial relations from p
   if p is of type A → α a REL β then replace ‘a’ with “a REL” in p′ and add REL to N′
   if p is of type A → α a then begin
      if POSFOLLOW(A) = {REL} then replace ‘a’ with “a REL” in p′ and add REL to N′
      if POSFOLLOW(A) = {REL, ANY} then replace ‘a’ with “a REL_ANY” in p′ and add REL_ANY to N′
   end
   add p′ to P′
end
for each REL (or REL_ANY, resp.) added to N′ in the previous “for” loop do
   add REL → {REL()} (or REL_ANY → {REL_ANY()}, resp.) to P′
end.
Each function REL() is the implementation of the corresponding positional operator as defined in Section 4. The function REL_ANY() is defined as follows:


next_index = REL();
if there is no token in the position calculated by REL(), i.e. next_index = 0, then return ANY()
else return next_index
Since the grammar LPG is SpLR(1), the POSFOLLOW sets calculated in the first “for” loop may contain at most one relation identifier different from ANY. Since the algorithm for the construction of the SpLR(1) items does not take the relation identifiers into account, shift-reduce or reduce-reduce conflicts may only occur in G because of the presence of the relation identifiers. However, it is easy to prove that if a conflict occurs for G, then it also occurs for LPG. Hence G is SLR(1). The correctness of the algorithm derives directly from the linearity of LPG. Note that, since an SLR(1) grammar is also LALR(1), we can use YACC to build a parser for G automatically and then parse the pictures of L(LPG).

Example 15. Let us consider the SpLR(1) version of the arithmetical expression grammar given in the previous section. Since POSFOLLOW(F) = {HOR, ANY} and POSFOLLOW(F′) = {VER}, Algorithm 3. produces the following SLR(1) grammar with actions:
(1) E → E + HOR T
(2) E → T
(3) T → T′ F
(4) T → F
(5) F → ( HOR E ) HOR_ANY
(6) F → id HOR_ANY
(7) T′ → T′ F′
(8) T′ → F′
(9) F′ → (̲ HOR E )̲ VER
(10) F′ → id̲ VER
(11) HOR → {HOR()}
(12) VER → {VER()}
(13) HOR_ANY → {HOR_ANY()} □
We are then ready to define the general methodology for implementing positional grammars:


I. Define the positional relations and operators meant to relate the objects in the patterns, and construct the grammar describing the language.
II. Convert the picture into the parser input as defined in Section 4.
III. Construct the parser.
Point I requires the construction of an SpLR(1) grammar together with the positional operators. The arithmetical expression grammar shows that the construction of the positional operators may require modifying the original grammar; hence, the definitions of the relations and of the operators are strongly interdependent when building a linear positional grammar. Point II requires a pre-processing phase that converts the input into an array of tokens w, a starting index SP in w, and an association list Q of positions and tokens. In particular, the list Q must be implemented so that the positional operators can be executed efficiently; a different set of positional operators may require a different implementation. Finally, point III requires the construction of the parser. By applying Algorithm 3., this can be done by translating the SpLR(1) grammar into an LALR(1) grammar with actions and then using the tool YACC. The results of this research have been applied and tested in the Pittsburgh-Salerno Iconic System (PSIS for short) [6, 10], jointly developed by the visual language research groups of the Universities of Pittsburgh and Salerno. In particular, PSIS lets users design, specify, and interpret custom visual iconic languages for different applications. The system has two major subsystems: the Visual Language Compiler, which allows a user to compose and compile visual sentences, and the Visual Language Generator, which allows a user to define an iconic language through user-supplied sample visual sentences. Among its other parsers, the first subsystem adopts the pLR parser, thus providing a complete testing ground for the ideas presented in this paper.
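Point III relies on Algorithm 3. The conversion, together with the REL_ANY() wrapper, can be sketched in Python as follows. This is a sketch under assumptions, not the authors' tool: a production's right-hand side is a list alternating symbols and relations, the output keeps the operator nonterminals in order of first use, and the action part “{REL()}” is represented only as text. No YACC input file is generated.

```python
from typing import Callable, Dict, List, Set, Tuple

Production = Tuple[str, List[str]]  # rhs alternates symbols and relations


def convert(prods: List[Production], terminals: Set[str],
            posfollow: Dict[str, Set[str]]) -> List[Production]:
    """Sketch of Algorithm 3: drop relations, attach operator nonterminals to terminals."""
    result: List[Production] = []
    operators: List[str] = []  # REL / REL_ANY nonterminals, in order of first use

    def use(name: str) -> str:
        if name not in operators:
            operators.append(name)
        return name

    for lhs, rhs in prods:
        new: List[str] = []
        for i in range(0, len(rhs), 2):  # symbols sit at even positions
            sym = rhs[i]
            new.append(sym)
            if sym not in terminals:
                continue  # a relation after a nonterminal is simply dropped
            if i + 1 < len(rhs):  # A -> alpha a REL beta
                new.append(use(rhs[i + 1]))
                continue
            rels = posfollow[lhs]  # A -> alpha a
            if len(rels) == 1:
                new.append(use(next(iter(rels))))
            elif len(rels) == 2 and "ANY" in rels:
                (rel,) = rels - {"ANY"}
                new.append(use(f"{rel}_ANY"))
            else:
                raise ValueError(f"POSFOLLOW({lhs}) = {rels}: grammar is not SpLR(1)")
        result.append((lhs, new))
    # one production with a semantic action per operator: REL -> {REL()}
    result += [(name, ["{" + name + "()}"]) for name in operators]
    return result


def make_rel_any(rel: Callable[[], int], any_op: Callable[[], int]) -> Callable[[], int]:
    """REL_ANY(): index 0 (the "$" slot of w) means that REL found no token."""
    def rel_any() -> int:
        next_index = rel()
        return any_op() if next_index == 0 else next_index
    return rel_any
```

Given the SpLR(1) arithmetic grammar (with ASCII names such as uid standing in for the underlined tokens) and the POSFOLLOW sets of Example 15, convert returns thirteen productions matching (1) to (13), with the operator productions listed in the order HOR, HOR_ANY, VER.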
In the following we give some examples of the application of the methodology to the case of the arithmetic expressions. Here, the coordinates (x, y) are set such that x represents the column index in left-to-right progression, and y the row index in top-down progression. Each example case shows the original picture to be analyzed, the array w and the list Q of the input triple (w, Q, SP), the parsing trace, and the result of the compilation.

Example 16.
1) The 2-D arithmetical expression: 19 + 144 12
The parser input:
w: w[0] = “$”, w[1] = “19”, w[2] = “+”, w[3] = “144”, w[4] = “12”
Q: (1, (1,2)), (2, (2,1)), (3, (3,1)), (4, (3,2))
The parsing trace: 1 2 3 4 0
The result: 31

2) The 2-D arithmetical expression: (99 + 501) 10 * 6 2
The parser input:
w: w[0] = “$”, w[1] = “(”, w[2] = “99”, w[3] = “+”, w[4] = “6”, w[5] = “501”, w[6] = “)”, w[7] = “*”, w[8] = “10”, w[9] = “2”
Q: (1, (1,1)), (2, (2,1)), (3, (3,1)), (4, (3,2)), (5, (4,1)), (6, (5,1)), (7, (6,2)), (8, (7,1)), (9, (7,2))
The parsing trace: 1 2 3 5 6 4 7 8 9 0
The result: 500

3) The 2-D arithmetical expression: 10000 10 4 + 2000
The parser input:
w: w[0] = “$”, w[1] = “10000”, w[2] = “10”, w[3] = “4”, w[4] = “+”, w[5] = “2000”
Q: (1, (1,1)), (2, (1,2)), (3, (1,3)), (4, (2,2)), (5, (3,2))
The parsing trace: 1 2 3 4 5 0
The result: 2250

4) The 2-D arithmetical expression: 8 −2 4 2
The parser input:
w: w[0] = “$”, w[1] = “(”, w[2] = “8”, w[3] = “4”, w[4] = “2”, w[5] = “)”, w[6] = “-”, w[7] = “2”
Q: (1, (1,2)), (2, (2,1)), (3, (2,2)), (4, (2,3)), (5, (3,2)), (6, (4,2)), (7, (5,2))
The parsing trace: 2 1 3 4 5 6 7 0
The result: 2

8. Conclusions

In this paper we have presented a parser for a subclass of symbolic two-dimensional languages. We showed that this class contains the context-free string languages and that languages such as the two-dimensional arithmetical expression language can be parsed by the proposed model. We also showed that this class can be implemented through a particular use of the well-known tool YACC. At the moment we are investigating the extension of generalized parsers such as Earley's [17] and Tomita's [43] algorithms by applying the same technique used for extending the LR parser. First results have been obtained in [14] and [15], respectively.


In the future we intend to extend the subclass of parsable two-dimensional symbolic languages by constructing more powerful parsers while still preserving the LR-type efficiency. Initially we will consider the introduction of heuristics to solve parsing conflicts. We will study how to introduce fuzzy set theory into the parser, so that it can deal with multiple next symbols. We will also consider the possibility of using more powerful positional evaluators (PEs), so that infeasible solutions can be eliminated and feasible ones considered.

References
[1] A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1985.
[2] H. G. Barrow and J. R. Popplestone, “Relational Descriptions in Picture Processing”, Machine Intelligence, vol. 6, pp. 377-396, 1971.
[3] N. S. Chang and K. S. Fu, “Parallel Parsing of Tree Languages for Syntactic Pattern Recognition”, Pattern Recognition, vol. 11, no. 3, pp. 213-222, 1979.
[4] S.-K. Chang, “A Method for the Structural Analysis of Two-Dimensional Mathematical Expressions”, Information Sciences, vol. 2, pp. 253-272, 1970.
[5] S.-K. Chang, “Picture Processing Grammar and its Applications”, Information Sciences, vol. 3, pp. 121-148, 1971.
[6] S.-K. Chang, G. Costagliola, G. Pacini, G. Tortora, M. Tucci, B. Yu, and J. S. Yu, “A Visual-Language System for User Interfaces”, IEEE Software, vol. 12, no. 2, pp. 33-44, March 1995.
[7] S.-K. Chang, E. Jungert, and Y. Li, “The Design of Pictorial Databases based upon the Theory of Symbolic Projections”, Proceedings of 1989 Conference on Very Large Spatial Database, Springer-Verlag, 1989.
[8] S.-K. Chang, E. Jungert, and Y. Li, “Representation and Retrieval of Symbolic Pictures using Generalized 2D Strings”, SPIE Proceedings of Visual Communications and Image Processing, Philadelphia, PA, November 5-10, 1989.
[9] S.-K. Chang, Q. Y. Shi, and C. W. Yan, “Iconic Indexing by 2-D Strings”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, no. 4, pp. 475-484, July 1984.
[10] S.-K. Chang, M. J. Tauber, B. Yu, and J. S. Yu, “A Visual Language Compiler”, IEEE Transactions on Software Engineering, vol. 15, no. 5, pp. 506-525, 1989.
[11] S. S. Chok and K. Marriott, “Parsing Visual Languages”, Proceedings of the 18th Australian Computer Science Conference, Australian Computer Science Communications, vol. 17, no. 1, pp. 90-98, February 1995.
[12] M. B. Clowes, “Pictorial Relationships - A Syntactic Approach”, Machine Intelligence, vol. 4, American Elsevier, New York, 1969.
[13] G. Costagliola, A. De Lucia, S. Orefice, and G. Tortora, “A Parsing Methodology for the Implementation of Visual Systems”, IEEE Transactions on Software Engineering, vol. 23, no. 12, December 1997.
[14] G. Costagliola, M. Tomita, and S. K. Chang, “A Generalized Parser for 2-D Languages”, Proceedings of the 1991 IEEE Workshop on Visual Languages, International Convention Center Kobe, Japan, October 9-11, 1991, pp. 98-104.
[15] G. Costagliola, S. K. Chang, and M. Tomita, “Parsing 2D Languages by a Pictorial Generalized LR Parser”, in Advanced Visual Interfaces, T. Catarci, M. F. Costabile, S. Levialdi (Editors), World Scientific Series in Computer Science, vol. 36, 1992, pp. 319-333.
[16] C. Crimi, A. Guercio, G. Pacini, G. Tortora, and M. Tucci, “Automating Visual Language Generation”, IEEE Transactions on Software Engineering, vol. 16, no. 10, pp. 1122-1135, October 1990.
[17] J. Earley, “An Efficient Context-Free Parsing Algorithm”, Communications of the ACM, vol. 13, pp. 94-102, 1970.
[18] J. Feder, “Plex Languages”, Information Sciences, vol. 3, pp. 225-241, 1971.
[19] F. Ferrucci, G. Pacini, G. Satta, M. Sessa, G. Tortora, M. Tucci, and G. Vitiello, “Symbol-Relation Grammars: A Formalism for Graphical Languages”, Information and Computation, vol. 131, pp. 1-46, November 1996.
[20] M. Flasinski, “Characteristics of edNLC-Graph Grammar for Syntactic Pattern Recognition”, Computer Vision, Graphics and Image Processing, vol. 47, pp. 1-21, 1989.
[21] K. S. Fu, Syntactic Methods in Pattern Recognition, Academic Press, New York and London, 1974.
[22] K. S. Fu, Syntactic Pattern Recognition and Applications, Prentice Hall, Englewood Cliffs, NJ, 1982.
[23] K. S. Fu and B. K. Bhargava, “Tree Systems for Syntactic Pattern Recognition”, IEEE Transactions on Computers, vol. C-22, no. 12, pp. 1089-1099, 1973.
[24] E. J. Golin, “Parsing Visual Languages with Picture Layout Grammars”, Journal of Visual Languages and Computing, Academic Press, London, vol. 2, pp. 1-23, 1991.
[25] S. C. Johnson, “YACC: Yet Another Compiler-Compiler”, tech. rep., Bell Laboratories, 1974.
[26] C. Y. Li, T. Kawashima, T. Yamamoto, and Y. Aoki, “Attribute Expansive Graph Grammar for Pattern Description and its Problem-reduction Based Processing”, Trans. IEICE, vol. E-71, no. 4, pp. 431-440, Japan, 1988.
[27] S. Y. Lu and K. S. Fu, “Error-correcting Tree Automata for Syntactic Pattern Recognition”, IEEE Transactions on Computers, vol. C-27, pp. 1040-1053, 1978.
[28] K. Marriott and B. Meyer (Editors), The Theory of Visual Languages, Springer-Verlag, to be published in 1998.
[29] D. L. Milgram and A. Rosenfeld, “Array Automata and Array Grammars”, Information Processing, vol. 71, pp. 69-74, North-Holland Publ., Amsterdam, 1972.
[30] R. Narasimhan, “Syntax-directed Interpretation of Classes of Pictures”, Communications of the ACM, vol. 9, pp. 166-173, 1966.
[31] K. J. Peng, T. Yamamoto, and Y. Aoki, “A New Parsing Scheme for Plex Grammars”, Pattern Recognition, vol. 23, no. 3/4, pp. 393-402, 1990.
[32] J. L. Pfaltz, “Web Grammars and Picture Description”, Computer Graphics and Image Processing, vol. 1, pp. 193-220, 1972.
[33] J. L. Pfaltz and A. Rosenfeld, “Web Grammars”, Proceedings of the First International Joint Conference on Artificial Intelligence, pp. 609-619, Washington, DC, May 1969.
[34] J. Rekers and A. Schürr, “A Graph Based Framework for the Implementation of Visual Environments”, Proceedings of the 1996 IEEE Symposium on Visual Languages, Boulder, Colorado, pp. 148-157, October 1996.
[35] A. Rosenfeld, Picture Languages: Formal Models for Picture Recognition, Academic Press, New York, San Francisco and London, 1979.
[36] A. Rosenfeld and D. L. Milgram, “Web Automata and Web Grammars”, Machine Intelligence, vol. 7, pp. 307-324, 1972.
[37] W. C. Rounds, “Context Free Grammars on Trees”, Proceedings of the 10th Symposium on Switching and Automata Theory, p. 143, 1969.
[38] A. C. Shaw, “A Formal Picture Description Scheme as a Basis for Picture Processing Systems”, Information and Control, vol. 14, pp. 9-52, 1969.
[39] Q. Y. Shi and K. S. Fu, “Efficient and Error-correcting Parsing of (Attributed and Stochastic) Tree Grammars”, Information Sciences, vol. 26, pp. 159-188, 1982.
[40] Q. Y. Shi and K. S. Fu, “Parsing and Translation of Attributed Expansive Graph Languages for Scene Analysis”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-5, pp. 472-485, 1983.
[41] G. Siromoney, R. Siromoney, and K. Krithivasan, “Abstract Families of Matrices and Picture Languages”, Computer Graphics and Image Processing, vol. 1, pp. 284-307, 1972.
[42] G. Siromoney, R. Siromoney, and K. Krithivasan, “Array Grammars and Kolam”, Computer Graphics and Image Processing, vol. 3, pp. 63-82, 1974.
[43] M. Tomita, Efficient Parsing for Natural Languages, Kluwer Academic Publishers, Boston, MA, 1985.
[44] M. Tomita, “Parsing 2-Dimensional Languages”, Proceedings of the International Workshop on Parsing Technologies, pp. 414-424, Carnegie Mellon, Pittsburgh, PA, 28-31 August 1989.
[45] M. Tucci, G. Vitiello, and G. Costagliola, “Parsing Non-linear Languages”, IEEE Transactions on Software Engineering, vol. 20, no. 9, September 1994.
[46] P. S. P. Wang, “Recognition of Two-Dimensional Patterns”, Proc. Assoc. Comput. Mach. Nat. Conf., pp. 484-489, 1977.
[47] K. Wittenburg, “Earley-style Parsing for Relational Grammars”, Proceedings of the 1992 IEEE Workshop on Visual Languages, IEEE Comp. Soc. Press, pp. 192-199, 1992.


Figure 1. An upper-right corner

Figure 2. Four examples of lines.

Figure 3. The model of a pLR Parser (components: input $ a0 a1 … an-1 an; stack sm Xm, sm-1 Xm-1, …, s0; positional operators HOR, AVER, …; pLR parsing program; parsing table with action, goto and position parts; output)
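The pLR model can be illustrated with a small driver loop. What follows is a minimal Python sketch under our own assumptions, not the paper's tables: the picture is a dictionary from integer grid cells to terminal names; HOR is read as "the cell immediately to the right of the last scanned symbol"; SP is read as "the topmost, then leftmost symbol"; and the toy grammar L → L HOR c | c, its action, goto and position entries, and the name `plr_parse` are all hypothetical. As in an ordinary LR driver the stack holds states, but the position entry of the current state decides where in the plane the next input symbol is read, and an empty or already-consumed cell yields the end marker $. Reduce states use LR(0)-style default reductions so that they need no lookahead position; this simplification is also ours.

```python
from typing import Callable, Dict, Optional, Tuple

Pos = Tuple[int, int]  # (x, y) grid cell; x grows to the right, y grows downward (assumption)

# Hypothetical reading of HOR: the cell immediately to the right of the last scanned symbol.
RELATIONS: Dict[str, Callable[[Pos], Pos]] = {
    "HOR": lambda p: (p[0] + 1, p[1]),
}

# Hypothetical toy grammar:  rule 1: L -> L HOR c    rule 2: L -> c
# Each rule maps to (left-hand side, number of grammar symbols on the right-hand side);
# HOR is a spatial relation, not a symbol, so it occupies no stack slot.
PRODUCTIONS = {1: ("L", 2), 2: ("L", 1)}
ACTION = {(0, "c"): ("shift", 2), (1, "c"): ("shift", 3), (1, "$"): ("accept", None)}
GOTO = {(0, "L"): 1}
REDUCE = {2: 2, 3: 1}           # states with a default reduction -> rule number
POSITION = {0: "SP", 1: "HOR"}  # relation used to locate the next symbol in each state


def plr_parse(picture: Dict[Pos, str]) -> bool:
    """Return True if the toy grammar derives the picture and consumes every symbol."""
    stack = [0]                 # stack of LR states
    scanned = set()             # cells already consumed
    last: Optional[Pos] = None  # cell of the most recently shifted symbol
    while True:
        state = stack[-1]
        if state in REDUCE:     # reduce: pop the RHS states, then follow the goto entry
            lhs, length = PRODUCTIONS[REDUCE[state]]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
            continue
        relation = POSITION[state]
        if relation == "SP":    # start position: topmost, then leftmost symbol
            cell = min(picture, key=lambda p: (p[1], p[0]), default=None)
        else:                   # position relative to the last scanned symbol
            cell = RELATIONS[relation](last)
        found = cell is not None and cell in picture and cell not in scanned
        token = picture[cell] if found else "$"
        entry = ACTION.get((state, token))
        if entry is None:
            return False        # no action entry: syntax error
        kind, target = entry
        if kind == "accept":
            return len(scanned) == len(picture)
        scanned.add(cell)
        last = cell
        stack.append(target)
```

With the cells (0,0), (1,0) and (2,0) all labeled c the sketch accepts; with c at (0,0) and (0,1) it rejects, because HOR finds no symbol to the right of the first c and the second c is never consumed.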

Figure 4. The executions of {p1 HOR p2} and {p1 VER p2} (diagram labels: unfeasible positions; coordinates x, x', y, y')

Figure 5. A canonical pLR(1) parsing table (states 0-9; action columns char, eol, $; goto columns TEXT, LINE; position column)

Figure 6. A Simple pLR(1) parsing table (states 0-12; action columns ⇒ and $; goto columns S, B1, B2, C, R; position column)

Figure 7. A two-dimensional sentence

Figure 8. The Goto() transition diagram annotated with the values of Position()

Figure 9. The SpLR(1) parsing table for the new version of the arithmetic expression grammar (states 0-17; action columns id, +, (, ), $; goto columns E, T, T', F, F'; position column)

Figure 10. Two sequences of upper right corners (panels (a)-(c))

Figure 11. Two possible interpretations, (i) and (ii), according to the grammar in Example 14.

Figure 12. A pLR(1) parsing table solving positional conflicts (states 0-8; action columns dot and $; goto columns SeqCorn, Corner, HLine, Vline; position column)
