REGULATED ARRAY GRAMMARS OF FINITE INDEX
Part I: Theoretical Investigations

Henning FERNAU    Markus HOLZER

Wilhelm-Schickard-Institut für Informatik, Universität Tübingen, Sand 13, D-72076 Tübingen, Germany

[email protected] [email protected]

Rudolf FREUND
Institut für Computersprachen, Technische Universität Wien, Resselg. 3, A-1040 Wien, Austria
[email protected]

Abstract. We consider regulated (n-dimensional) context-free array grammars of finite index, i.e., with a limited number of active areas. This research is motivated by the observation that in several practical applications of array grammars only such a bounded number of active areas is observed. Combined with different methods known from regulated rewriting (e.g., matrix grammars, programmed grammars, grammars with prescribed teams), this natural restriction to a limited number of non-terminal symbols occurring in the arrays during a derivation gives rise to various different families of array languages (depending on the type of context-free array productions used in the regulated array grammars working with various types of the finite index restriction). Moreover, in the second part we shall present a characterization of some of these families defined by regulated context-free array grammars working with a finite index restriction via special k-head finite array automata, which can be used in the field of character recognition.

1 Introduction

Picture processing is one of the major areas of applied informatics. In order to understand this subject more thoroughly, frameworks of formal studies are needed. One kind of these frameworks takes its ideas from mechanisms well known in formal language theory. Concerning different approaches to such syntactical picture processing, we refer to [17, 19, 25, 29] and the literature quoted there. In this paper, we will concentrate on array grammars as a syntactic formalism to describe pictures. One particular area in the field of picture processing is the recognition of handwritten characters. We will report on the use of array grammars for this purpose in the second part of this paper. In the first part, we back our practical investigations by detailed theoretical studies. These studies are not l'art pour l'art; they are directly influenced by observations made during the implementation of a tool for character recognition based on regulated array grammars:

One of the main observations is that the number of active working areas (which formally correspond to non-terminal symbols occurring in an array sentential form and might be processed in parallel when simulating an array grammar) is quite limited when generating a certain character by means of an array grammar. This naturally leads to the consideration of bounded parallelism within array grammars. Such studies have been initiated by the authors in [6, 7]. The feature of bounded parallelism can nicely be formulated in terms of cooperating/distributed array grammar systems [1, 4] with prescribed teams [13, 20, 24]. On the other hand, a limitation on the number of active working areas resembles very much the finite index feature well known from formal language theory. For more details on the finite index restriction and its relatives, we refer the reader to [3, 10]. Especially, let us mention that regulated string grammars with the finite index restriction can be parsed rather efficiently, which is a very important feature when trying to apply such methods for character recognition; hence, one of the aims of our present studies is to carry such nice properties over to array grammar parsing. Let us mention that we show, in passing, the natural correspondence between cooperating/distributed grammar systems with prescribed teams of bounded parallelism and absolutely parallel grammars as introduced by Rajlich [23], which also holds in the string case, although this will not be stated below.

Part I of our paper is organized as follows: After introducing the necessary notions and stating some well-known facts in the next section, in the third section we introduce several control mechanisms for string grammars as well as for array grammars. In the fourth section we focus on the influence of (various versions of) the finite index restriction in regulated array grammars as well as in array grammar systems with prescribed teams; in particular, we are interested in the generative power of such devices. In the last section, we provide a list of open problems in the area as a stimulus for further research.

2 Preliminaries

In the main part of this section, we will introduce the definitions and notations for arrays and sequential array grammars [2, 11, 15, 25, 30] and give some explanatory examples and well-known results. First, we recall some basic notions from the theory of formal languages (for more details, the reader is referred to [28]). For an alphabet V, by V* we denote the free monoid generated by V under the operation of concatenation; |x| denotes the length of a string x; the empty string is denoted by λ, and V* \ {λ} is denoted by V+. Any subset of V+ is called a λ-free (string) language. A (string) grammar is a quadruple G = (VN, VT, P, S), where VN and VT are finite sets of non-terminal and terminal symbols, respectively, with VN ∩ VT = ∅, P is a finite set of productions α → β with α ∈ V+ and β ∈ V*, where V = VN ∪ VT, and S ∈ VN is the start symbol. For x, y ∈ V* we say that y is directly derivable from x in G, denoted by x ⇒G y, if and only if for some α → β in P and u, v ∈ V* we get x = uαv and y = uβv. Denoting the reflexive and transitive closure of the derivation relation ⇒G by ⇒*G, the (string) language generated by G is L(G) = {w ∈ VT* | S ⇒*G w}. A production α → β is called
– monotonic, if |α| ≤ |β|;
– context-free, if α ∈ VN;
– linear, if it is context-free and β contains at most one non-terminal symbol;
– regular, if α ∈ VN and β ∈ VT ∪ VT·VN.
A grammar is said to be of type enum, mon, cf, lin, reg, if every production in P is an arbitrary, monotonic, context-free, linear, or regular production, respectively. The families of λ-free (string) languages generated by grammars of type enum, mon, cf, lin, reg are denoted by L(enum), L(mon), L(cf), L(lin), and L(reg), respectively. The following relations are known as the Chomsky hierarchy [28]: L(reg) ⊊ L(lin) ⊊ L(cf) ⊊ L(mon) ⊊ L(enum).
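As a concrete illustration of these string-grammar definitions, the one-step derivation relation x ⇒G y for a context-free grammar simply replaces one occurrence of a non-terminal by a right-hand side. The following Python sketch is our own illustration, not part of the paper; the grammar and all function names are ours.

```python
# Our own sketch: one-step derivations for a context-free string grammar,
# given as a dict mapping each non-terminal to its right-hand sides.

def derive_one_step(sentential_form, productions):
    """Yield every string directly derivable from `sentential_form`."""
    for i, symbol in enumerate(sentential_form):
        for rhs in productions.get(symbol, []):
            yield sentential_form[:i] + rhs + sentential_form[i + 1:]

# Linear grammar for {a^n b | n >= 1}: S -> aS | ab.
productions = {"S": ["aS", "ab"]}
forms = {"S"}
for _ in range(3):
    forms = {w for f in forms for w in derive_one_step(f, productions)}
print(sorted(w for w in forms if "S" not in w))
```

Iterating the relation three times from S and keeping the terminal strings yields the word derived in exactly three steps, aaab.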

We now are going to elaborate the notions for n-dimensional arrays and n-dimensional array grammars. Let Z denote the set of integers, let N denote the set of positive integers, N = {1, 2, ...}, and let n ∈ N. Then an n-dimensional array A over an alphabet V is a function A : Z^n → V ∪ {#}, where shape(A) = {v ∈ Z^n | A(v) ≠ #} is finite and # ∉ V is called the background or blank symbol. We usually shall write A = {(v, A(v)) | v ∈ shape(A)}. The set of all n-dimensional arrays over V shall be denoted by V*n. The empty array in V*n with empty shape shall be denoted by Λn. Moreover, we define V+n = V*n \ {Λn}. Any subset of V+n is called a λ-free n-dimensional array language. Let v ∈ Z^n. Then the translation τv : Z^n → Z^n is defined by τv(w) = w + v for all w ∈ Z^n, and for any array A ∈ V*n we define τv(A), the corresponding n-dimensional array translated by v, by (τv(A))(w) = A(w − v) for all w ∈ Z^n. The vector (0, ..., 0) ∈ Z^n is denoted by Ωn. Usually [2, 25, 30, 31], arrays are regarded as equivalence classes of arrays with respect to linear translations, i.e., only the relative positions of the symbols ≠ # in the plane are taken into account: The equivalence class [A] of an array A ∈ V*n is defined by [A] = {B ∈ V*n | B = τv(A) for some v ∈ Z^n}. The set of all equivalence classes of n-dimensional arrays over V with respect to linear translations shall be denoted by [V*n], etc. Most of the results elaborated in this paper immediately carry over from the families of array languages we consider to the corresponding families of array languages with respect to linear translations; therefore, in general, we shall not consider these families of array languages with respect to linear translations explicitly in the following anymore. In order to be able to define the notion of connectedness of n-dimensional arrays, we need the following definitions:

An (undirected) graph g is an ordered pair (K, E), where K is a finite set of nodes and E is a set of undirected edges {x, y} with x, y ∈ K. A sequence of different nodes x0, x1, ..., xm, m ∈ N, is called a path of length m in g with starting-point x0 and ending-point xm if for all i with 1 ≤ i ≤ m an edge {xi−1, xi} in E exists. A graph g is said to be connected if for any two nodes x, y ∈ K, x ≠ y, a path in g with starting-point x and ending-point y exists. Observe that a graph ({x}, ∅) with only one node and an empty set of edges is connected, too. Let W be a non-empty finite subset of Z^n. For any k ∈ N ∪ {0}, a graph gk(W) = (W, Ek) can be assigned to W such that for v, w ∈ W, Ek contains the edge {v, w} if and only if 0 < ‖v − w‖ ≤ k, where the norm ‖u‖ of a vector u ∈ Z^n, u = (u(1), ..., u(n)), is defined by ‖u‖ = max{|u(i)| | 1 ≤ i ≤ n}. Then W is said to be k-connected if gk(W) is a connected graph. Observe that W is 0-connected if and only if card(W) = 1, where card(W) denotes the number of elements in the set W. Now let V be a finite alphabet and A an n-dimensional array over V, A ≠ Λn. Then A is said to be k-connected if gk(shape(A)) is a connected graph. Obviously, if A is k-connected, then A is m-connected for all m > k, too. The norm of A is the smallest number k ∈ N ∪ {0} such that A is k-connected, and it is denoted by ‖A‖. Observe that ‖A‖ = 0 if and only if card(shape(A)) = 1. An n-dimensional array production p over V is a triple (W, A1, A2), where W ⊆ Z^n is a finite set and A1 and A2 are mappings from W to V ∪ {#}; p is called λ-free if shape(A2) ≠ ∅, where we define shape(Ai) = {v ∈ W | Ai(v) ≠ #}, 1 ≤ i ≤ 2. The norm of the n-dimensional array production (W, A1, A2) is defined by ‖(W, A1, A2)‖ = max{‖v − w‖ | v, w ∈ W}.
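The definitions above can be made concrete in a few lines of Python (our own sketch, not from the paper; all identifiers are ours): an n-dimensional array is stored as a dictionary mapping the positions of its shape to symbols, the translation τv is a shift of all positions, and k-connectedness of a finite shape is checked by breadth-first search in the graph gk(W).

```python
# Our own sketch: arrays as dicts from positions (tuples in Z^n) to symbols;
# positions outside the dict implicitly hold the blank symbol #.
from collections import deque

def translate(array, v):
    """tau_v(A): shift every position of A by the vector v."""
    return {tuple(w_i + v_i for w_i, v_i in zip(w, v)): sym
            for w, sym in array.items()}

def norm(u):
    """Maximum norm ||u|| = max |u(i)|."""
    return max(abs(x) for x in u)

def is_k_connected(W, k):
    """Is g_k(W) (edges between points at max-norm distance <= k) connected?"""
    W = set(W)
    start = next(iter(W))
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        for w in W - seen:
            if 0 < norm(tuple(a - b for a, b in zip(v, w))) <= k:
                seen.add(w)
                queue.append(w)
    return seen == W

A = {(0, 0): "a", (1, 1): "b", (3, 3): "a"}
print(is_k_connected(A.keys(), 1), is_k_connected(A.keys(), 2))
print(translate(A, (1, 0)))
```

For this sample array, shape(A) is 2-connected but not 1-connected, so its norm ‖A‖ would be 2.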
We say that the array C2 ∈ V*n is directly derivable from the array C1 ∈ V*n by the n-dimensional array production (W, A1, A2) if and only if there exists a vector v ∈ Z^n such that C1(w) = C2(w) for all w ∈ Z^n \ τv(W) as well as C1(w) = A1(τ−v(w)) and C2(w) = A2(τ−v(w)) for all w ∈ τv(W), i.e., the subarray of C1 corresponding to A1 is replaced by A2, thus yielding C2; we also write C1 ⇒p C2. As can already be seen from the definitions of an n-dimensional array production, the conditions for an application to an n-dimensional array B and the result of an application to B, an n-dimensional array production (W, A1, A2) is a representative for the infinite set of equivalent n-dimensional array productions of the form (τv(W), τv(A1), τv(A2)) with v ∈ Z^n. Hence, without loss of generality, in the sequel we usually shall assume Ωn ∈ W as well as A1(Ωn) ≠ #. Moreover, we often will omit the set W, because it is uniquely reconstructible from the description of the two mappings A1 and A2 by Ai = {(v, Ai(v)) | v ∈ W}, 1 ≤ i ≤ 2. Thus, in the sequel we will represent the n-dimensional array production (W, A1, A2) also by writing A1 → A2, i.e., {(v, A1(v)) | v ∈ W} → {(v, A2(v)) | v ∈ W}. An n-dimensional array grammar is a sextuple

G = (n, VN, VT, #, P, {(v0, S)}),

where VN is the alphabet of non-terminal symbols, VT is the alphabet of terminal symbols, VN ∩ VT = ∅, # ∉ VN ∪ VT; P is a finite non-empty set of n-dimensional array productions over VN ∪ VT, and {(v0, S)} is the start array (axiom), where v0 is the start vector and S is the start symbol. G is called λ-free if every production in P is λ-free. We say that the array B2 ∈ V*n is directly derivable from the array B1 ∈ V*n in G, denoted B1 ⇒G B2, if and only if there exists an n-dimensional array production p = (W, A1, A2) in P such that B1 ⇒p B2. Let ⇒*G be the reflexive and transitive closure of ⇒G. Then the (n-dimensional) array language generated by G, L(G), is defined by L(G) = {A | A ∈ VT*n, {(v0, S)} ⇒*G A}. The norm of the n-dimensional array grammar G is defined by ‖G‖ = max{‖p‖ | p ∈ P}. An n-dimensional array production p = (W, A1, A2) in P is said to be

– monotonic, if shape(A1) ⊆ shape(A2);
– strictly monotonic or non-blank-sensing, if shape(A2) = W;
– #-context-free, if card(shape(A1)) = 1;
– context-free, if p is monotonic as well as #-context-free; moreover, we usually assume A1(Ωn) ∈ VN; if card(W) = 1, we only write A1(Ωn) → A2(Ωn);
– strictly context-free, if p is strictly monotonic as well as context-free;
– regular, if either
  1. W = {Ωn, v} for some v ∈ Un, where Un = {(i1, ..., in) | Σ_{k=1}^{n} |ik| = 1}, and A1 = {(Ωn, B), (v, #)}, A2 = {(Ωn, a), (v, C)} with B, C ∈ VN and a ∈ VT, or
  2. W = {Ωn}, A1 = {(Ωn, B)}, A2 = {(Ωn, a)} with B ∈ VN and a ∈ VT (in this case we also write B → a).

G is called an array grammar of type n-enum, n-mon, n-smon, n-#-cf, n-cf, n-scf, n-reg, if every array production in P is of the corresponding type, i.e., an arbitrary (n-enum), monotonic (n-mon), strictly monotonic (n-smon), #-context-free (n-#-cf), context-free (n-cf), strictly context-free (n-scf), and regular (n-reg), respectively, n-dimensional array production; the corresponding families of λ-free n-dimensional array languages are denoted by L(X), X ∈ {n-enum, n-mon, n-smon, n-#-cf, n-cf, n-scf, n-reg}. If for X, Y ∈ {n-enum, n-mon, n-smon, n-#-cf, n-cf, n-scf, n-reg | n ≥ 1} every array production of type X is also an array production of type Y, we write X ⊆ Y. Obviously, if X ⊆ Y, then L(X) ⊆ L(Y), too. In the following, we also consider the case of context-free and strictly context-free n-dimensional array grammars with norm 1; for the types X with X ∈ {n-cf, n-scf} we denote the corresponding families of array languages generated by such array grammars by L(X1). Like in the string case, some of the families of array languages defined above form a strict hierarchy (see [2, 30]; also compare with the results stated in [11, 12, 15]), the so-called Chomsky hierarchy of array languages: For all n ∈ N,

L(n-reg) ⊊ L(n-scf) ⊊ L(n-smon) ⊊ L(n-enum).

An interesting feature of n-dimensional array grammars is the fact that even regular and context-free array productions make use of some special context, namely the context of blank symbols #. This blank-sensing ability (which is reduced to a minimum in the case of strictly context-free respectively strictly monotonic array grammars, in contrast to context-free respectively monotonic array grammars) induces a relatively high generating power even of only regular two-dimensional array grammars and yields some rather astonishing results; e.g., the set of all solid squares can be generated by a regular two-dimensional array grammar [31]. Furthermore, both fixed and general membership for regular and context-free n-dimensional array grammars, even in the unary case, are NP-complete, while the non-emptiness problem is undecidable for regular n-dimensional array grammars if n ≥ 2 (see [8, 22]). This result nicely fits into the picture observed for other picture-describing formalisms [17, 21, 29]. For one-dimensional array grammars, in contrast, we have [L(1-reg)] = [L(1-cf1)], which family of array languages directly corresponds to the family of regular string languages (see [14]). Moreover, every recursively enumerable string language can be represented by an at least two-dimensional #-context-free array grammar (see [8]), so that the membership problem for this grammar and language type is undecidable, while it is PSPACE-complete for monotonic array grammars and languages. As many results for n-dimensional arrays for a special n can be taken over immediately for higher dimensions, we introduce the following notion: Let n, m ∈ N with n ≤ m. For n < m, the natural embedding in,m : Z^n → Z^m is defined by in,m(v) = (v, Ωm−n) for all v ∈ Z^n; for n = m we define in,n : Z^n → Z^n by in,n(v) = v for all v ∈ Z^n. To an n-dimensional array A ∈ V+n with A = {(v, A(v)) | v ∈ shape(A)} we assign the m-dimensional array in,m(A) = {(in,m(v), A(v)) | v ∈ shape(A)}.
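Returning to the derivation step defined earlier in this section, the application of an n-dimensional array production (W, A1, A2) to an array C1 at an offset v can be sketched as follows (our own Python illustration; all identifiers are ours, and arrays are represented as dictionaries over their shapes as before). Note how the # entries of A1 sense blank positions, which is exactly the blank-sensing ability discussed above.

```python
# Our own sketch: applying an array production (W, A1, A2) to array C1 at
# offset v, following the definition of direct derivability; '#' is blank.
BLANK = "#"

def apply_production(C1, W, A1, A2, v):
    """Return C2 if (W, A1, A2) is applicable to C1 at offset v, else None."""
    # Applicability: C1 must agree with A1 on the translated domain tau_v(W),
    # including the blank positions of A1 (blank sensing).
    for w in W:
        pos = tuple(a + b for a, b in zip(w, v))
        if C1.get(pos, BLANK) != A1[w]:
            return None
    C2 = dict(C1)
    for w in W:
        pos = tuple(a + b for a, b in zip(w, v))
        C2.pop(pos, None)
        if A2[w] != BLANK:
            C2[pos] = A2[w]
    return C2

# A regular-style production: rewrite B to a and grow a C on the blank cell
# to the right (example data of our own choosing).
W = {(0, 0), (1, 0)}
A1 = {(0, 0): "B", (1, 0): "#"}
A2 = {(0, 0): "a", (1, 0): "C"}
C1 = {(2, 2): "B"}
print(apply_production(C1, W, A1, A2, (2, 2)))
```

At offset (2, 2) the production applies and yields {(2, 2): "a", (3, 2): "C"}; at offset (0, 0) it is not applicable, since C1 holds a blank there instead of B.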

3 Control mechanisms

In the following, we give the necessary definitions of matrix and programmed (graph controlled) string and array grammars and languages. For detailed information concerning these control mechanisms as well as many other interesting results about regulated rewriting in the theory of string languages, the reader is referred to [3]. A matrix (string) grammar is a construct GM = (VN, VT, (M, F), S), where VN and VT are disjoint alphabets of non-terminal respectively terminal symbols, S ∈ VN is the start symbol, M is a finite set of matrices, M = {mi | 1 ≤ i ≤ n}, where the matrices mi are sequences of the form mi = (mi,1, ..., mi,ni), ni ≥ 1, 1 ≤ i ≤ n, the mi,j, 1 ≤ j ≤ ni, 1 ≤ i ≤ n, are productions over VN ∪ VT, and F is a subset of ∪_{1≤i≤n, 1≤j≤ni} {mi,j}. For mi = (mi,1, ..., mi,ni) and v, w ∈ (VN ∪ VT)* we define v ⇒mi w if and only if there are w0, w1, ..., wni ∈ (VN ∪ VT)* such that w0 = v, wni = w, and for each j, 1 ≤ j ≤ ni,

– either wj is the result of the application of mi,j to wj−1,
– or mi,j is not applicable to wj−1, wj = wj−1, and mi,j ∈ F.
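The two cases above can be made concrete with a small simulator (our own Python sketch, not from the paper; all identifiers are ours) that applies one matrix to a sentential form, skipping a production from F only when it is not applicable:

```python
# Our own sketch: applying one matrix (m_1, ..., m_n) in appearance checking
# mode. Productions are (lhs, rhs) pairs over single non-terminal symbols;
# F is the set of productions that may be skipped when not applicable.

def apply_matrix(w, matrix, F):
    """Return the set of sentential forms reachable by applying `matrix` to w."""
    forms = {w}
    for lhs, rhs in matrix:
        next_forms = set()
        for f in forms:
            if lhs in f:
                # the production is applicable: apply it at every occurrence
                for i in range(len(f)):
                    if f[i] == lhs:
                        next_forms.add(f[:i] + rhs + f[i + 1:])
            elif (lhs, rhs) in F:
                next_forms.add(f)   # not applicable, but skippable (in F)
        forms = next_forms          # not applicable and not in F: blocked
    return forms

# The matrix (A -> aA, B -> bB) rewrites both non-terminals in one pass,
# keeping the numbers of generated a's and b's synchronized.
print(sorted(apply_matrix("AB", [("A", "aA"), ("B", "bB")], set())))
```

With F = {(A, aA)}, the same matrix also applies to a form containing only B, silently skipping the first production; this is precisely the appearance checking mode.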

The language generated by GM is

L(GM) = {w ∈ VT* | S ⇒mi1 w1 ⇒mi2 ... ⇒mik wk = w, wj ∈ (VN ∪ VT)*, mij ∈ M for 1 ≤ j ≤ k}.

If F = ∅, then GM is called a matrix (string) grammar without appearance checking. A graph controlled (string) grammar or programmed (string) grammar is a construct GP = (VN, VT, (R, Lin, Lfin), S); VN and VT are disjoint alphabets of non-terminal and terminal symbols, respectively; S ∈ VN is the start symbol; R is a finite set of rules r of the form (l(r) : p(l(r)), σ(l(r)), φ(l(r))), where l(r) ∈ Lab(GP), Lab(GP) being a set of labels associated (in a one-to-one manner) with the rules r in R, p(l(r)) is a string production over VN ∪ VT, σ(l(r)) ⊆ Lab(GP) is the success field of the rule r, and φ(l(r)) ⊆ Lab(GP) is the failure field of the rule r; Lin ⊆ Lab(GP) is the set of initial labels, and Lfin ⊆ Lab(GP) is the set of final labels. For r = (l(r) : p(l(r)), σ(l(r)), φ(l(r))) and v, w ∈ (VN ∪ VT)* we define (v, l(r)) ⇒GP (w, k) if and only if

– either p(l(r)) is applicable to v, the result of the application of the production p(l(r)) to v is w, and k ∈ σ(l(r)),
– or p(l(r)) is not applicable to v, w = v, and k ∈ φ(l(r)).

The (string) language generated by GP is

L(GP) = {w ∈ VT* | (S, l0) ⇒GP (w1, l1) ⇒GP ... ⇒GP (wk, lk), k ≥ 1, wj ∈ (VN ∪ VT)* and lj ∈ Lab(GP) for 0 ≤ j ≤ k, wk = w, l0 ∈ Lin, lk ∈ Lfin}.

If the failure fields φ(l(r)) are empty for all r ∈ R, then GP is called a graph controlled (or programmed) grammar without appearance checking. If φ(l(r)) = σ(l(r)) for all r ∈ R, then GP is called a graph controlled (or programmed) grammar with unconditional transfer. A matrix (string) grammar, or a graph controlled (string) grammar, respectively, is said to be of type enum, mon, cf, or reg, respectively, if every production appearing in this grammar is of the corresponding type, i.e., an arbitrary, monotonic, context-free, or regular production, respectively. For X ∈ {enum, mon, cf, reg}, by

Mac(X), M(X), Pac(X), Put(X), P(X)

we denote the families of λ-free (string) languages generated by matrix grammars, matrix grammars without appearance checking, programmed (or graph controlled) grammars, programmed (or graph controlled) grammars with unconditional transfer, and programmed (or graph controlled) grammars without appearance checking, respectively, containing only productions of type X. The definitions of matrix grammars and graph controlled grammars can immediately be taken over for array grammars by taking array productions instead of string productions in the definitions given above: A matrix array grammar or a graph controlled array grammar is said to be of type X if every array production appearing in this grammar is of the corresponding type X, too; for every X ∈ {n-enum, n-mon, n-smon, n-#-cf, n-cf, n-scf, n-reg | n ≥ 1}, by Mac(X), M(X), Pac(X), Put(X), P(X) we denote the families of λ-free array languages described by matrix array grammars, matrix array grammars without appearance checking, programmed (or graph controlled) array grammars, programmed (or graph controlled) array grammars with unconditional transfer, and programmed (or graph controlled) array grammars without appearance checking, respectively, containing only productions of type X. Usually, the number of non-terminal symbols occurring in the sentential forms of a derivation is not bounded, yet it is a natural measure for the complexity of the evolving terminal object. Even in some applications such as character recognition we can restrict ourselves to a quite low bound on the number of non-terminal symbols occurring in the sentential forms of a derivation, as we shall see in the second part. Hence, we introduce the following definitions: The index of a derivation D of a terminal object w (string or array) in a (string or array) grammar G is defined as the maximal number of non-terminal symbols occurring in an intermediate sentential form and is denoted by indG,D(w).
This definition also makes sense for programmed and matrix (string or array) grammars with or without appearance checking, etc.; for matrix grammars we shall also demand that the objects obtained during the application of a matrix are considered, too, not only the objects resulting after the application of a whole matrix. Moreover, by indG,min(w) and indG,max(w) we denote the minimum and the maximum, respectively, of the set {indG,D(w) | D is a derivation of w in G}. The corresponding index of the grammar itself is defined by

indY(G) = sup{indG,Y(w) | w is generated by G}, for Y ∈ {min, max}.
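For string grammars, the index of a concrete derivation is straightforward to compute; the following Python sketch (our own illustration, not from the paper; the grammar and all names are ours) counts the maximal number of non-terminal occurrences over the sentential forms of a given derivation:

```python
# Our own sketch: the index of a derivation is the maximal number of
# non-terminal symbols over all of its sentential forms.

def index_of_derivation(derivation, nonterminals):
    return max(sum(1 for s in form if s in nonterminals)
               for form in derivation)

# A derivation of aabb in a grammar with S -> AB, A -> a | aA, B -> b | bB;
# the maximum, 2 non-terminals, is reached at the forms AB and aAB.
D = ["S", "AB", "aAB", "aaB", "aabB", "aabb"]
print(index_of_derivation(D, {"S", "A", "B"}))
```

Here indG,D(aabb) = 2; taking min or max over all derivations of a word, and then the supremum over all generated words, gives indmin(G) and indmax(G) as above.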

For a given (string or array) language L and a class of (string or array, respectively) grammars of type X (X ∈ {cf} ∪ {n-#-cf, n-cf, n-scf | n ≥ 1}) we define the Y-index of L with respect to X by

indY(L, X) = min{indY(G) | L is generated by G and G is of type X}.

These definitions can immediately be taken over for string and array grammars with control mechanisms, too. Hence, for Z ∈ {L, Mac, M, Pac, Put, P}, by Z^{Y,⌈fin⌉}(X) and Z^{Y,⌈k⌉}(X) we denote the family of all (string and array, respectively) languages L from Z(X) which are generated by a corresponding

grammar of type X or regulated grammar of type X with indY(L, X) being finite or indY(L, X) ≤ k, respectively. In this paper, we shall investigate array languages obtained by regulated array grammars with a finite index restriction in more detail. Another natural definition of a finite index restriction is to consider only those objects w generated by the (regulated) grammar G that have a derivation with indG,Y(w) ≤ k for Y ∈ {min, max}; for X ∈ {cf} ∪ {n-#-cf, n-cf, n-scf | n ≥ 1} and Z ∈ {L, Mac, M, Pac, Put, P} the corresponding families of (string or array) languages are denoted by Z^{Y,\⌈k⌉}(X), and the unions over all k ≥ 1 are denoted by Z^{Y,\⌈fin⌉}(X). Observe that in the literature a grammar G is said to be of index k if indmin(G) ≤ k, while a grammar G with indmax(G) ≤ k is called (k-)derivation-bounded, see [18], or of uncontrolled index k, see [26]. The families Z^{Y,\⌈k⌉}(X) have not been considered elsewhere, so we will also obtain new results in the string case for these families. Another important control mechanism is the formation of prescribed teams (e.g., see [24]). As we shall restrict ourselves to the case of finite indices, several different definitions of team formation are to be considered: Let M be an arbitrary set; then any object {(x, nx) | x ∈ M} with nx being a natural number for every x ∈ M is called a multiset over M. Now let G = (VN, VT, P, S) be a context-free string grammar. Any finite multiset r over P is called a prescribed team over P; we shall also write r in the form ⟨p1, ..., pm⟩, where the pi, 1 ≤ i ≤ m, are exactly the productions in r, each occurring with the right multiplicity (observe that obviously any permutation of the pi in this representation describes the same multiset). Writing the context-free productions pi in the form Ai → wi, the application of the team ⟨p1, ..., pm⟩ to a string u0 A1 u1 ... Am um yields the new string u0 w1 u1 ... wm um, i.e., all the productions in the multiset are applied in parallel to the underlying sentential form. Now let F ⊆ P; then ⟨p1, ..., pm⟩ can be applied in the so-called appearance checking mode in such a way that any of the pi appearing in F can be skipped; e.g., let ⟨A → a, B → Q⟩ be a team over P and B → Q ∈ F; then this team can be applied to the string aAbb in the appearance checking mode yielding aabb, whereas the application to aBAbb yields aQabb. A string grammar with prescribed teams is a construct

Gt = (VN, VT, (P, R, F), S),

where G = (VN, VT, P, S) is a context-free grammar, R is a finite set of teams over P, and F ⊆ P is the set of productions that can be skipped in the appearance checking mode. If F is empty, then we call Gt a string grammar with prescribed teams without appearance checking. By the definition of the application of a prescribed team to an underlying string given above, we immediately obtain the definition of a derivation relation ⇒Gt assigned to Gt. The language generated by Gt is the set of all terminal strings obtained from the start symbol S using the derivation relation ⇒Gt.
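The parallel application of a team, including the aAbb/aBAbb example given above, can be reproduced with a small Python sketch (ours, not the paper's; all identifiers are ours). Each production rewrites one not-yet-rewritten occurrence of its left-hand side; productions from F are skipped when no such occurrence exists.

```python
# Our own sketch: applying a prescribed team <p1, ..., pm> of context-free
# productions in parallel; F holds the productions that may be skipped in
# the appearance checking mode.

def apply_team(form, team, F=frozenset()):
    """Apply the multiset `team` (list of (lhs, rhs) pairs) in parallel to
    `form`, or return None if the team is not applicable."""
    out = list(form)
    used = set()
    for lhs, rhs in team:
        # leftmost occurrence of lhs not already claimed by another production
        pos = next((i for i, s in enumerate(out)
                    if s == lhs and i not in used), None)
        if pos is None:
            if (lhs, rhs) in F:
                continue            # skipped in appearance checking mode
            return None             # team not applicable
        used.add(pos)
        out[pos] = rhs
    return "".join(out)

team, F_ac = [("A", "a"), ("B", "Q")], {("B", "Q")}
print(apply_team("aAbb", team, F_ac))    # B -> Q skipped: no B present
print(apply_team("aBAbb", team, F_ac))   # both productions applied
```

This reproduces the two applications described in the text: aAbb yields aabb (with B → Q skipped), while aBAbb yields aQabb.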

In the case of n-dimensional array productions we face the problem of how to define the parallel application of these productions, because the chosen domains of the array productions might overlap. We could demand that these domains must not overlap at all, as was done in [13]; on the other hand, we could restrict ourselves to demanding that at overlapping positions the results of the application of all the productions must be equal. The definition we shall choose is an intermediate variant, where we demand that only at those positions where a non-blank symbol will result must the domains of the chosen subarrays not overlap, whereas we allow the sensing of a blank symbol at a special position from different positions. In the case of n-dimensional strictly context-free array productions, this definition coincides with the first variant proposed above, because in this case at every position affected by one of the array productions in the team a non-blank symbol will result. As in this paper we restrict ourselves to sentential forms with a bounded number of non-terminal symbols, in the following we only consider the application of a team with the finite index restriction, i.e., the application of a team is only allowed if all non-terminal symbols occurring in the sentential form are affected by the team, at least in the appearance checking mode. This guarantees that in a derivation yielding a terminal object the number of non-terminal symbols occurring in the intermediate sentential forms can never exceed the maximal number of context-free productions occurring in any of the teams in R.

Given a (string or array) grammar with prescribed teams G, by L^{⌈k⌉}(G) we denote the language generated by G working with the finite index restriction and having the number of non-terminal symbols bounded by k. In this paper, we only consider teams with the usual derivation mode, which resembles the control mechanism of unscattered context productions (see [3]), because we want to restrict ourselves to the finite index restriction; in this situation, other derivation modes like the maximal derivation mode or the derivation modes of making at least k or exactly k steps with a team (as they were considered for the string case in [24] and for the array case in [13]) seem to be of minor importance. The family of all string languages generated by string grammars with prescribed teams of context-free productions with the finite index restriction is denoted by PTac^{⌈k⌉}(cf). The family of all array languages generated by array grammars with prescribed teams of array productions of type X, X ∈ {n-#-cf, n-cf, n-scf | n ≥ 1}, is denoted by PTac^{⌈k⌉}(X). For denoting the families being the unions over all k ≥ 1, we replace ⌈k⌉ by ⌈fin⌉. In those cases where we do not allow appearance checking, we simply omit the subscript ac. Let us finally give an example of a two-dimensional array grammar with prescribed teams that works with the finite index restriction and allows us to explain the appearance checking mode in this case of prescribed teams of arrays.

Example 1. Let G = (2, {D, E, L, Q, R, S, U}, {a}, #, (P, R, F), {((0, 0), S)}) be a two-dimensional array grammar with prescribed teams. [The two-dimensional array productions in P, the teams in R, the set F, and the sample derivation sequence are given as two-dimensional diagrams in the original and are not reproduced here.] The derivation sequence shows the generation of the smallest object in the array language generated by G; L(G) is the set of all squares of side length at least 3 with the left lower corner lying in the origin. Observe that, at the intermediate array depicted in the original, the team containing U → E together with a blank-sensing production introducing Q is only applicable by using the latter production in the "appearance checking" mode.
From [13] and [15], we know that L(G) ∈ Mac(2-scf1) \ M(2-scf1). Obviously, from the explanations given above we immediately infer L(G) = L^{⌈2⌉}(G) and therefore L(G) ∈ PTac^{⌈2⌉}(2-scf1) \ M(2-scf1).

In fact, this example shows that the notion "appearance checking" in the general case, where not only strings are considered, should rather be replaced by the notion "applicability checking". In the string case, the applicability of a context-free production only depends on the occurrence of the non-terminal symbol on the left-hand side of the production in the underlying sentential form. On the other hand, in the array case, the availability of free (blank) positions for putting non-blank symbols is an additional necessary applicability condition for (#-)context-free array productions. As both notions can be abbreviated by "ac", in the following we shall only use this abbreviation instead of introducing new notions like "applicability checking" for "appearance checking"; yet the reader should always keep in mind this important difference between the string case and the array case.

4 The finite index restriction

The following lemma immediately follows from the definitions given in the preceding section:

Lemma 1. For any X ∈ {cf} ∪ {n-#-cf, n-cf, n-scf | n ≥ 1} and for every k ≥ 1,
1. Z^{Y,⌈k⌉}(X) ⊆ Z^{Y,⌈fin⌉}(X) for every Z ∈ {L, Mac, M, Pac, Put, P} and for every Y ∈ {min, max};
2. PT^{⌈k⌉}(X) ⊆ PT^{⌈fin⌉}(X) and PTac^{⌈k⌉}(X) ⊆ PTac^{⌈fin⌉}(X).

The following lemma is an obvious consequence of the definitions, too:

Lemma 2. For any X ∈ {cf} ∪ {n-#-cf, n-cf, n-scf | n ≥ 1}, for every k ∈ {fin} ∪ {i | i ≥ 1}, and for every Z ∈ {L, Mac, M, Pac, Put, P},
1. Z^{max,⌈k⌉}(X) ⊆ Z^{max,\⌈k⌉}(X);
2. Z^{max,⌈k⌉}(X) ⊆ Z^{min,⌈k⌉}(X) ⊆ Z^{min,\⌈k⌉}(X).

For the control mechanisms of matrix grammars and graph controlled grammars, the following inclusions are well-known; in fact, they also hold true for the unrestricted case:

Lemma 3. For any X ∈ {cf} ∪ {n-#-cf, n-cf, n-scf | n ≥ 1}, for every k ∈ {fin} ∪ {i | i ≥ 1}, and for every Y ∈ {min, max},
1. M^{Y,⌈k⌉}(X) ⊆ Mac^{Y,⌈k⌉}(X), P^{Y,⌈k⌉}(X) ⊆ Pac^{Y,⌈k⌉}(X), and Put^{Y,⌈k⌉}(X) ⊆ Pac^{Y,⌈k⌉}(X);
2. Mac^{Y,⌈k⌉}(X) ⊆ Pac^{Y,⌈k⌉}(X), and L^{Y,⌈k⌉}(X) ⊆ M^{Y,⌈k⌉}(X) ⊆ P^{Y,⌈k⌉}(X).

Proof. Only the inclusions in the second point do not directly follow from the definitions; they constitute relations holding true in general for these types of control mechanisms, as shown in [5]. □

In the string case, where the applicability of a context-free production only depends on the occurrence of the non-terminal symbol on the left-hand side of the production, appearance checking gives no additional power, because we can use the following "memory trick", remembering all the at most k variables occurring in the sentential form. As this technique will be essential for several proofs given in this paper, we shall exhibit it in some detail for the case of graph controlled (string) grammars with and without ac.

Lemma 4. The following families of languages coincide:

{ P min;dke (cf ), Pacmin;dke (cf ), Putmin;dke (cf ),

{ P min;\dke (cf ), Pacmin;\dke (cf ), Putmin;\dke (cf ), { P max;dke (cf ), Pacmax;dke (cf ), Putmax;dke (cf ). Proof. By our preceding lemmas, it remains to show the relations

1. 2.

Pacmin;\dke (cf )  P max;dke (cf ); Pacmin;\dke (cf )  Putmax;dke (cf ).

In the following, we only show the first of these relations in order to introduce the necessary proof technique; the other relation is shown in an analogous way. Let G = (V_N, V_T, (R, L_in, L_fin), S) be a graph controlled (string) grammar with ac. Let L^{min,∩[k]}(G) denote the language containing all strings w from L(G) such that there is a derivation in G yielding w where all the intermediate sentential forms contain at most k non-terminal symbols. We now construct a graph controlled grammar G' = (V_N', V_T, (R', L'_in, L'_fin), S') without ac such that G' generates L^{min,∩[k]}(G) with the finite index restriction, i.e., no valid derivation in G' yields a sentential form containing more than k non-terminal symbols.

We use k copies A^(1), ..., A^(k) of each variable A from V_N such that we can assume all variables occurring in the sentential form to be different, i.e., we can use sets instead of multisets. This can be achieved in the following way: Let l : A → w be the labelled production to be applied. Then we look for an A^(i) occurring in the current set of variables. If no A^(i) is there, then we can pass to a node from φ(l) in the appearance checking mode. Otherwise we choose one of the A^(i) occurring in the underlying sentential form and simulate the application of A → w in such a way that A^(i) is replaced by some instance of w, where each variable B in w is replaced by a B^(j) such that the resulting string again contains only pairwise different variables. Taking into account all these considerations, we obtain:

V_N'' = {A^(i) | A ∈ V_N, 1 ≤ i ≤ k},
V_N' = {[A^(i), M] | A ∈ V_N, 1 ≤ i ≤ k, M ⊆ V_N''} ∪ {S'};

R' contains the following rules:

1. (l : S' → [S^(1), {S^(1)}], {[l, 1, {S^(1)}]}, ∅) for l ∈ L_in;

2. ([l, i, M] : [A^(i), M] → u_0 [B_1^(j_1), M] u_1 ... [B_m^(j_m), M] u_m, L', ∅) for l ∈ Lab(G), 1 ≤ i ≤ k, M ⊆ V_N'', where
   M'' = M \ {A^(i)}, A^(i) ∈ M, M' = M'' ∪ {B_h^(j_h) | 1 ≤ h ≤ m},
   1 ≤ card(M) ≤ k, M'' ∩ {B_h^(j_h) | 1 ≤ h ≤ m} = ∅, B_x^(j_x) ≠ B_y^(j_y) for all x, y with x, y ∈ {1, ..., m} and x ≠ y,
   (l : A → u_0 B_1 u_1 ... B_m u_m, σ(l), φ(l)) ∈ R, u_j ∈ V_T*, 0 ≤ j ≤ m, B_h ∈ V_N, 1 ≤ h ≤ m,
   and L' ≠ ∅ if and only if card(M') ≤ k and one of the following two cases occurs:
   (a) L' = {[l, l', M, M', 1] | l' ∈ σ(l)}, if M' ≠ ∅;
   (b) L' = L'_fin if M' = ∅ as well as
       - L_fin ∩ σ(l) ≠ ∅ or
       - L_fin ∩ σ(l) = ∅ as well as σ(l) ≠ ∅ and there is a sequence l_1, ..., l_e, e ≥ 1, l_i ∈ Lab(G), 1 ≤ i ≤ e, such that l_1 ∈ σ(l), l_i ∈ φ(l_{i-1}) for 1 < i ≤ e, and l_e ∈ L_fin. (*)

3. After the simulation of the production labelled by l, in all the variables of the newly derived sentential form M has to be replaced by M'; for the representation of M' we choose an alphabetical order on V_N'', i.e., for M' = {X_i | 1 ≤ i ≤ m} we can take the sequence ⟨X_1, ..., X_m⟩ as its unique representation:
   ([l, l', M, M', h] : [X_h, M] → [X_h, M'], {[l, l', M, M', h + 1]}, ∅) for 1 ≤ h < m and
   ([l, l', M, M', m] : [X_m, M] → [X_m, M'], {[l', i, M'] | 1 ≤ i ≤ k} ∩ Lab', ∅)
   for l, l' ∈ Lab(G), l' ∈ σ(l), ∅ ⊊ M ⊆ V_N'', ∅ ⊊ M' ⊆ V_N''.

4. The appearance checking mode is simulated by the following rules:
   ([l, i, M] : [B, M] → [B, M], {[l', j, M] | 1 ≤ j ≤ k, l' ∈ φ(l)} ∩ Lab', ∅),
   l ∈ Lab(G), 1 ≤ i ≤ k, M ⊆ V_N'', B ∈ M, and A ∉ {X | X^(j) ∈ M for some j with 1 ≤ j ≤ k}, where (l : A → w, σ(l), φ(l)) ∈ R.

The condition (*) stated above guarantees that M ≠ ∅ whenever we have to simulate the ac mode in G'. Moreover, we define L'_in = L_in, L'_fin = {f}, and

Lab' = {[l, i, M] | (l : A → w, σ(l), φ(l)) ∈ R, 1 ≤ i ≤ k, M ⊆ V_N'', A^(i) ∈ M} ∪
       {[l, i, M] | (l : A → w, σ(l), φ(l)) ∈ R, 1 ≤ i ≤ k, M ⊆ V_N'', A ∉ {X | X^(j) ∈ M for some j with 1 ≤ j ≤ k}}.

According to the constructions given above, labels of the form [l, i, M] for (l : A → w, σ(l), φ(l)) ∈ R only "make sense" if

- A^(i) ∈ M makes the rule labelled by l applicable to the current sentential form, or
- A ∉ {X | X^(j) ∈ M for some j with 1 ≤ j ≤ k} allows us to skip the rule labelled by l in the ac mode.

The reader should also observe that when we reach a rule of the form (l : p(l), ∅, ∅) ∈ R' in a derivation in G', the underlying sentential form is blocked and cannot yield a terminal string anymore. As the construction given above shows, ac is not needed when we restrict ourselves to a given maximal number of variables occurring in the sentential forms of the derivations possible in G', because the "memory trick" allows us to keep track of the current variables, and the application of a context-free production only depends on the occurrence of the variable on the left-hand side of the production in the underlying strings. These observations complete the proof. □
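To make the bookkeeping behind this "memory trick" concrete, the following Python sketch (function name and data representation are ours, not part of the construction) simulates the effect of applying a labelled production on the set M of indexed non-terminal copies, returning None when the rule can only be skipped in the ac mode or when the index bound k would be violated:

```python
def apply_production(M, lhs, rhs_nts, k):
    """M: set of (symbol, index) pairs representing indexed copies A^(i);
    lhs: the non-terminal A to be rewritten; rhs_nts: the non-terminals
    B_1, ..., B_m on the right-hand side.  Returns the new set M' or None."""
    copies = [(s, i) for (s, i) in M if s == lhs]
    if not copies:
        return None                     # lhs absent: rule only skippable (ac mode)
    M2 = set(M) - {copies[0]}           # M'' = M \ {A^(i)}
    new = []
    for B in rhs_nts:                   # assign fresh, pairwise distinct indices
        used = {i for (s, i) in M2.union(new) if s == B}
        free = next((i for i in range(1, k + 1) if i not in used), None)
        if free is None:
            return None
        new.append((B, free))
    M_new = M2.union(new)
    return M_new if len(M_new) <= k else None
```

Rejected applications (returning None) correspond exactly to the situations in which the construction above either enters the ac mode or refuses the step because the finite index bound would be exceeded.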

Observe that the equalities P^{min,[k]}(cf) = P_ac^{min,[k]}(cf) = P^{max,[k]}(cf) = P_ac^{max,[k]}(cf) have already been shown in [3, 26]. The corresponding cases of unconditional transfer have been treated in [9, 27].

When considering array grammars, we have to distinguish between the one-dimensional case, which often can be compared with the string case, and the n-dimensional case for n ≥ 2, where we often obtain quite different results. The "memory trick" used in the preceding lemma forms the basis of the proof given for the following normal form for (array) grammars with prescribed teams working with the finite index restriction:

Lemma 5. For every n-dimensional array grammar G = (n, V_N, V_T, #, (P, R, F), {(v_0, S)}) with prescribed teams of type X, X ∈ {n-scf_1, n-scf, n-cf_1, n-cf, n-#-cf}, there exists an equivalent n-dimensional array grammar G' = (n, V_N', V_T, #, (P', R', F'), {(v_0, S')}) with prescribed teams of type X such that L^{[k]}(G') = L^{[k]}(G) and in every derivation in G' the underlying array contains at most k different variables from V_N'. If G is a grammar without ac, then G' is without ac, too.

Proof. Again, instead of V_N we use k copies A^(1), ..., A^(k) for each variable A in V_N, i.e., we use V_N'' = {A^(i) | A ∈ V_N, 1 ≤ i ≤ k}. Let us first consider the case without ac, i.e., we define V_N' = V_N''. For each team ⟨p_1, ..., p_m⟩ in R with p_j ∈ P, 1 ≤ j ≤ m, where

p_j = (W_j, {(Ω_n, A_j)} ∪ {(v, #) | v ∈ W_j \ {Ω_n}}, {(v, B_{j,v}) | v ∈ W_j})

(Ω_n denoting the null vector in Z^n), we take every team of the form ⟨p'_1, ..., p'_m⟩ into R', where

p'_j = (W_j, {(Ω_n, A_j^(i_j))} ∪ {(v, #) | v ∈ W_j \ {Ω_n}}, {(v, B_{j,v}^(i_{j,v})) | v ∈ W_j, B_{j,v} ∉ V_T ∪ {#}} ∪ {(v, B_{j,v}) | v ∈ W_j, B_{j,v} ∈ V_T ∪ {#}})

for 1 ≤ j ≤ m is chosen in such a way that the B_{j,v} with B_{j,v} ∉ V_T ∪ {#} are represented by suitable copies B_{j,v}^(i_{j,v}) of B_{j,v} such that all these variables B_{j,v}^(i_{j,v}) are different. As we are working with the finite index restriction, in addition we can demand that in sum the total number of all these variables does not exceed the given bound k. Moreover, working with the finite index restriction also guarantees that the resulting team in G' can only be applied if all variables in the underlying array are affected by one of the array productions, i.e., a team with the "wrong" superscripts can never be applied. Hence, in this construction we need not even memorize the actual (multi)set of non-terminal symbols. Obviously, we obtain an array grammar G' with prescribed teams being of the desired normal form and still having the same type X as G.

When working in the ac mode, we non-deterministically have to choose the array productions we assume to be skipped in the ac mode. In order to guarantee

that these array productions really will have to be skipped, we replace them by array productions introducing a trap symbol Q. Yet now we again have to memorize the variables in the underlying array, i.e., we define

V_N' = {[A^(i), M] | A ∈ V_N, 1 ≤ i ≤ k, M ⊆ V_N''} ∪ {Q},

where Q is the trap symbol; we start with S' = [S^(1), {S^(1)}]. For each team ⟨p_1, ..., p_m⟩ in R with p_j ∈ P, 1 ≤ j ≤ m, where

p_j = (W_j, {(Ω_n, A_j)} ∪ {(v, #) | v ∈ W_j \ {Ω_n}}, {(v, B_{j,v}) | v ∈ W_j})

as above, we now take every team of the form ⟨p'_1, ..., p'_m⟩ into R', where for 1 ≤ j ≤ m we either choose

p'_j = (W_j, {(Ω_n, [A_j^(i_j), M])} ∪ {(v, #) | v ∈ W_j \ {Ω_n}}, {(v, [B_{j,v}^(i_{j,v}), M']) | v ∈ W_j, B_{j,v} ∉ V_T ∪ {#}} ∪ {(v, B_{j,v}) | v ∈ W_j, B_{j,v} ∈ V_T ∪ {#}})

in order to simulate p_j, or we assume p_j ∈ F to be simulated in the ac mode, where we have to distinguish between the following two cases:

1. A_j^(i_j) ∈ M, but p'_j as defined above is not applicable because of overlapping domains; hence A_j^(i_j) will also appear in the derived array and therefore A_j^(i_j) ∈ M', too; yet we have to guarantee that the array production we choose as p'_j generates the trap symbol Q whenever it is applied in contrast to our guess. Hence, we take

p'_j = (W_j, {(Ω_n, [A_j^(i_j), M])} ∪ {(v, #) | v ∈ W_j \ {Ω_n}}, {(Ω_n, Q)} ∪ {(v, Q) | v ∈ W_j \ {Ω_n}, B_{j,v} ∉ V_T ∪ {#}} ∪ {(v, B_{j,v}) | v ∈ W_j \ {Ω_n}, B_{j,v} ∈ V_T ∪ {#}}).

2. A_j^(i_j) ∉ M; the non-occurrence of the symbol [A_j^(i_j), M] in the underlying array can easily be simulated by taking p'_j = [A_j^(i_j), M] → Q. Obviously, in this case A_j^(i_j) will only appear in M' if chosen as one of the B_{m,v}^(i_{m,v}), m ≠ j. We have to remark that although [A_j^(i_j), M] does not occur in the underlying array, nevertheless we may have A_j ∈ {X | X^(i) ∈ M for some i with 1 ≤ i ≤ k}; yet according to the finite index restriction for the applicability of a prescribed team, this only means that the corresponding copy A_j^(i'_j), i'_j ≠ i_j, has to be affected by another array production p'_m in the team (either by being replaced by the application of a simulating array production or by being affected in the ac mode as described in case 1).

All the productions p'_j defined above for the ac mode are taken into F'. The non-terminal symbols B_{j,v}^(i_{j,v}) have to be chosen in such a way that all these non-terminal symbols (appearing as variables [B_{j,v}^(i_{j,v}), M'] in the derived array) are different, including all those non-terminal symbols A_j^(i_j) from M having only been affected (as the corresponding variable [A_j^(i_j), M]) in the ac mode; all together they build up the new set M'. Moreover, we only take those prescribed teams ⟨p'_1, ..., p'_m⟩ constructed according to the guidelines stated above into R' for which card(M') ≤ k, which guarantees that any array resulting from an application of the prescribed team ⟨p'_1, ..., p'_m⟩ contains at most k non-terminal symbols. In sum, we obtain an array grammar G' with prescribed teams still having the same type X as G such that in every array appearing in a possible derivation in G' the number of non-terminal symbols is at most k. These observations complete the proof. □

From the construction given above, it is obvious that a similar result holds true for context-free (string) grammars with prescribed teams.

Corollary 1. For X ∈ {cf} ∪ {n-scf | n ≥ 1} and k ∈ {fin} ∪ {j | j ≥ 1},

PT^{[k]}(X) ⊆ M^{max,[k]}(X) and PT_ac^{[k]}(X) ⊆ M_ac^{max,[k]}(X).

Proof. The parallel application of the context-free productions p_j = A_j → w_j, 1 ≤ j ≤ m, in the team ⟨p_1, ..., p_m⟩ can be simulated by the sequential application of the context-free productions in the corresponding matrix [p'_1, ..., p'_m, p''_1, ..., p''_m], where for 1 ≤ j ≤ m:

1. p'_j = A_j → A'_j and
2. p''_j = A'_j → w_j.

This simple argument can be used if, according to the preceding lemma, without loss of generality we assume the given grammar with prescribed teams of type X to be in the normal form presented above. In the ac case, for p_i ∈ F we take p'_i as well as p''_i into F''. In a similar way, for a team of strictly context-free array productions ⟨p_1, ..., p_m⟩ with

p_j = (W_j, {(Ω_n, A_j)} ∪ {(v, #) | v ∈ W_j \ {Ω_n}}, {(v, B_{j,v}) | v ∈ W_j}),

we take

1. p'_j = A_j → A'_j and
2. p''_j = (W_j, {(Ω_n, A'_j)} ∪ {(v, #) | v ∈ W_j \ {Ω_n}}, {(v, B_{j,v}) | v ∈ W_j}). □
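The two-phase matrix construction used above can be sketched as follows (a hedged illustration for the string case; the function name and the tuple representation of productions are ours):

```python
def team_to_matrix(team):
    """team: list of (A_j, w_j) pairs, one context-free production per
    team member.  Returns the simulating matrix [p'_1..p'_m, p''_1..p''_m]
    as a list of (lhs, rhs) productions applied sequentially: first every
    left-hand side is marked with a primed copy, then the primed copies
    are rewritten, so the m rules act on m distinct occurrences."""
    primed = [(A, A + "'") for (A, _) in team]    # p'_j : A_j -> A'_j
    rewrite = [(A + "'", w) for (A, w) in team]   # p''_j : A'_j -> w_j
    return primed + rewrite

matrix = team_to_matrix([("A", "aB"), ("B", "bC")])
```

The priming phase prevents a later rule of the matrix from consuming a non-terminal that an earlier rule of the same matrix has just produced, which is exactly what the parallel team application forbids.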

For #-context-free array productions having at least the blank-sensing ability, only a weaker result can be established.

Corollary 2. For X ∈ {n-#-cf, n-cf | n ≥ 1} and k ∈ {fin} ∪ {j | j ≥ 1},

PT^{[k]}(X) ⊆ M^{max,[2k]}(X) and PT_ac^{[k]}(X) ⊆ M_ac^{max,[2k]}(X).

Proof. We use a similar technique as in the preceding corollary, yet in order to check the "parallel" non-overlapping of the resulting non-blank areas affected by the blank-sensing array productions, we have to add an additional checking sequence, which intermediately may double the number of non-terminal symbols: For a team ⟨p_1, ..., p_m⟩ with

p_j = (W_j, {(Ω_n, A_j)} ∪ {(v, #) | v ∈ W_j \ {Ω_n}}, {(v, B_{j,v}) | v ∈ W_j}),

we now take the matrix

[p_1^(1), ..., p_m^(1), p_1^(2), ..., p_m^(2), p_1^(3), ..., p_m^(3), p_1^(4), ..., p_m^(4)],

where

1. p_j^(1) = A_j → A'_j;
2. p_j^(2) = (W_j, {(Ω_n, A'_j)} ∪ {(v, #) | v ∈ W_j \ {Ω_n}}, {(Ω_n, A'_j)} ∪ {(v, B_{j,v}) | v ∈ W_j \ {Ω_n}});
3. p_j^(3) = A'_j → A''_j; and
4. p_j^(4) = (W'_j ∪ {Ω_n}, {(Ω_n, A''_j)} ∪ {(v, #) | v ∈ W'_j}, {(Ω_n, B_{j,Ω_n})} ∪ {(v, #) | v ∈ W'_j}), with W'_j = {v | v ∈ W_j \ {Ω_n}, B_{j,v} = #}.

If after the sequential simulation of the p_j each position sensed for being blank still contains the blank symbol, the sequential simulation of the team by the corresponding matrix is completed successfully. Observe that the index is possibly increased by a factor of two, since at the positions of the A_j, even if A_j is replaced by a terminal symbol, we have to keep an "artificial" non-terminal symbol in order to check for blank symbols at the end of the simulation.

In the ac mode, we have to use a more difficult construction, because in this case there are various possibilities for p_j to be skipped in the ac mode, i.e.,

1. the variable A_j does not occur in the underlying array, or
2. the variable A_j occurs in the underlying array, but the production p_j is applied in the ac mode only, because its domain overlaps with the domain of another array production in such a way that the application of p_j is prohibited.

Assuming the first case, we also add matrices where p_j^(1) = p_j^(2) = p_j^(3) = p_j^(4) = A_j → Q, Q is a trap symbol, and these productions are taken into F' to be skipped in the ac mode. For the second case, we add the matrices with

1. p_j^(1) = A_j → A''_j;
2. p_j^(2) = A''_j → A''_j;
3. p_j^(3) = (W_j, {(Ω_n, A''_j)} ∪ {(v, #) | v ∈ W_j \ {Ω_n}}, {(Ω_n, Q)} ∪ {(v, B_{j,v}) | v ∈ W_j \ {Ω_n}});
4. p_j^(4) = A''_j → A_j.

This time, only the third array production p_j^(3) is taken into F'. The remaining details of the resulting matrix array grammar are obvious and therefore left to the reader. □

Lemma 6. For X ∈ {cf} ∪ {n-#-cf, n-cf, n-scf | n ≥ 1} and every k ∈ {fin} ∪ {j | j ≥ 1} we have P^{min,[k]}(X) ⊆ PT^{[k]}(X) and P_ac^{min,[k]}(X) ⊆ PT_ac^{[k]}(X).

Proof. For array grammars without the finite index restriction, a detailed proof for the inclusion M(X) ⊆ PT(X), X ∈ {n-scf, n-cf, n-#-cf | n ≥ 1}, was already elaborated in [13]. Combining the ideas of this construction with the ideas exhibited in the proof of Lemma 4 (especially the "memory trick") immediately yields the desired results. The main idea is to use variables of the form [A, M, p], where A is a non-terminal symbol, M is the current set of non-terminal symbols in the underlying array, and p is the current state in the control graph of the given graph controlled array grammar. The (rather tedious) details are left to the reader. □

Combining the results obtained so far, we get the following theorems:

Theorem 1. For every k ∈ {fin} ∪ {j | j ≥ 1}, we have

PT^{[k]}(cf) = PT_ac^{[k]}(cf) = Z^{Y,[k]}(cf) = Z^{min,∩[k]}(cf)

for all Z ∈ {M, M_ac, P, P_ut, P_ac} and Y ∈ {min, max}.

Theorem 2. For X ∈ {n-#-cf, n-cf | n ≥ 1}, we have

1. PT^{[fin]}(X) = Z^{Y,[fin]}(X) = Z^{min,∩[fin]}(X) as well as
2. PT_ac^{[fin]}(X) = Z_ac^{Y,[fin]}(X) = Z_ac^{min,∩[fin]}(X)

for all Z ∈ {M, P} and Y ∈ {min, max}.

Theorem 3. For every k ∈ {fin} ∪ {j | j ≥ 1}, we have

1. PT^{[k]}(X) = Z^{Y,[k]}(X) = Z^{min,∩[k]}(X) as well as
2. PT_ac^{[k]}(X) = Z_ac^{Y,[k]}(X) = Z_ac^{min,∩[k]}(X)

for all X ∈ {n-scf, n-scf_1 | n ≥ 1}, Z ∈ {M, P}, and Y ∈ {min, max}. For n ≥ 2 the inclusion PT^{[k]}(n-scf_1) ⊆ PT_ac^{[k]}(n-scf_1) is strict.

Proof. For k ≥ 2, the strictness of the inclusion follows from Example 1; for n > 2, we take the natural embedding of the two-dimensional array language considered there into the n-dimensional space. For k = 1, we can take the set of rectangles as described in [15]. □

4.1 The one-dimensional case

In the one-dimensional case, some rather astonishing results can be obtained which in some sense contradict the general results established for the n-dimensional case with n ≥ 2:

Theorem 4. For X ∈ {1-cf_1, 1-scf_1} we have

PT^{[1]}(X) = PT_ac^{[1]}(X) = Z^{Y,[1]}(X) = Z^{min,∩[1]}(X) = L(1-reg)

for all Z ∈ {M, M_ac, P, P_ut, P_ac} and Y ∈ {min, max}.

Proof. The main idea is based on the fact that all information necessary for checking the applicability of a one-dimensional array production can be memorized in the single non-terminal symbol occurring in the intermediate arrays of a derivation in a regulated one-dimensional array grammar with the restriction to the finite index 1. The details are rather obvious and left to the reader. □

So far, no reasonable notions for linear array grammars could be found in the literature; the following interesting result exhibits a first approach in this direction:

Theorem 5. For X ∈ {1-cf_1, 1-scf_1} and every k ≥ 2 we have

PT^{[k]}(X) = PT_ac^{[k]}(X) = Z^{Y,[k]}(X) = Z^{min,∩[k]}(X) = PT^{[2]}(X)

for all Z ∈ {M, M_ac, P, P_ut, P_ac} and Y ∈ {min, max}. Moreover, PT^{[2]}(X) represents the family of string languages L(lin).

Proof. The main idea is based on the fact that all information necessary for checking the applicability of a one-dimensional array production can be memorized in the two non-terminal symbols occurring at the leftmost and the rightmost position of the intermediate arrays of a derivation in a regulated one-dimensional array grammar working with the restriction to the finite index 2; the same even holds true for k ≥ 2, because the positions between the two ends of the array generated so far can only be changed by terminal rules of the form A → a or by unit rules of the form A → B with A, B ∈ V_N, a ∈ V_T. Hence, the terminal result at these intermediate positions can be guessed in advance, so we only need the two non-terminal symbols at the ends of the array, where all the necessary information can be stored, because we have to memorize at most k non-terminal symbols. The details of the construction are rather obvious and left to the reader.

Given a linear string grammar G, an array grammar G' with prescribed teams using at most two non-terminal symbols can be constructed in such a way that for each string in L(G) a corresponding 1-connected array in L(G') is derived in the reversed way. Moreover, for an array grammar G' with prescribed teams using at most two non-terminal symbols, a linear string grammar G can be constructed in such a way that each array derived in G' represents a string from L(G). The two constructions showing that PT^{[2]}(1-scf_1) represents the family of string languages L(lin) are mainly based on the ideas elaborated in [16], where bidirectional sticker systems with bounded delay were shown to characterize L(lin). The equality PT^{[2]}(1-scf_1) = PT^{[2]}(1-cf_1) can be shown by using similar ideas as in [12], where L(1-scf_1) = L(1-cf_1) was shown. Hence, for further details the interested reader is referred to [16] and [12]. □
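The underlying intuition, namely that a linear derivation only ever extends the two ends of the generated one-dimensional array, can be illustrated by the following hedged Python sketch, which generates the linear language {a^n b c^n | n ≥ 0} by letting two "heads" at the ends emit terminal symbols (names and the concrete language are ours, chosen only for illustration):

```python
import random

def derive(max_steps=10):
    """Simulate a finite-index-2 derivation: the left head plays the rule
    A -> aA, the right head plays C -> Cc, and both heads finally vanish,
    leaving the middle terminal b guessed in advance."""
    left, right = [], []
    for _ in range(random.randint(0, max_steps)):
        left.append("a")       # left head emits an 'a' and moves outward
        right.append("c")      # right head emits a 'c' and moves outward
    return "".join(left) + "b" + "".join(reversed(right))
```

Only the two end positions ever carry a non-terminal, which matches the bound of two non-terminal symbols used in the proof above.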

4.2 Hierarchies in the more-dimensional case

For n ≥ 2, in some contrast to Theorem 4, we have the following obvious result:

Lemma 7. For all n ≥ 2, Z ∈ {M, P}, and Y ∈ {min, max}, we have

PT^{[1]}(n-scf_1) = Z^{Y,[1]}(n-scf_1) = Z^{min,∩[1]}(n-scf_1) = L(n-reg) ⊊ PT_ac^{[1]}(n-scf_1) = Z_ac^{Y,[1]}(n-scf_1) = Z_ac^{min,∩[1]}(n-scf_1).

Proof. Due to the results already stated in Theorem 3, it only remains to show that L(n-reg) = PT^{[1]}(n-scf_1). Any array production in a strictly context-free array grammar with prescribed teams of array productions of norm 1 and with the restriction to index 1 can contain at most one non-terminal symbol on its right-hand side, i.e., the only non-regular array productions in such an array grammar are of the form A → B, where A and B are non-terminal symbols. Yet such unit array productions do not add generative power; hence PT^{[1]}(n-scf_1) ⊆ L(n-reg). The reverse inclusion is obvious, too. □

The following example allows us to establish a non-collapsing hierarchy in the n-dimensional case with n ≥ 2:

Example 2. For every k ≥ 2, the set C_k of all "combs" with k "cogs" of arbitrarily long, but equal lengths (cf. [4]) lies in PT^{[k]}(2-scf_1). C_k is generated by the following two-dimensional array grammar with prescribed teams working with the restriction to k non-terminal symbols:

G = (2, {S_i, A_i | 1 ≤ i ≤ k}, {a}, #, (P, R, ∅), {((0, 0), S_1)}).

For 1 ≤ i < k let

s_i :   #          A_i
       S_i #  →  a S_{i+1}      (start cog i and move the head to the right),

r_i :  S_{i+1} # → a S_{i+1}    (extend the baseline to the right),

u_i :  A_i → A_i                (unit production keeping cog i active),

for 1 ≤ i ≤ k let

g_i :   #        A_i
       A_i  →   a               (grow cog i upwards),

t_i :  A_i → a                  (terminate cog i),

and let

s_k :   #        A_k
       S_k  →   a               (start the last cog at the right end of the baseline).

Then P = {s_i, r_i, u_i | 1 ≤ i < k} ∪ {g_i, t_i | 1 ≤ i ≤ k} ∪ {s_k} and

R = { ⟨s_1⟩, ⟨r_1, u_1⟩, ...,
      ⟨s_i, u_1, ..., u_{i-1}⟩, ⟨r_i, u_1, ..., u_i⟩, ...,
      ⟨s_k, u_1, ..., u_{k-1}⟩,
      ⟨g_1, ..., g_k⟩,
      ⟨t_1, ..., t_k⟩ }.

The reader can easily verify that every array derived in G can contain at most k variables, i.e., S_i, A_1, ..., A_{i-1}, and finally A_1, ..., A_k, and that

L(G) = { {((i - 1, 0), a) | 1 ≤ i ≤ b} ∪ {((i_j - 1, m), a) | 1 ≤ m ≤ h, 1 ≤ j ≤ k} | b ≥ k, h ≥ 1, 1 = i_1 < i_2 < ... < i_k = b },

where b is the length of the "comb" and h is the length of the "cogs".

The following example shows the derivation of a "comb" with three "cogs":

S_1
⇒_G
A1
a  S2
⇒_G
A1
a  a  S2
⇒_G
A1
a  a  a  S2
⇒_G
A1       A2
a  a  a  a  S3
⇒_G
A1       A2
a  a  a  a  a  S3
⇒_G
A1       A2    A3
a  a  a  a  a  a
⇒_G
A1       A2    A3
a        a     a
a  a  a  a  a  a
⇒_G
a        a     a
a        a     a
a  a  a  a  a  a
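As a hedged illustration (the coordinate convention and the checking procedure are ours, not the paper's), the following Python function decides membership in C_k directly from the geometric description of the combs: a contiguous baseline of length b ≥ k and k cogs of equal height h ≥ 1 whose outermost members sit at the two ends of the baseline.

```python
def is_comb(cells, k):
    """cells: set of (x, y) positions carrying the terminal symbol 'a'."""
    base = sorted(x for (x, y) in cells if y == 0)
    if not base or base != list(range(base[0], base[0] + len(base))):
        return False                        # baseline must be contiguous
    cog_cols = sorted({x for (x, y) in cells if y > 0})
    if len(cog_cols) != k or len(base) < k:
        return False
    if cog_cols[0] != base[0] or cog_cols[-1] != base[-1]:
        return False                        # outermost cogs at the ends
    heights = {c: max(y for (x, y) in cells if x == c) for c in cog_cols}
    h = heights[cog_cols[0]]
    if h < 1 or any(heights[c] != h for c in cog_cols):
        return False                        # all cogs of equal height
    expected = {(x, 0) for x in base} | {(c, y) for c in cog_cols
                                        for y in range(1, h + 1)}
    return cells == expected                # no stray pixels
```

The final comb of the derivation above, with k = 3, b = 6, cogs at columns 0, 3, 5, and h = 2, passes this test.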

Theorem 6. For every k ≥ 1 and n ≥ 2 we have

1. PT^{[k]}(n-scf_1) ⊊ PT^{[k+1]}(n-scf_1) and
2. PT_ac^{[k]}(n-scf_1) ⊊ PT_ac^{[k+1]}(n-scf_1).

Proof. The proof is established by the array language C_{k+1} constructed in the preceding example, where we showed C_{k+1} ∈ PT^{[k+1]}(n-scf_1). A simple pumping argument shows that C_{k+1} ∉ PT_ac^{[k]}(n-scf_1): If we can only use k variables in a derivation in an array grammar G with prescribed teams generating C_{k+1}, then before even starting the generation of the last "cog" of the "comb", one of the other "cogs" must have finished its generation. As the equal lengths of the "cogs" can become arbitrarily long, there must be a loop of repeated situations of the non-terminal symbols occurring on the other "cogs" which can be pumped, whereas the length of the already finished "cog" cannot be changed anymore. □

5 Conclusions

This paper can be seen as a starting point for further investigations in this field of n-dimensional array grammars equipped with control mechanisms and working with the finite index restriction. Especially the following questions are still open:

- What do the hierarchies look like for the families of array languages PT^{[k]}(X), PT_ac^{[k]}(X), as well as for the other families Z^{Y,[k]}(X) for X ∈ {n-cf, n-#-cf}, Y ∈ {min, max}, Z ∈ {P, P_ac, M, M_ac} considered in this paper? (We just mention that the sets of "combs" C_k constructed in Example 2 do not separate these families of array languages, because C_k ∈ PT^{[2]}(2-cf) for all k ≥ 2.)
- What is the relation of the families Z^{Y,∩[k]}(X) with respect to the other families of array languages considered in this paper?
- In the one-dimensional case, how can the families of one-dimensional array languages like PT^{[k]}(1-cf), PT^{[k]}(1-#-cf), etc., be characterized? The one-dimensional case often yields surprising results, cf. [12, 14].

Acknowledgements. The work of the first author was supported by Deutsche Forschungsgemeinschaft grant DFG La 618/3-1/2.

References

1. E. Csuhaj-Varju, J. Dassow, J. Kelemen, and Gh. Paun, Grammar Systems (Gordon and Breach, London, 1994).
2. C. R. Cook and P. S.-P. Wang, A Chomsky hierarchy of isotonic array grammars and languages, Computer Graphics and Image Processing 8 (1978), pp. 144-152.
3. J. Dassow and Gh. Paun, Regulated Rewriting in Formal Language Theory (Springer, Berlin, 1989).
4. J. Dassow, R. Freund, and Gh. Paun, Cooperating array grammar systems, International Journal of Pattern Recognition and Artificial Intelligence 9 (6) (1995), pp. 1029-1053.
5. J. Dassow and R. Freund, A general framework for regulated rewriting, manuscript.
6. H. Fernau and R. Freund, Bounded parallelism in array grammars used for character recognition. In: P. Perner, P. S.-P. Wang, and A. Rosenfeld (eds.), Proceedings SSPR'96, LNCS 1121 (Springer, Berlin, 1996), pp. 40-49.
7. H. Fernau and R. Freund, Accepting array grammars with control mechanisms. In: Gh. Paun and A. Salomaa (eds.), New Trends in Formal Languages, LNCS 1218 (Springer, Berlin, 1997), pp. 95-118.
8. H. Fernau, R. Freund, and M. Holzer, The generative power of d-dimensional #-context-free array grammars, manuscript.
9. H. Fernau and M. Holzer, Regulated finite index language families collapse, Technical Report WSI-96-16, Universitat Tubingen (Germany), Wilhelm-Schickard-Institut fur Informatik, 1996.
10. H. Fernau and M. Holzer, Conditional context-free languages of finite index. In: Gh. Paun and A. Salomaa (eds.), New Trends in Formal Languages, LNCS 1218 (Springer, Berlin, 1997), pp. 10-26.
11. R. Freund, Aspects of n-dimensional Lindenmayer systems. In: G. Rozenberg and A. Salomaa (eds.), Developments in Language Theory (World Scientific Publ., Singapore, 1994), pp. 250-261.
12. R. Freund, One-dimensional #-sensing context-free array grammars, Technical Report, Universitat Magdeburg, 1994.
13. R. Freund, Array grammar systems with prescribed teams of array productions. In: J. Dassow, Gh. Paun, and A. Salomaa (eds.), Developments in Language Theory II (Gordon and Breach, London, 1996), pp. 220-229.
14. R. Freund and Gh. Paun, One-dimensional matrix array grammars, J. Inform. Process. Cybernet. EIK 29 (6) (1993), pp. 1-18.
15. R. Freund, Control mechanisms on #-context-free array grammars. In: Gh. Paun (ed.), Mathematical Aspects of Natural and Formal Languages (World Scientific Publ., Singapore, 1994), pp. 97-136.
16. R. Freund, Gh. Paun, G. Rozenberg, and A. Salomaa, Bidirectional sticker systems, Proceedings PSB'98, to appear.
17. D. Giammarresi and A. Restivo, Two-dimensional finite state recognizability, Fundamenta Informaticae 25 (3-4) (1996), pp. 399-422.
18. S. Ginsburg and E. H. Spanier, Derivation-bounded languages, Journal of Computer and System Sciences 2 (1968), pp. 228-250.
19. K. Inoue and I. Takanami, A survey of two-dimensional automata theory. In: J. Dassow and J. Kelemen (eds.), Machines, Languages, and Complexity, IMYCS'88, LNCS 381 (Springer, Berlin, 1988), pp. 72-91.
20. L. Kari, A. Mateescu, Gh. Paun, and A. Salomaa, Teams in cooperating distributed grammar systems, Journal of Experimental and Theoretical AI 7 (1995), pp. 347-359.
21. C. Kim and I. H. Sudborough, The membership and equivalence problems for picture languages, Theoretical Computer Science 52 (1987), pp. 177-191.
22. K. Morita, Y. Yamamoto, and K. Sugata, The complexity of some decision problems about two-dimensional array grammars, Information Sciences 30 (1983), pp. 241-262.
23. V. Rajlich, Absolutely parallel grammars and two-way deterministic finite state transducers, Journal of Computer and System Sciences 6 (1972), pp. 324-342.
24. Gh. Paun and G. Rozenberg, Prescribed teams of grammars, Acta Informatica 31 (6) (1994), pp. 525-537.
25. A. Rosenfeld, Picture Languages (Academic Press, Reading, MA, 1979).
26. G. Rozenberg and D. Vermeir, On the effect of the finite index restriction on several families of grammars, Information and Control 39 (1978), pp. 284-302.
27. G. Rozenberg and D. Vermeir, On the effect of the finite index restriction on several families of grammars; Part 2: context dependent systems and grammars, Foundations of Control Engineering 3 (1978), pp. 126-142.
28. A. Salomaa, Formal Languages (Academic Press, Reading, MA, 1973).
29. I. H. Sudborough and E. Welzl, Complexity and decidability for chain code picture languages, Theoretical Computer Science 36 (1985), pp. 173-202.
30. P. S.-P. Wang, Some new results on isotonic array grammars, Information Processing Letters 10 (1980), pp. 129-131.
31. Y. Yamamoto, K. Morita, and K. Sugata, Context-sensitivity of two-dimensional regular array grammars. In: P. S.-P. Wang (ed.), Array Grammars, Patterns and Recognizers, WSP Series in Computer Science, Vol. 18 (World Scientific Publ., Singapore, 1989), pp. 17-41.

REGULATED ARRAY GRAMMARS OF FINITE INDEX
Part II: Syntactic pattern recognition

Henning FERNAU Markus HOLZER

Wilhelm-Schickard-Institut fur Informatik, Universitat Tubingen, Sand 13, D-72076 Tubingen, Germany

[email protected] [email protected]

Rudolf FREUND Institut fur Computersprachen, Technische Universitat Wien, Resselg. 3, A-1040 Wien, Austria [email protected]

Abstract. We introduce special k-head finite array automata, which characterize the array languages generated by specific variants of the regulated n-dimensional context-free array grammars of finite index we introduced in the first part of this paper. As a practical application, we show how these analyzing devices in the two-dimensional case can be used in the field of syntactic character recognition.

1 Bridging the gap between theory and practice

The first part of this paper laid some theoretical foundations concerning regulated array grammars of finite index (with various regulation mechanisms) and cooperating/distributed array grammar systems with prescribed teams. (We will not repeat the definitions of these mechanisms in this second part.) We concentrated on questions concerning the equivalence of different definitions regarding their descriptive capacity. In this second part of our paper, we will focus on practical, especially algorithmic, aspects of the problem of recognizing handwritten characters by means of certain array grammars, roughly describing a recognition system which is currently under development; prototype versions running on personal computers can be obtained by contacting the second author.

One of the advantages of such syntactic methods for character recognition is the fact that they describe certain typical features of the characters instead of comparing characters bit by bit with a reference character. This may lead to recognition algorithms which are less font-sensitive, see [8, page 274]. On the other hand, the authors have recently considered bounded parallelism within array grammars [I-7] on practical grounds. This feature can nicely be formulated in terms of cooperating/distributed array grammar systems with prescribed teams as introduced in part I of our paper. In practical applications of these theoretical models, as for character recognition, the number of active areas processed in parallel is quite limited. Hence, our theoretical models of regulated array grammars of finite index fit quite well for being applied in the field of syntactic character recognition.

Part II of our current paper is organized as follows: In the next section we address the important stages of the process of syntactic character recognition. In the third section we discuss some problems arising when going from theory to reality, e.g., when implementing a tool based on the theoretical model of n-dimensional array grammars with prescribed teams of finite index. In the fourth section we elaborate how array grammars with prescribed teams can be interpreted as analyzing mechanisms, which then leads us to the definition of n-dimensional k-head finite array automata. The implementation of a prototype for the syntactic analysis of hand-written (upper-case) characters based on two-dimensional k-head finite array automata is discussed in the fifth section. A short summary concludes the paper. The bibliography is just a supplement to the one given at the end of part I of the paper.

2 Aspects of syntactic character recognition

In this section we give a short overview of the stages in syntactic character recognition and of the data used in our practical experiments.

2.1 Data acquisition

Hand-written characters were acquired from hundreds of persons on specific forms and then scanned in order to obtain a digital pixel image. A reference to the database obtained in that way can be found in [3].

2.2 Preprocessing

Before we can use the scanned characters for a syntactic analysis, some preprocessing steps are necessary in order to obtain suitable data. We should like to mention that the two crucial steps of the preprocessing procedure, i.e., noise elimination and thinning to unitary skeletons, can be carried out within the theoretical framework of parallel array grammars, as was already exhibited in some detail in [I-11].

Normalisation and noise elimination. The scanned characters first are normalized to fill out a 320 × 400 grid in order to get comparable patterns. Then noisy pixels are eliminated. After noise elimination, the resulting arrays on the 320 × 400 grid are mapped on a 20 × 25 grid.

Thinning. The arrays on the 20 × 25 grid now are subjected to a thinning algorithm which finally yields unitary skeletons of the digitized characters. In the literature a lot of such thinning algorithms can be found, which reduce the thickness of the lines constituting a character to one, e.g., see [15].
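As an illustration of such a thinning step, the following sketch implements the classical two-subiteration algorithm of Zhang and Suen, a well-known representative of the family of algorithms surveyed in [15]; the function name and the 0/1 grid encoding are our own and not part of the tool:

```python
def zhang_suen_thin(grid):
    """Iteratively peel boundary pixels until a one-pixel-wide skeleton remains.

    `grid` is a list of rows of 0/1 values; border cells are assumed blank.
    """
    g = [row[:] for row in grid]
    h, w = len(g), len(g[0])

    def neighbours(y, x):
        # P2..P9, clockwise, starting from the pixel directly above
        return [g[y-1][x], g[y-1][x+1], g[y][x+1], g[y+1][x+1],
                g[y+1][x], g[y+1][x-1], g[y][x-1], g[y-1][x-1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):          # the two subiterations of the algorithm
            to_clear = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if not g[y][x]:
                        continue
                    p = neighbours(y, x)
                    b = sum(p)       # number of set neighbours
                    # number of 0 -> 1 transitions in circular order
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    if step == 0:
                        cond = p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0
                    else:
                        cond = p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((y, x))
            for y, x in to_clear:    # remove marked pixels in parallel
                g[y][x] = 0
                changed = True
    return g
```

The marked pixels of one subiteration are removed simultaneously, which matches the parallel-array-grammar view of thinning mentioned above.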

2.3 Syntactic analysis

The unitary skeleton of a character obtained after the thinning procedure (the last step of preprocessing) now is the input for an off-line tool analyzing this pattern according to a specific syntactic model. In our approach, regulated array grammars of finite index as discussed in part I of our paper or k-head finite array automata as introduced later in this paper build up the syntactic model.

Fig. 1. An ideal character "H".

For example, the cluster of ideal letters "H" of arbitrary size (an instance is shown in Figure 1) can be described by the following two-dimensional array grammar with prescribed teams of index 4, where [X / Y] denotes the vertical stacking of X above Y:

G = (n, {S, L, R, DL, UL, DR, UR}, {a}, #, (P, R, ∅), {((0, 0), S)}),

P = { S # → L R,   # L → L a,   R # → a R,
      [# / L / #] → [UL / a / DL],   [# / R / #] → [UR / a / DR],
      [# / UL] → [UL / a],   [# / UR] → [UR / a],
      [DL / #] → [a / DL],   [DR / #] → [a / DR],
      UL → a,   UR → a,   DL → a,   DR → a },

R = { ⟨S # → L R⟩,
      ⟨# L → L a, R # → a R⟩,
      ⟨[# / L / #] → [UL / a / DL], [# / R / #] → [UR / a / DR]⟩,
      ⟨[# / UL] → [UL / a], [# / UR] → [UR / a], [DL / #] → [a / DL], [DR / #] → [a / DR]⟩,
      ⟨UL → a, UR → a, DL → a, DR → a⟩ }.

Fig. 2. On the border between "H" and "A".

A typical derivation in G is the following one: the first team creates the two ends L and R of the horizontal bar; the second team repeatedly extends this bar to the left and to the right; the third team splits both ends into an upward and a downward moving non-terminal; the fourth team moves UL, UR upwards and DL, DR downwards in parallel, thereby growing the two vertical bars; and the final team terminates all four of them:

S ⇒G L R ⇒G L a a R ⇒G [UL / a / DL] a a [UR / a / DR] ⇒G ... ⇒G the ideal "H" of Figure 1.
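The phases of the derivation in G can be mimicked by a few lines of code computing the pixel set of an ideal "H"; the coordinate convention (origin at the left end of the horizontal bar) and the function name are our own illustrative assumptions:

```python
def generate_H(width, height):
    """Compute the pixel set of an ideal 'H': a horizontal bar of `width`
    pixels with vertical bars of `height` pixels above and below each end."""
    # phase 1/2: S# -> LR, then repeated (#L -> La, R# -> aR): the horizontal bar
    cells = {(x, 0) for x in range(width)}
    # phase 3/4: split both ends and move the four heads outwards in parallel
    for y in range(1, height + 1):
        for x in (0, width - 1):
            cells.add((x, y))    # grown by UL resp. UR
            cells.add((x, -y))   # grown by DL resp. DR
    # phase 5: UL, UR, DL, DR -> a (all grown positions are already terminal here)
    return cells
```

Since the four heads move in parallel, both vertical bars necessarily obtain equal lengths, exactly as enforced by the fourth team of G.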

The main problems occurring in realistic patterns are the deviations of the lines forming a character and possible gaps in these lines (see Figure 2). For example, in order to cover deviations of the horizontal line, together with the array production R # → a R we also have to consider the array productions [# / R] → [R / a] and [R / #] → [a / R] (where [X / Y] denotes X stacked above Y), which move R one pixel up or down, respectively.

One of the most important features of an efficient tool is to use suitable error measures which allow us to obtain reasonable clusters for the different letters in the alphabet. In fact, sometimes the border line between two different letters is quite "fluent". For example, the array in Figure 2 represented by the filled circles will still be recognized as an "H" by a lot of people, whereas when adding the two pixels indicated by the non-filled circles nearly all people will agree in recognizing this array as an "A", because the upper endings of the vertical lines now are close enough to each other; yet the question remains how to determine exact values for the distance of these endings as well as for the deviations of the vertical lines in order to separate the cluster of arrays representing the symbol "A" from the cluster of arrays representing the symbol "H".

The main features of a given character that may contribute to an error measure are the deviations from the lines building up an ideal letter and the remaining pixels not covered by the syntactic analysis. Yet also more elaborate features such as the distances of end points of lines (compare the discussion above concerning the letters "A" and "H") may increase the error and thus help to distinguish between two clusters representing different letters.
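A toy version of such an error measure might combine the three features just named; the weights, the function name, and the endpoint-distance term are hypothetical simplifications, not the measure used in the tool:

```python
def recognition_error(pattern, covered, endpoint_pairs,
                      w_uncovered=1.0, w_deviation=0.5, w_gap=2.0):
    """Toy error measure for a parsed character.

    `pattern` and `covered` are sets of pixel coordinates (the input skeleton
    and the pixels consumed by the syntactic analysis); `endpoint_pairs` lists
    pairs of line endpoints whose distance should contribute to the error.
    All weights are hypothetical tuning parameters.
    """
    uncovered = len(pattern - covered)   # skeleton pixels left unexplained
    spurious = len(covered - pattern)    # deviations from the ideal lines
    # Manhattan distance between endpoint pairs (e.g. the upper bar ends
    # distinguishing "H" from "A")
    gap = sum(abs(p[0] - q[0]) + abs(p[1] - q[1]) for p, q in endpoint_pairs)
    return w_uncovered * uncovered + w_deviation * spurious + w_gap * gap
```

A cluster for a letter is then the set of patterns whose error against that letter's ideal model stays below a threshold, which is exactly where the "fluent" border between "H" and "A" has to be fixed numerically.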

3 Theory and reality

As shown in [I-8], the embedding of any one-dimensional recursively enumerable array language into the two-dimensional space can be generated by a two-dimensional #-context-free array grammar; the proof even shows that already for strictly context-free two-dimensional array grammars the fixed and the general membership problems become undecidable. Even for regular array languages and grammars, these problems are NP-complete, as stated in [I-22].

Yet these "limit features" need not have such a deadly importance on such a restricted domain as the 20 × 25 grid we use in our implementations of a syntactic character recognition model based on regulated array grammars. In reality, a more powerful theoretical model may yield a much faster and therefore much more efficient tool for character recognition as long as the increase in the theoretical complexity reduces the parsing complexity, especially by reducing the number of non-deterministic choices of rules during the parsing procedure. Hence, already in [4] the theoretical mechanism of graph controlled (analyzing) array grammars was chosen as the theoretical basis of the character recognition tool proposed there. In fact, controlling the dynamic program flow by graphs is very useful; therefore, a tool based on graph controlled array grammars (as, e.g., described in [4]) is much more efficient than a tool based on array grammars without regulation (as proposed in [16]).

Moreover, as characters can be seen as being composed of very few lines only, a small number of active areas analyzing these lines, which often even have some interdependency relations like equal lengths, is another promising approach we already proposed in [I-6]. Hence, characters of even arbitrary size can be characterized by languages in PT⌈k⌉(2-cf) for rather small k; an example of an array grammar with prescribed teams of index four representing the cluster of symbols "H" of arbitrary size was already exhibited in the preceding section. From part I of our paper we know that PT⌈1⌉(2-cf) = L(2-reg), a family of array languages which already has a hard enough membership problem as stated above; but in fact analyzing array grammars with prescribed teams of context-free array productions of finite index as proposed in [I-6] even allow deterministic parsing of specific characters, as we will elaborate in the next section.

4 k-head finite array automata

In the string case, multihead finite automata belong to the oldest subjects of study, see [11]. As regards multi-dimensional automata, we refer the reader to [1, 6] and [I-19]. Our aim is to define array automata in such a way that they characterize the families of array languages defined by the regulated (strictly) context-free array grammars of finite index introduced in part I of the paper.

On an intuitive level, context-freeness in the string case means that a characterizing automaton model has to be essentially a one-way model. This restricts the movements of the input heads such that each head can read the same information only once, since it scans the input word from left to right. Array grammars do not process only symbol information (as in the string case) but also position information (and hence direction information), so that we cannot hope for a purely one-way automaton analogue. Instead, in the (strictly) context-free case (but not in the #-context-free case!) we have the restriction that a position which is looked up once in the derivation process will eventually carry some terminal symbol. From this, we can deduce that a reasonable automaton model should obey the restriction that the same information is read only once. This formulation resembles the characterization of the one-way property in the string case very much, but there is an important difference: while in the string case k heads may scan the same symbol k times (each of them can read the same information once), in the array case for our purposes we require that the whole automaton may scan a certain position only once. (Let us mention that two-dimensional automata which cannot visit one point twice are also called "worms" in [2].) In passing, this excludes a head sensing ability: it is not possible for two heads to assume the same position at the same time.
Moreover, in our model we include the possibility that an automaton head may split in a certain, local way, and that it may be totally removed if it is not necessary any more. Observe that this "recycling feature" of reading heads fits very well with the idea that the k heads are essentially k pointers within the array (especially from the point of view of an implementation of the formalism), but we do not allow arbitrary pointer calculations. Instead, we stick to local head movements.

Alternatively, there is the possibility to define really parallel Turing machines or finite automata working on multi-dimensional input tapes. Here, we refer the reader to [5, 18], but we do not use this approach here. Instead of elaborating new sophisticated definitions, we use our knowledge from the first part of our paper in order to define a suitable description of k-head finite array automata. In the model of array grammars with prescribed teams working with the finite index restriction we have already incorporated the idea of a limited number of active positions in an underlying array. In [I-6], the idea of analyzing array grammars with prescribed teams has already been discussed in such a way that a derivation step with a selected team is only possible if the following conditions hold:

1. By applying the team, all the non-terminal symbols appearing in the current array are derived in parallel.
2. The shape of the current array after this derivation step is part of the shape of the originally given array and, moreover, at each position where we already find a terminal symbol in this array, this terminal symbol must coincide with the corresponding symbol at this position in the originally given array.

Condition 1 is just the finite index restriction introduced in part I of the paper. Condition 2 in this form is only reasonable for the case of (strictly) context-free array productions; if we also allow #-context-free array productions we have to use the following weaker condition:

2'. At each position where we already find a terminal symbol in the underlying array, this terminal symbol must coincide with the corresponding symbol at this position in the originally given array.

This weaker condition means that the non-terminal symbols of the current array may also occupy positions that are only occupied by the blank symbol in the originally given array.
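Condition 2 can be sketched as a simple predicate on dictionary-encoded arrays; the encoding (positions mapped to symbols, lowercase strings for terminals and uppercase for non-terminals) is our own illustration, not the formalism of part I:

```python
def step_allowed(current, original):
    """Check condition 2 for analyzing teams of (strictly) context-free
    array productions: the shape of the current array must be contained in
    the shape of the original one, and terminal symbols must agree.

    `current` and `original` map positions (tuples) to one-character symbols;
    lowercase symbols are terminals, uppercase ones non-terminals.
    """
    for v, sym in current.items():
        if v not in original:
            return False          # shape of the current array not contained
        if sym.islower() and sym != original[v]:
            return False          # terminal symbol does not coincide
    return True
```

For condition 2' one would simply drop the shape-containment test for positions carrying non-terminal symbols, since these may then sit on blank positions of the original array.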
Taking over this idea of an analyzing array grammar as described above, we can give a formal definition of a k-head finite array automaton in the following way: An n-dimensional k-head finite array automaton of type X, X ∈ {n-#-cf, n-cf, n-scf, n-cf1, n-scf1 | n ≥ 1}, is a construct

M = (n, VN, VT, #, (P, R, F), {(v0, S)})

which is an n-dimensional array grammar with prescribed teams G of finite index k such that

1. each team in R contains at most k array productions of type X;
2. G is in the normal form established in Lemma 5 of part I;
3. any array production p ∈ P is of the form p = (W, {(Ω_n, A)} ∪ {(v, #) | v ∈ W \ {Ω_n}}, {(v, X_v) | v ∈ W}) with {v | X_v ∈ VT} ⊆ {v̄} for some v̄ ∈ W, i.e., at most one position obtains a terminal symbol (Ω_n denotes the n-dimensional zero vector).

The work of the automaton M on a given array from V_T^n is defined as follows: M works on n-dimensional arrays over the set of pairs {(a, a), (a, X) | a ∈ VT, X ∈ VN ∪ {#}}, where the first component contains the given array and the second component contains the array generated so far by G. The current state of the automaton M is represented by the set of non-terminal symbols Y occurring in the pairs (a, Y) of the current array. The derivation relation for M, ⇒M, on the second components corresponds to the derivation relation ⇒G, with the additional restriction that every terminal symbol generated in the second component must be equal to the terminal symbol in the first component. A parsing derivation of M is called accepting if finally all non-blank positions are terminal, i.e., occupied by symbols of the form (a, a), a ∈ VT. The array language accepted by M therefore is defined by

L(M) = { A | A ∈ V_T^n, {(v, (A(v), #)) | v ∈ shape(A)} ⇒*M {(v, (A(v), A(v))) | v ∈ shape(A)} }.

Observe that due to our definitions, a head of the automaton, which is represented by one of the at most k (different) variables appearing in the current array, reads out the symbol in the first component just when leaving this position by putting exactly the same terminal symbol into the second component. As the terminal symbol at a specific position is already uniquely determined by the first component, we could also put only a specific marker symbol into the second component, just to mark these positions as not reachable by any head of the automaton any more.
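To make the parsing behaviour concrete, the following sketch simulates, in the spirit of the definition above, how four heads deterministically parse an ideal "H"; the coordinates, the start at the origin, and the helper names are our illustrative assumptions, not the formal construct:

```python
def parse_H(cells):
    """Simulate a 4-head parse of an ideal 'H': the horizontal bar is traced
    first, then both ends split into an up- and a down-moving head. Every
    position is read exactly once, mirroring the single-visit restriction."""
    unread = set(cells)

    def read(p):
        # a head may enter a position only if it has not been visited yet
        if p not in unread:
            return False
        unread.remove(p)
        return True

    if not (read((0, 0)) and read((1, 0))):    # S # -> L R at the origin
        return False
    L, R = (0, 0), (1, 0)
    while (L[0] - 1, L[1]) in unread:          # # L -> L a
        L = (L[0] - 1, L[1]); read(L)
    while (R[0] + 1, R[1]) in unread:          # R # -> a R
        R = (R[0] + 1, R[1]); read(R)
    # both ends split into UL/DL resp. UR/DR (one team step, in parallel)
    heads = {'UL': (L[0], L[1] + 1), 'DL': (L[0], L[1] - 1),
             'UR': (R[0], R[1] + 1), 'DR': (R[0], R[1] - 1)}
    if not all(read(p) for p in heads.values()):
        return False
    dy = {'UL': 1, 'UR': 1, 'DL': -1, 'DR': -1}
    while True:
        nxt = {n: (p[0], p[1] + dy[n]) for n, p in heads.items()}
        if not all(q in unread for q in nxt.values()):
            break                              # finite index: all four heads
        for q in nxt.values():                 # must move together
            read(q)
        heads = nxt
    # accept iff the whole pattern was consumed and no vertical bar goes on
    return not unread and all((p[0], p[1] + dy[n]) not in cells
                              for n, p in heads.items())
```

Note that the four vertical heads may only advance together, so unequal vertical bars, gaps, or leftover pixels all lead to rejection, exactly as in the team model.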
As is quite obvious from the definitions given above, the families of array languages accepted by k-head finite array automata of type X with or without ac exactly coincide with the corresponding families of array languages PTac⌈k⌉(X) and PT⌈k⌉(X), respectively, where X ∈ {n-#-cf, n-cf, n-scf, n-cf1, n-scf1 | n ≥ 1} and k ≥ 1. Hence all the theoretical results obtained in part I for the generating devices considered there directly carry over to the parsing mechanism of k-head finite array automata as defined above. In the string case, similar devices were considered in [9].

One might argue that in the construction of an n-dimensional k-head finite array automaton given above the basic concept of states, which usually constitutes an important feature of an automaton model, only appears in a weak variant, i.e., as the set of all subsets V of VN with card(V) ≤ k. Yet as we can derive from the theoretical results proved in part I, using a graph control structure will not increase the power of the model; hence it seems reasonable to keep the formal definitions on the chosen level and to discuss possible extensions in an informal way only. For example, as exhibited in [4], using a graph control structure is a powerful means for reducing the non-determinism in tools for syntactic character recognition based on array grammar models. Therefore, it is reasonable to add this control mechanism to the model of k-head finite array automata when implementing this theoretical approach in a tool as described in [3]; yet we shall not go into formal details here.

Furthermore, observe that the heads occurring in our model can be viewed as agents (workers) which are sent to their working places in order to perform their work (a derivation step); so, the idea of multi-agent systems (which was one of the basic motivations for introducing cooperating/distributed grammar systems, cf. [I-1]) emerges quite naturally in our automaton setting.

As an illustration of our definition, we review the example of the array grammar with prescribed teams of index four described in section 2 and show the analysis of the pattern whose generation was given there:

Initially, every position of the pattern carries the pair (a, #), except for the origin, which carries (a, S). The steps of ⇒M mirror the generating derivation of section 2 on the second components:

(a, S) ⇒M (a, L)(a, R) ⇒M (a, L)(a, a)(a, a)(a, R) ⇒M ... ⇒M an array in which every position of the "H" carries the pair (a, a).

First the pairs (a, L) and (a, R) trace the horizontal bar, turning the positions they leave into (a, a); then both ends are split into the pairs (a, UL), (a, DL) and (a, UR), (a, DR), which move outwards along the two vertical bars; a final team step replaces the four remaining non-terminal pairs by (a, a).

An important feature of the parsing sequence depicted above is the determinism of the given derivation, i.e., for each array in L(M) there is exactly one parsing derivation. For arrays not in L(M), the crucial moment is the change from the horizontal line to the vertical lines. Yet in this special case of M, the possibility of a non-deterministic choice in the underlying pattern immediately implies that this pattern cannot belong to L(M). Yet for practical implementations, where we also want to recognize non-ideal patterns in a decent way, this is one of the most important problems we have to deal with.

Let us remark that according to Lemma I-7, one-head finite array automata characterize the regular array languages, cf. [I-2]. Rosenfeld in [12] compares several alternative definitions of finite-state picture languages and shows that they do not characterize the regular array languages. Moreover, it is known that non-deterministic finite one-head automata (which are allowed to read the same input more than once and can move in four directions, i.e., they are the natural two-dimensional equivalent of classical two-way automata) recognize a language class which is strictly contained in the so-called recognizable picture languages, which in turn can be seen as the generalization of algebraic characterizations of regular string languages, cf. [I-17] and especially [7, Cor. 3.2]. A characterization of strictly context-free array languages via an automaton model with a rather tedious definition was given in [10]. The interrelation with so-called pushdown automata on arrays [14] seems to be open.

We do not want to conceal one theoretical drawback of our automaton model somewhat hidden in the acceptance condition: an array pattern must be parsed completely in order to get accepted. This is a condition which comes from outside the model. Nakamura managed to include such a test within his automaton model for context-free array languages in [10].
From a practical point of view, this drawback is not so important, since unvisited points may be found quite efficiently in a post-processing phase. Moreover, superfluous pixels not covered by the syntactic analysis may occur anyway when dealing with "real" characters. Finally, for type n-#-cf such a test is not possible at all.

5 A prototype implementation

In this section we describe some interesting observations made during the prototype implementation [3] of the model of k-head finite array automata for syntactic character recognition. In fact, the tool also incorporates a graph control structure in order to reduce the non-determinism arising from non-ideal patterns due to deviations of lines and gaps within the lines. The ac mode in the graph control structure also allows us to consume the pixels along a line exhaustively, which often is of advantage because remaining pixels increase the error.

The type of the k-head finite array automata, i.e., the type of the array productions, is chosen as 2-cf. Due to possible gaps in realistic characters, we cannot restrict ourselves to norm 1, i.e., to 2-cf1; yet on the other hand we can avoid having to use rules of type 2-#-cf. From a theoretical point of view, rules of type 2-scf might be sufficient (for ideal patterns, even rules of type 2-scf1 are sufficient, e.g., see the array grammar with prescribed teams of index four for the set of arrays representing the symbol "H"). Yet as already mentioned in the previous section, situations like those at crossing points of lines in realistic patterns cause possible non-deterministic choices of how to proceed. In order to make such decisions easier (and more deterministic, i.e., in this way reducing the need for back-tracking), we allow a larger neighbourhood for looking ahead, which also includes the possibility to check some of these positions for not yet having been reached by other heads of the automaton. In order to obtain suitable criteria for look-ahead neighbourhood patterns and other features introduced for improving the efficiency of the tool, even some heuristic investigations were carried out to optimize the efficiency and the recognition rate of the tool. Observe that it would be quite easy to incorporate such look-ahead features formally in the automaton model introduced in the preceding section.
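A minimal version of such a look-ahead heuristic might score candidate directions by how far a head could continue through pixels not yet reached by any head; the function, the depth, and the tie-breaking are hypothetical simplifications of the tool's neighbourhood patterns:

```python
def choose_direction(pos, unread, directions, depth=3):
    """Score each candidate direction by the number of consecutive unread
    pixels (up to `depth`) ahead of `pos`, and pick the best-supported one;
    return None if no direction has any support (a dead end)."""
    def score(d):
        s, p = 0, pos
        for _ in range(depth):
            p = (p[0] + d[0], p[1] + d[1])
            if p not in unread:   # blocked by a blank or an already-read pixel
                break
            s += 1
        return s
    best = max(directions, key=score)
    return best if score(best) > 0 else None
```

At a crossing point this turns several formally possible continuations into a single deterministic choice, which is precisely the back-tracking reduction aimed at above.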
Obviously, the automaton model would then resemble very much the LL parsers well known from string language compilers, since basically a top-down parse through the grammar is done by the automaton. In the string case, context-free graph-controlled LL parsers (where it is required that always the left-most symbol is rewritten) are known to characterize the deterministic context-free languages [13], which alternatively might be characterized via bottom-up parsers without regulation. So, this quite practically motivated class of recognizers gives rise to the interesting theoretical question of what kind of arrays are recognized by such devices with look-ahead.

6 Conclusions

The theoretical models of regulated array grammars of finite index have turned out to constitute suitable mechanisms for syntactic pattern recognition, e.g., for the recognition of hand-written upper-case characters. Combinations of these mechanisms with other approaches such as neural networks should allow the development of an even more efficient tool with a very high recognition rate.

Finally, we should like to mention that the use of (regulated) array grammars (with finite index) is not restricted to the recognition of characters; these mechanisms may even be used to characterize three-dimensional objects (see [17]). Hence, in the field of syntactic pattern recognition a lot of applications to be considered in the theoretical framework presented in the current paper remain for future research projects.

Acknowledgements. The work of the first author was supported by Deutsche Forschungsgemeinschaft grant DFG La 618/3-2.

References

1. M. Blum and C. Hewitt, Automata on a two-dimensional tape. In: IEEE Symposium on Switching and Automata Theory (1967), pp. 155-160.
2. M. Blum and W. J. Sakoda, On the capability of finite automata in 2 and 3 dimensional space. In: Proc. 18th Ann. Symp. on Foundations of Computer Science (1977), pp. 147-161.
3. W. Dittmann, Diploma thesis, Technische Universität Wien, 1997.
4. R. Freund, Syntactic recognition of handwritten characters by programmed array grammars with attribute vectors. In: Progress in Image Analysis and Processing III (ed. S. Impedovo), Proc. Seventh International Conference on Image Analysis and Processing, Bari, Italy (World Scientific Publ., Singapore, 1993), pp. 357-364.
5. A. Hemmerling, Systeme von Turing-Automaten auf rahmbaren Pseudomustermengen. Elektronische Informationsverarbeitung und Kybernetik EIK 15 (1979), pp. 47-72.
6. O. Ibarra and R. T. Melson, Some results concerning automata on two-dimensional tapes, International Journal of Computer Mathematics, Series A 4 (1974), pp. 269-279.
7. K. Inoue and I. Takanami, A characterization of recognizable picture languages. In: A. Nakamura, M. Nivat, A. Saoudi, P. S.-P. Wang, and K. Inoue (eds.), Parallel Image Analysis ICPIA'92, LNCS 654 (Springer, Berlin, 1992), pp. 133-143.
8. S. Kahan, T. Pavlidis, and H. S. Baird, On the recognition of printed characters of any font and size, IEEE Transactions on Pattern Analysis and Machine Intelligence 9 (1987), pp. 274-288.
9. V. Mihalache, Accepting cooperating distributed grammar systems with terminal derivation, Bulletin of the EATCS 61 (1997), pp. 80-84.
10. A. Nakamura, Parallel -erasing array acceptors, Computer Graphics and Image Processing 14 (1980), pp. 80-86.
11. A. L. Rosenberg, On multihead finite automata, IBM J. Res. Develop. 10 (1966), pp. 388-394.
12. A. Rosenfeld, Some notes on finite-state picture languages, Information and Control 31 (1976), pp. 177-184.
13. A. Rumann, Dynamic LL(k) parsing, Acta Informatica 34 (1997), pp. 267-289.
14. A. N. Shah, Pushdown automata on arrays, Information Sciences 25 (1981), pp. 175-193.
15. J. H. Sossa, An improved parallel algorithm for thinning digital patterns, Pattern Recognition Letters 10 (1989), pp. 77-80.
16. P. S.-P. Wang, An application of array grammars to clustering analysis for syntactic patterns, Pattern Recognition 17, 4 (1984), pp. 441-451.
17. P. S.-P. Wang, Three-dimensional array grammars and object recognition, Proceedings CAIP'91, Research in Informatics 5 (Akademie-Verlag, 1991), pp. 276-280.
18. Th. Worsch, On parallel Turing machines with multi-head control units, Technical report 11/96, Universität Karlsruhe, Fakultät für Informatik, 1996.