To appear in Journal of Logic, Language and Information, Kluwer, Winter 1998
Group Theory and Computational Linguistics Marc Dymetman Xerox Research Centre Europe 6 chemin de Maupertuis 38240 Meylan, France
[email protected]
August 1998 Abstract
There is currently much interest in bringing together the tradition of categorial grammar, and especially the Lambek calculus, with the recent paradigm of linear logic to which it has strong ties. One active research area is designing non-commutative versions of linear logic (Abrusci 1995, Retore 1993) which can be sensitive to word order while retaining the hypothetical reasoning capabilities of standard (commutative) linear logic (Dalrymple et al. 1994). Some connections between the Lambek calculus and computations in groups have long been known (Van Benthem 1991) but no serious attempt has been made to base a theory of linguistic processing solely on group structure. This paper presents such a model, and demonstrates the connection between linguistic processing and the classical algebraic notions of non-commutative free group, conjugacy, and group presentations. A grammar in this model, or G-grammar is a collection of lexical expressions which are products of logical forms, phonological forms, and inverses of those. Phrasal descriptions are obtained by forming products of lexical expressions and by cancelling contiguous elements which are inverses of each other. A G-grammar provides a symmetrical speci cation of the relation between a logical form and a phonological string that is neutral between parsing and generation modes. We show how the G-grammar can be \oriented" for each of the modes by reformulating the lexical expressions as rewriting rules adapted to parsing or generation, which then have strong decidability properties (inherent reversibility). We give examples showing the value of conjugacy for handling long-distance movement and quanti er scoping both in parsing and generation. The paper argues that by moving from the free monoid over a vocabulary V (standard in formal language theory) to the free group over V, deep anities between linguistic phenomena and classical algebra come to the surface, and that the consequences of tapping the mathematical connections thus established can be considerable.
1
1 Introduction There is currently much interest in bringing together the tradition of categorial grammar, and especially the Lambek calculus [10], with the more recent paradigm of linear logic [8] to which it has strong ties. One active research area concerns the design of non-commutative versions of linear logic [1, 14] which can be sensitive to word order while retaining the hypothetical reasoning capabilities of standard (commutative) linear logic that make it so well-adapted to handling such phenomena as quanti er scoping [4]. Some connections between the Lambek calculus and group structure have long been known [16], and linear logic itself has some aspects strongly reminiscent of groups (the producer/consumer duality of a formula A with its linear negation A? ), but no serious attempt has been made so far to base a theory of linguistic description solely on group structure. This paper presents such a model, G-grammars (for \group grammars"), and argues that: The standard group-theoretic notion of conjugacy, which is central in Ggrammars, is well-suited to a uniform description of commutative and non-commutative aspects of language. The use of conjugacy provides an elegant approach to long-distance dependency and scoping phenomena, both in parsing and in generation. G-grammars give a symmetrical account of the semantics-phonology relation, from which it is possible to extract, via simple group calculations, rewriting systems with strong decidability properties computing this relation for the parsing and generation modes. The paper is organized as follows. In Section 2 we introduce a \group computation" model, using standard algebraic tools such as free groups, conjugacy and normal subsets. The main deviation from traditional mathematical practice is in the focus given to the notions of compatible preorder and normal submonoid, whereas those of compatible equivalence relation and normal subgroup are more usual in algebra. Section 3 applies this model to linguistic description, and presents a G-grammar for a fragment of English involving quanti cation and relative pronouns. Sections 4 and 5 are concerned with generation and parsing, which correspond to two ways of exploiting the relation of preorder associated with the G-grammar, one (generation) in which logical forms are iteratively rewritten as combinations of logical forms and phonological forms until only phonological forms are left, the other (parsing) in which phonological forms are rewritten as combinations of logical forms and inverses of those until, after cancellation of adjacent inverses, exactly one logical form is left. Section 6 introduces the concept of diagrams, which provide an intuitive and powerful geometrical representation for G-grammars. These diagrams originate in the work of Van Kampen [17], and have found applications in combinatorial group theory and in studies of decidable subclasses of the word problem for groups [11]. 2
They have also been used to produce complete term rewriting speci cations for certain classes of groups [2]. Section 7 discusses in detail the conditions under which G-grammars and the associated rewriting systems lead to equivalent de nitions of the semantics-phonology relation. These results are used for comparing G-grammars, context-free grammars, DCG's and categorial grammars. The section also provides a short discussion of the advantages of G-grammars for describing in a uniform way commutative and non-commutative aspects of language. Section 8 applies certain group morphisms to show some strong computability properties of G-grammars, both for parsing and for generation.
2 Group Computation
A monoid M is a set M together with a product M M ! M, written (a; b) 7! ab, such that: This product is associative; There is an element 1 2 M (the neutral element) with 1a = a1 = a for all a 2 M. A group is a monoid in which every element a has an inverse a?1 such that ? 1 a a = aa?1 = 1. A preorder on a set is a re exive and transitive relation on this set. When the relation is also symmetrical, that is, R(x; y) ) R(y; x), then the preorder is called an equivalence relation. When it is antisymmetrical, that is that is, R(x; y) ^ R(y; x) ) x = y, it is called a partial order. A preorder R on a group G will be said to be compatible with the group product i, whenever R(x; y) and R(x0 ; y0 ), then R(xx0; yy0 ).
2.1 Normal submonoids of a group.
We consider a compatible preorder notated x ! y on a group G. The following properties, for any x; y 2 G, are immediate: x ! y , xy?1 ! 1; x ! y , y?1 ! x?1; x ! 1 , 1 ! x?1 ; x ! 1 ) yxy?1 ! 1; for any y 2 G: Two elements x; x0 in a group G are said to be conjugate if there exists y 2 G such that x0 = yxy?1 . The fourth property above says that the set M of elements x 2 G such that x ! 1 is a set which contains along with an element all its conjugates, that is, a normal subset of G. As M is clearly a submonoid of G, it will be called a normal submonoid of G. Conversely, it is easy to show that with any normal submonoid M of G one can associate a preorder compatible with G. Indeed let's de ne x ! y as xy?1 2 3
M. The relation ! is clearly re exive and transitive, hence is a preorder. It is also compatible with G, for if x1 ! y1 and x2 ! y2 , then x1y1 ?1 , x2y2 ?1 and y1 (x2 y2 ?1)y1 ?1 are in M; hence x1x2y2 ?1 y1 ?1 = x1y1 ?1 y1 x2y2 ?1y1 ?1 is in M, implying that x1x2 ! y1 y2 , that is, that the preorder is compatible. Remark. In general M is not a subgroup of G. It is i x ! y implies y ! x, that is, if the compatible preorder ! is an equivalence relation (and, therefore, a congruence) on G. When this is the case, M is a normal subgroup of G. This notion plays a pivotal role in classical algebra. Its generalization to submonoids of G is basic for the algebraic theory of computation presented here. If S is a subset of G, the intersection of all normal submonoids of G containing S (resp. of all subgroups of G containing S) is a normal submonoid of G (resp. a normal subgroup of G) and is called the normal submonoid closure NM(S) of S in G (resp. the normal subgroup closure NG(S) of S in G).
2.2 The free group over V.
We now consider an arbitrary set V , called the vocabulary, and we form the so-called set of atoms on V , which is notated V [ V ?1 and is obtained by taking both elements v in V and the formal inverses v?1 of these elements. We now consider the set F(V ) consisting of the empty string, notated 1, and of strings of the form x1x2 :::xn, where xi is an atom on V . It is assumed that such a string is reduced, that is, never contains two consecutive atoms which are inverse of each other: no substring vv?1 or v?1 v is allowed to appear in a reduced string. When and are two reduced strings, their concatenation can be reduced by eliminating all substrings of the form vv?1 or v?1 v. It can be proven that the reduced string obtained in this way is independent of the order of such eliminations. In this way, a product on F(V ) is de ned, and it is easily shown that F(V ) becomes a (non-commutative) group, called the free group over V [15].
2.3 Group computation
We will say that an ordered pair GCS = (V; R) is a group computation structure if: 1. V is a set, called the vocabulary, or the set of generators 2. R is a subset of F(V ), called the lexicon, or the set of relators.1 1 For readers familiar with group theory, this terminology will evoke the classical notion of group presentation through generators and relators. The main dierence with our de nition is that, in the classical case, the set of relators is taken to be symmetrical, that is, to contain r?1 if it contains r. When this additional assumption is made, our preorder becomes an equivalence relation.
4
The submonoid closure NM(R) of R in F(V ) is called the result monoid of the group computation structure GCS. The elements of NM(R) will be called computation results, or simply results. If r is a relator, and if is an arbitrary element of F(V ), then r?1 will be called a quasi-relator of the group computation structure. It is easily seen
that the set RN of quasi-relators is equal to the normal subset closure of R in F(V ), and that NM(RN ) is equal to NM(R). A computation relative to GCS is a nite sequence c = (r1 ; : : :; rn) of quasi-relators. The product r1 rn in F(V ) is evidently a result, and is called the result of the computation c. It can be shown that the result monoid is entirely covered in this way: each result is the result of some computation. A computation can thus be seen as a \witness", or as a \proof", of the fact that a given element of F(V ) is a result of the computation structure.2 For speci c computation tasks, one focusses on results of a certain sort, for instance results which express a relationship of input-output, where input and output are assumed to belong to certain object types. For example, in computational linguistics, one is often interested in results which express a relationship between a xed semantic input and a possible textual output (generation mode) or conversely in results which express a relationship between a xed textual input and a possible semantic output (parsing mode). If GCS = (V; R) is a group computation structure, and if A is a given subset of F(V ), then we will call the pair GCSA = (GCS; A) a group computation structure with acceptors. We will say that A is the set of acceptors, or the public interface, of GCSA. A result of GCS which belongs to the public interface will be called a public result of GCSA.
3 G-Grammars We will now show how the formal concepts introduced above can be applied to the problems of grammatical description and computation. We start by introducing a grammar, which we will call a G-Grammar (for \Group Grammar"), for a fragment of English (see Fig. 1). A G-grammar is a group computation structure with acceptors over a vocabulary V = Vlog [ Vphon consisting of a set of logical forms Vlog and a disjoint set of phonological elements (in the example, words) Vphon . Examples of phonological elements are john, saw, every, examples of logical forms j, s(j,l), ev(m,x,sm(w,y,s(x,y))); these logical forms can be glossed respectively as \john", \john saw louise" and \for every man x, for some woman y, x saw y". The grammar lexicon, or set of relators, R is given as a list of \lexical schemes". An example is given in Fig. 1. Each line is a lexical scheme and represents a set of relators in F(V ). The rst line is a ground scheme, which corresponds to the single relator j john?1 , and so are the next four lines. The sixth line is a non-ground scheme, which corresponds to an in nite set of relators, 2 The analogy with the view in constructive logics is clear. There what we call a result is called a formula or a type, and what we call a computation is called a proof.
5
j john?1 l louise?1 p paris?1 m man?1 w woman?1 A?1 r(A) ran?1 A?1 s(A,B) B?1 saw?1 E?1 i(E,A) A?1 in?1 t(N) N?1 the?1 ev(N,X,P[X]) P[X]?1 ?1 X N?1 sm(N,X,P[X]) P[X]?1 ?1 X N?1 ? 1 N tt(N,X,P[X]) P[X]?1 ?1 X
every?1 some?1 that?1
Figure 1: A G-grammar for a fragment of English obtained by instanciating the term meta-variable A (notated in uppercase) to a logical form. So are the remaining lines. We use Greek letters for expression meta-variables such as , which can be replaced by an arbitrary expression of F(V ); thus, whereas the term meta-variables A, B, ..., range over logical forms, the expression meta-variables , , ..., range over products of logical forms and phonological elements (or their inverses) in F(V ).3 The notation P[x] is employed to express the fact that a logical form containing an argument identi er x is equal to the application of the abstraction P to x. The identi er meta-variable X in P[X] ranges over such identi ers (x, y, z, ...), which are notated in lower-case italics (and are always ground). The meta-variable P ranges over logical form abstractions missing one argument (for instance z.s(j,z)). When matching meta-variables in logical forms, we will allow limited use of higher-order uni cation. For instance, one can match P[X] to s(j,x) by taking P = z :s(j; z ) and X = x . The vocabulary and the set of relators that we have just speci ed de ne a group computation structure GCS = (V; R). We will now describe a set of acceptors A for this computation structure. We take A to be the set of elements of F(V ) which are products of the following form: S Wn ?1Wn?1 ?1 : : :W1?1 where S is a logical form (S stands for \semantics"), and where each Wi is a phonological element (W stands for \word"). The expression above is a way of encoding the ordered pair consisting of the logical form S and the phonological string W1 W2 : : :Wn (that is, the inverse of the product Wn ?1Wn?1 ?1 : : :W1 ?1 ). A public result SWn ?1Wn?1?1 : : :W1 ?1 in the group computation structure 3 Expression meta-variables are employed in the grammar for forming the set of conjugates exp ?1 of certain expressions exp (in our example, exp is ev(N,X,P[X]) P[X]?1, sm(N,X,P[X]) P[X]?1 , or X). Conjugacy allows the enclosed material exp to move as a block in expressions of F (V ), see sections 4 and 5.
6
* * * * *
j john l louise p paris m man w woman r(A) A ran s(A,B) A saw B i(E,A) E in A t(N) the N ?1 every N X?1 ev(N,X,P[X]) ?1 some N X?1 sm(N,X,P[X]) tt(N,X,P[X]) N that ?1 X?1
*
*
* *
* * *
P[X] P[X] P[X]
Figure 2: Generation-oriented rules with acceptors ((V; R); A) | the G-grammar |will be interpreted as meaning that the logical form S can be expressed as the phonological string W1 W2 : : :Wn . Let us give an example of a public result relative to the grammar of Fig. 1. We consider the relators (instanciations of relator schemes): r1 = j?1 s(j,l) l?1 r2 = l louise?1 r3 = j john?1
saw?1
and the quasi-relators:
r1 ' = j r1 j?1 r2 ' = (j saw) r2 (j r3 ' = r 3
saw)?1
Then we have: r1 ' r 2 ' r 3 ' = j j?1 s(j,l) l?1 saw?1 j?1 j saw l louise?1 saw?1 j?1 j john?1 = s(j,l) louise?1
saw?1 john?1 louise?1 saw?1john?1 is the result of a computation
which means that . This result is obviously a public one, which means that the logical form can be verbalized as the phonological string john saw louise.
s(j,l) (r1 ',r2',r3') s(j,l)
4 Generation Applying directly, as we have just done, the de nition of a group computation structure in order to obtain public results can be somewhat unintuitive. It is often easier to use the preorder ! . If, for a; b; c 2 F(V ), abc is a relator, 7
then abc ! 1, and therefore b ! a?1 c?1. Taking this remark into account, it is possible to write the relators of our G-grammar as the \rewriting rules" of Fig. 2; we use the notation * instead of ! to distinguish these rules from the parsing rules which will be introduced in the next section. The rules of Fig. 2 have a systematic structure. The left-hand side of each rule consists of a single logical form, taken from the corresponding relator in the G-grammar; the right-hand side is obtained by \moving" all the remaining elements in the relator to the right of the arrow. Because the rules of Fig. 2 privilege the rewriting of a logical form into an expression of F(V ), they are called generation-oriented rules associated with the G-grammar. Using these rules, and the fact that the preorder * is compatible with the product of F(V ), the fact that s(j,l) louise?1 saw?1john?1 is a public result can be obtained in a simpler way than previously. We have: s(j,l) * j saw l j * john l * louise by the seventh, rst and second rules (properly instanciated), and therefore, by transitivity and compatibility of the preorder: s(j,l) * j saw l * john saw l * john saw louise which proves that s(j,l) * john saw louise, which is equivalent to saying that s(j,l) louise?1saw?1 john?1 is a public result. Some other generation examples are given in Fig. 3. The rst example is straightforward and works similarly to the one we have just seen: from the logical form i(s(j,l),p) one can derive the phonological string john saw louise in paris.
4.1 Long-distance movement and quanti ers
The second and third examples are parallel to each other and show the derivation of the same string every man saw some woman from two dierent logical forms. The penultimate and last steps of each example are the most interesting. In the penultimate step of the second example, is instanciated to saw?1 x?1 . This has the eect of \moving" as a whole the expression some woman y?1 to the position just before y, and therefore to allow for the cancellation of y?1 and y. The net eect is thus to \replace" the identi er y by the string some woman; in the last step is instanciated to the neutral element 1, which has the eect of replacing x by every man. In the penultimate step of the third example, is instanciated to the neutral element, which has the eect of replacing x by every man; then is instanciated to saw?1man?1 every?1 , which has the eect of replacing y by some woman. Remark. In all cases in which an expression similar to a1 : : : am ?1 appears (with the ai arbitrary vocabulary elements), it is easily seen that, by giving 8
i(s(j,l),p) s(j,l) in p j saw l in p john saw l in p
* * * * john saw louise in p * john saw louise in paris
ev(m,x,sm(w,y,s(x,y))) ?1 every m x?1 sm(w,y,s(x,y)) ?1 every m x?1 ?1 some w y?1
* * *
s(x,y) ?1 every man x?1 ?1 some woman y?1 x saw y ? 1 * every man x?1 x saw some woman (by taking = saw?1 x?1 ) * every man saw some woman (by taking = 1)
sm(w,y,ev(m,x,s(x,y))) ?1 some w y?1 ev(m,x,s(x,y))) ?1 some w y?1 ?1 every m x?1
* * *
s(x,y) ?1 some woman y?1 ?1 every man x?1 x saw y ? 1 * some woman y?1 every man saw y (by taking = 1) * every man saw some woman (by taking = saw?1 man?1 every?1 )
r(t(tt(m,x,s(l,x)))) t(tt(m,x,s(l,x))) ran the tt(m,x,s(l,x)) ran the m that ?1 x?1 s(l,x) ran the man that ?1 x?1 s(l,x) ran the man that ?1 x?1 l saw x ran
* * * * * * the man that ?1 x?1 louise saw x ran * the man that louise saw ran (by taking = saw?1 louise?1 ) Figure 3: Generation examples
9
an appropriate value in F(V ), the a1 : : : am can move arbitrarily to the left or to the right, but only together in solidarity; they can also freely permute cyclically, that is, by giving an appropriate value to , the expression a1 : : : am ?1 can take on the value ak ak+1 : : : am a1 : : : ak?1 (other permutations are in general not possible). The values given to the , , etc., in the examples of this paper can be understood intuitively in terms of these two properties. We see that, by this mechanism of concerted movement, quanti ed noun phrases can move to whatever place is assigned to them after the expansion of their \scope" predicate, a place which was unpredictable at the time of the expansion of the quanti ed logical form. The identi ers act as \target markers" for the quanti ed noun phrase: the only way to \get rid" of an identi er x is by moving x ?1 , and therefore with it the corresponding quanti ed noun phrase, to a place where it can cancel with x . The fourth example exploits a similar mechanism for handling relative clauses. At the time the relative pronoun is produced, an identi er inverse x ?1 is also produced which has the capability of moving to whatever position is nally assigned to the relative verb's argument x .
4.2 Word movement and group morphisms
The derivations of Fig. 3 show possible rewritings of the four given logical forms into phonological strings. It is natural to ask whether these rewritings are the only ones possible. That is in fact the case, but for now we will con ne ourselves to showing that the expression:
= ?1 every man x?1 ?1some woman y?1 x saw y can only be rewritten into the phonological string: every man saw some woman ; whatever values one may choose for the expression meta-variables and .4 We start by considering a variant of the original vocabulary V , namely the vocabulary V 0 =def (V nfxg) [fa; bg, where a; b are new letters not appearing in V . We then consider the application : F(V ) ! F(V 0) which maps a reduced expression g of F(V ) into the expression of F(V 0) obtained by replacing each x (resp. x?1 ) in g by a?1every man (resp. man ?1 every ?1 a), and similarly by replacing each y (resp. y?1 ) in g by b?1some woman (resp. woman ?1 some ?1 b). The application is clearly a group morphism from F(V ) to F(V 0); in fact it is an isomorphism, where ?1 maps a (resp. b) into every man x?1 (resp. some woman y?1 ). We then consider the vocabulary V 00 =def V 0 n fa; bg, and the application : F(V 0) ! F(V 00) obtained by mapping, in reduced expressions of F(V 0 ),
4 Of course, for certain values of ; , the rst expressiondoes not rewrite into a phonological string at all.
10
F(V )
F(V 0)
F(V 00) Figure 4: Group morphisms between free groups over dierent vocabularies. The group F(V ) is isomorphic to F(V 0 ), which in turns projects onto F(V 00). The \phonological" subgroup F(Vphon ) is kept invariant in the three morphisms. each a (resp. a?1, b, b?1) into 1. This is again clearly a group morphism, and we can consider the composition of morphisms = : F(V ) ! F(V 00) (see Fig. 4). This morphism has the following properties: It maps all phonological strings into themselves. It maps x (resp. y) into every man (resp. some woman). It maps
= ?1every man x?1 ?1 some woman y?1 xsaw y into
(?1 a ?1 b a?1 every man saw b?1some woman ); that is, into every man saw some woman : Because phonological strings are kept invariant in the mapping , whenever one gives values to ; such that is a phonological string, then, for these values, ( ) = , but we have just seen that, for any values of ; one has ( ) = every man saw some woman . This establishes the initial claim. This simple proof illustrates clearly the power of group operations. By changing the presentation of the group F(V ) into an isomorphic one in which x; y have been reexpressed in terms of a; b and of phonological elements, one obtains effects which can be interpreted as constituent movements which furthermore are mandatory if the output is constrained to be a string containing only phonological elements.
5 Parsing
To the compatible preorder ! on F(V ) there corresponds a \reverse" compatible preorder + de ned as a +b i b ! a, or, equivalently, a?1 ! b?1. The normal submonoid M 0 in F(V ) associated with + is the inverse monoid of the normal submonoid M associated with ! , that is, M 0 contains a i M contains a?1 . It is then clear that one can present the relations: 11
j john?1 1 A?1 r(A) ran?1 1 sm(N,X,P[X]) P[X]?1
!
!
?1X N?1some?1 ! 1 etc. in the equivalent way: john j?1 + 1 ran r(A)?1A + 1 some N X?1 P[X] sm(N,X,P[X])?1?1 + 1 etc. john + j louise + l paris + p man + m woman + w ran + A?1 r(A) saw + A?1 s(A,B) B?1 in + E?1 i(E,A) A?1 the + t(N) N?1 every + ev(N,X,P[X]) P[X]?1 ?1 X N?1 some + sm(N,X,P[X]) P[X]?1 ?1 X N?1 that + N?1 tt(N,X,P[X]) P[X]?1 ?1 X
Figure 5: Parsing-oriented rules Suppose now that we move to the right of the + arrow all elements appearing on the left of it, but for the single phonological element of each relator. We obtain the rules of Fig. 5, which we call the \parsing-oriented" rules associated with the G-grammar. By the same reasoning as in the generation case, it is easy to show that any derivation using these rules and leading to the relation PS + LF, where PS is a phonological string and LF a logical form, corresponds to a public result LF PS ?1 in the G-grammar. A few parsing examples are given in Fig. 6; they are the converses of the generation examples given earlier. In the rst example, we rst rewrite each of the phonological elements into the expression appearing on the right-hand side of the rules (and where the metavariables have been renamed in the standard way to avoid name clashes). The rewriting has taken place in parallel, which is of course permitted (we could have obtained the same result by rewriting the words one by one). We then perform certain uni cations: A is uni ed with j, C with p; then B is uni ed to l.5 Finally 5 Another possibility at this point would be to unify l with E rather than with B. This would lead to the construction of the logical form i(l,p), and, after uni cation of E with
12
john saw louise in paris
+ + + +
j A?1 s(A,B) B?1 l E?1 i(E,C) C?1 p s(j,B) B?1 l E?1 i(E,p) s(j,l) E?1 i(E,p) i(s(j,l),p)
every man saw some woman + ev(N,x,P[x]) P[x]?1 ?1 x N?1 m A?1 s(A,B) B?1 sm(M,y,Q[y]) Q[y]?1 ?1 y M?1 w + ev(m,x,P[x]) P[x]?1 ?1 x A?1 s(A,B) B?1 sm(w,y,Q[y]) Q[y]?1 ?1 y + x A?1 ev(m,x,P[x]) P[x]?1 s(A,B) B?1 sm(w,y,Q[y]) Q[y]?1 ?1 y + x A?1 ev(m,x,P[x]) P[x]?1 s(A,B) Q[y]?1 sm(w,y,Q[y]) B?1 y + ev(m,x,P[x]) P[x]?1 s(x,y) Q[y]?1 sm(w,y,Q[y])
and then either: + ev(m,x,P[x]) P[x]?1 sm(w,y,s(x,y)) + ev(m,x,sm(w,y,s(x,y))) or: + ev(m,x,s(x,y)) Q[y]?1 sm(w,y,Q[y]) + sm(w,y,ev(m,x,s(x,y))
the man that louise saw ran + t(N) N?1 m M?1 tt(M,x,P[x]) P[x]?1 ?1 x l A?1 s(A,B) B?1 C?1 r(C) + t(N) N?1 tt(m,x,P[x]) P[x]?1 ?1 x
+ + + + +
C?1
s(l,B) B?1
r(C) t(N) N?1 tt(m,x,P[x]) P[x]?1 s(l,B) x B?1 C?1 r(C) t(N) N?1 tt(m,x,P[x]) P[x]?1 s(l,x) C?1 r(C) t(N) N?1 tt(m,x,s(l,x)) C?1 r(C) t(tt(m,x,s(l,x))) C?1 r(C) r(t(tt(m,x,s(l,x))))
Figure 6: Parsing examples
13
is uni ed with s(j,l), and we obtain the logical form i(s(j,l),p). In this last step, it might seem feasible to unify E to i(E,p) instead, but that is in fact forbidden for it would mean that the logical form i(E,p) is not a nite tree, as we do require. This condition prevents \self-cancellation" of a logical form with a logical form that it strictly contains. E
5.1 Quanti er scoping
In the second example, we start by unifying m with N and w with M; then we \move" P[x]?1 next to s(A,B) by taking = x A?1;6 then again we \move" Q[y]?1 next to s(A,B) by taking = B sm(w,y,Q[y])?1; x is then uni ed with A and y with B. This leads to the expression: ev(m,x,P[x])P[x]?1s(x,y)Q[y]?1sm(w,y,Q[y])
where we now have a choice. We can either unify s(x,y) with Q[y], or with P[x]. In the rst case, we continue by now unifying P[x] with sm(w,y,s(x,y)), leading to the output ev(m,x,sm(w,y,s(x,y))). In the second case, we continue by now unifying Q[y] with ev(m,x,s(x,y)), leading to the output sm(w,y,ev(m,x,s(x,y)). The two possible quanti er scopings for the input string are thus obtained, each corresponding to a certain order of performing the uni cations. In the last example, the most interesting step is the one (third step) in which is instanciated to s(l,B)?1, which has the eect of \moving" x close to the \missing" argument B?1 of \louise saw", to cancel it by uni cation with B and consequently to ll the second argument position in the logical form headed by s. After this step, P[x] is ready to be uni ed with s(l,x), nally leading to the expected logical form output for the sentence.
6 Diagrams 6.1 De nition
De nition. A diagram over the vocabulary V is a nite graph which is (1)
planar, that is, embedded in the plane in such a way that the edges do not cross; (2) connected; (3) directed, that is, the edges carry an orientation; (4) labelled, that is, each edge carries a label taken in V . A diagram separates the plane in n+1 connected open sets: the exterior (set of points that can be connected to a point at in nity without crossing an edge), that logical form, would conduct to the output s(j,i(l,p)). If one wants to prevent this output, several approaches are possible. The rst one consists in typing the logical form with syntactic categories. The second one is to have some notion of logical-form well-formedness (or perhaps interpretability) disallowing the logical forms i(l,p) [louise in paris] or i(t(w),p) [(the woman) in paris], although it might allow the form t(i(w,p)) [the (woman in paris)]. 6 We have assumed that the meta-variables corresponding to identi ers in P and Q have been instanciated to arbitrary, but dierent, values x and y. We leave a discussion of this point to a future paper.
14
and n open internal regions each consisting of points which can be connected without crossing an edge, but which are separated from the exterior. An internal region will be called a cell. An example of a diagram over the vocabulary fa; b; cg is given in Fig. 7. c a
a c
c
c
11 00 00 11 00 11 c 00 11 00 11 00 11 00 11 00 11 00 11 O
b a a
Figure 7: A diagram. The boundary of a cell is the set of edges which constitute its topological boundary. The boundary of a diagram is the set of edges which are such that all their points are connected to the exterior. If one choses an arbitrary vertex (such as O in the gure) on the boundary of a diagram, and if one moves on the boundary in a conventional clockwise fashion (an orientation that we adopt throughout in the sequel), then one collects a list of edges which are either directed in the same way as the movement, or contrary to it. By producing a sequence of atoms, positive in the rst case, negative in the second one, one can then construct a word of F(V ); this word is said to be a boundary word of the diagram.
6.2 Reduced diagrams
We will say that a diagram is reduced if there does not exist a pair of edges with a common vertex O, with the same label, oriented oppositely relative to O (that is, both edges point towards O or both point from O), and such that at least one of the two \angles" formed by the two edges is \free", that is, does not \contain" another diagram edge (see Fig. 8).
6.3 Cyclically reduced words
De nition. A word w on V [ V ? is said to be cyclically reduced i every 1
cyclic permutation of it is reduced. It is easy to see that (1) a reduced word is cyclically reduced i it is not of the form aw0 a?1 with a an atom (positive or negative), (2) the conjugate class of any word contains cyclically reduced words, which are cyclic permutations of each other. 15
c
c
a
b
11 00 00 11 00 11 00 11 c 00 11 00 11 00 11 00 11 00 11 O
a a
Figure 8: A reduced diagram. The diagram of the previous gure was not reduced because of the two c edges outgoing from vertex O.
6.4 Relator cells
Consider a group computation structure GCS = (V; R). Without loss of generality, it can be assumed that the relators in R are cyclically reduced, because the result monoid is invariant when one considers a new set of relators consisting of conjugates of the original ones. From now on, unless especially stated otherwise, this assumption will be made for all relators considered. Take any such cyclically reduced relator r = xe11 : : :xen , where xi 2 V and ei = 1, and construct a labelled cell in the following way: take a circle and divide it in n arcs; label the clockwise-ith arc xi and orient it clockwise if ei = 1, anti-clockwise otherwise. The labelled cell thus obtained is call the relator cell associated with r. Rather than presenting the GCS through a set of relator words as we have done before, it is now possible to present it through a set of relator cells; if one gives such a set, a standard presentation of the GCS can be derived by taking an arbitrary origin on each cell and \reading" the relator word clockwise from this origin; the origin chosen does not matter: any other origin leads to a conjugate relator, and this does not aect the notion of result. n
6.5 Fundamental theorem of combinatorial group theory
We are now able to state what J. Rotman calls \the fundamental theorem of combinatorial group theory" [15]. We give the theorem in a slightly extended form, adapted to the case of the normal sub-monoid closure; the standard case of normal subgroup closure follows immediately by taking a set or relators containing r?1 along with r. Theorem 1 Let GCS = (V; R) be a group computation structure such that all relators r 2 R are cyclically reduced. If w is a cyclically reduced word in F(V ), then w 2 NM(R) if and only if there exists a reduced diagram having boundary word w and whose regions are relator cells associated with the elements of R. 16
r2
r1
u2 u1
un
rn
O
Figure 9: Star diagram. The proof is not provided; it can easily be recovered from the property demonstrated in [11] (chapter 5, section 1). The proof involves the following remark. If one considers a product u1r1u1 ?1 : : :unrnun ?1 with r?1 2 R and ui arbitrary elements of F(V ), this product can be read as the boundary word of the \star" diagram represented in Fig. 9, starting at O and progressing clockwise. This star diagram is in general not in reduced form, but it can be reduced by a stepwise process of equating edges which do not respect the de nition of a reduced diagram. Example. Let's consider a GCS with vocabulary V = fa; b; cg and set of (cyclically reduced) relators R = fc?1c?1 a?1c?1 ; acb?1; baag . The cyclically reduced word c?1 aac?1 is an element of NM(R), for it can be obtained by forming the product c?1c?1 a?1c?1 cacb?1c?1 cbaac?1: If we form the star diagram for this product, we obtain the rst diagram shown in Fig. 10. This diagram is not reduced, for instance the two straight edges labelled c are oending the reduction condition. If one \stitches" these two edges together, one obtains the second diagram in the gure. This stitching corresponds to a one-step reduction of the boundary word of the rst diagram, c?1 c?1a?1 c?1 cacb?1 c?1 cbaac?1 into the boundary word of the second c?1 c?1a?1 c?1 cacb?1 baac?1. By continuing in this way, one obtains the fth diagram of the gure, which is reduced, and whose boundary is the wanted result c?1 aac?1.
6.6 Linguistic examples
The previous considerations can be applied to the linguistic examples given in the body of the article. Here we only consider examples which do not involve 17
c
b
a
00 11 00 11 00 11 111111 000000 c 00 11 000000 111111 00 11 000000 c111111 00 11 c 000000 111111 00 11 000000 111111 00 11 000000 111111 O
a
c
c
b a a
c
b b a
a c a a c
a
a c
O
c
c
11 00 00 11 00 11 c 00 11 00 11 00 11 00 11 00 11
c
c
b a
11 00 00 11 00 11 c 00 11 00 11 00 11 00 11 00 11 00 11 O
a
c c a
11 00 00 11 00 11 00 11 00 11 c 00 11 00 11 00 11 00 11 O
b c
a
a
b
a
c
c
11 00 00 11 00 11 00 11 c 00 11 00 11 00 11 00 11 00 11 O
a
a a
Figure 10: Transformation of a diagram into reduced form (adapted from [11]).
18
long-distance dependencies; the more complex examples will be treated after we have introduced multi-relators in 7.6. j
l
p
(1)
(2)
(3)
john
louise
paris
s(j,l)
i(s(j,l),p)
(5)
(4) l
j
p
s(j,l) in
saw
Figure 11: Cells associated with some grammar relators. i(s(j,l),p)
s(j,l)
j
p
l
paris
john saw
in louise
Figure 12: A diagram establishing the relationship between a logical form and a phonological string. Let's consider the relator schemes, from Fig. 1:
j john?1 l louise?1 p paris?1 A?1 s(A,B) B?1 E?1 i(E,A) A?1
saw?1 in?1
19
In Fig. 11, cells (which we have numbered from 1 to 5) associated with some instanciations of these schemes are shown. In Fig. 12 we construct a reduced diagram whose cells are the relator cells of Fig. 11. The boundary word of this diagram is the expression i(s(j,l),p) paris?1in?1 louise?1saw?1 john?1 . This proves that, in the G-grammar, the logical form i(s(j,l),p) is associated with the phonological string john saw louise in paris. Note (1) the analogy between reading the diagram from top to bottom (resp. bottom to top) and a generation (resp. parsing) process.7
7 G-grammars and rewriting In the discussion of parsing and generation, we saw how a derivation according to the rewriting rules of Figs. 2 and 5 is always \sound" with respect to the group computation structure. We did not consider the opposite question, namely whether it is \complete" with respect to it: can any public result relative to the GCS be obtained by such rewritings? The answer to this question will be given by theorems demonstrated below (theorems 4 and 7), which roughly state that such a rewriting system is complete relative to the GCS if the system does not contain \ground cycles", that is situations where a ground term T can derive : : :T : : :. This condition is true of both the rewriting systems of Figs 2 and 5. For instance, in the generation case, it can be checked that any ground logical form that appears on the right-hand side of a rule of Fig. 2 is strictly smaller than the ground logical form on the left-hand side, therefore precluding ground cycles. This condition is related to the decidability of generation and parsing, and we will prove in 8 that our G-grammar is nitely enumerable both in parsing and in generation, that is, in the terminology of [6], that it is inherently reversible. This property is dicult to guarantee in formalisms relying on empty categories for long-distance dependencies, a problem which is avoided by the use group structure for the same purpose.
7.1 Oriented relators and rewriting rules
Suppose r = a1 : : :an is a relator. This relator is said to be oriented at index i if a number i with 1 i n has been chosen. A QCF rule (QCF stands for \quasi-context-free") on F(V ) is a pair, notated a 7! , with a 2 V [ V ?1; 2 (V [ V ?1 ) . If GCS = (V; R) is a group computation structure, and if r is a relator in R such that r = a1 : : :an in F(V ), then the rule ai 7! ai?1 ?1 : : :a1 ?1 an?1 : : :ai+1 ?1 is said to be the rule associated with the relator r oriented at index i.
7 There is also an analogy between a bottom-up reading and a chart parsing approach in the case of this example. In more complex examples (for instance multi-word expression parsing) a better analogy would be with Colmerauer's Q-systems [3]. However, both analogies break down when considering examples involving long-distance dependencies.
20
If for each element in a set of relators R, an orientation is chosen, one says that an orientation has been given to the GCS. By associating a QCF rule to each relator, one obtains the QCF system associated with the orientation. A derivation relative to a QCF system is a nite sequence s1 7! s2 7! : : : 7! sn of elements si 2 (V [ V ?1 ) (that is, of strings formed over elements of V or their inverses) such that, for each i, there exists ; 2 (V [ V ?1 ) with either: 1. si a , si+1 b , and a 7! b is a rule of the rewriting system [replacement step], 2. or si xx?1 , si+1 , with x 2 (V [ V ?1 ) [reduction step]. If for two strings s and t in (V [ V ?1) one can nd a derivation s1 7! s2 7! : : : 7! sn with s s1 and t sn , or if s and t are identical, then we will say that s derives into t and we will write s 7! t (using the same notation as for rewrite rules; the context will make clear which case is intended). The following proposition, stating that rewriting implies preorder, is straightforward.
Theorem 2 If 7! is the derivation relation associated with an oriented GCS, then, for any s; t 2 (V [ V ?1 ) one has: s 7! t ) s ! t:
Proof. Immediate consequence of the fact that if r = a1 : : :an is a relator, then
a1 : : :ai?1ai ai+1 : : :an ! 1 and therefore
ai ! ai?1?1 : : :a1?1 an?1 : : :ai+1 ?1 :
ut
7.2 Anteriority
Let's consider the QCF system associated with an oriented GCS. If a and b are elements of V [ V ?1 , and if there exists a rule in the system of the form: a 7! b ; then we will say that a is immediately anterior to b relative to the system (or, equivalently, relative to the oriented GCS). If there exists a nite sequence a = c1; : : :; cn = b such that ci is immediately anterior to ci+1 then we will say that a is anterior to b, which will be noted: a b: 21
We will say that the anteriority relation is acyclic i it is never the case that a a. We will say that it is noetherian i any \descending" chain a1 a2 : : : is nite. If anteriority is noetherian it is a fortiori acyclic. If a rule in the QCF system has left-hand side a, then a will be called the mother of the rule (or, equivalently, of the oriented relator corresponding to the rule). The set M of all rule mothers is a subset of the set of atoms V [ V ?1 , called the set of potential mothers associated with the QCF system. We will note M ?1 the set of atoms which are inverses of potential mothers.
7.3 Rewriting theorem (basic case)
We are now ready to state a theorem which provides conditions under which the reciprocal of theorem 2 holds. We will only sketch proofs. We rst need a lemma.
Lemma 3 Suppose that is a diagram relative to an oriented acyclic GCS, and also that the set of potential mothers does not simultaneously contain an element and its inverse. Then there is a relator cell in this diagram such that its mother m is on the boundary of the diagram. Proof. Take any cell ? in the diagram, and consider the mother m in this cell. If
m is on the boundary of the diagram, we are done. Otherwise, m is on an edge separating ? from another cell ?0, and therefore m?1 is on the boundary of ?0 (as a consequence of the clockwise orientations of the cells). Because the condition about potential mothers, m?1 cannot be the mother of ?0, and therefore the mother m0 of ?0 is anterior to m. If m0 is on the diagram boundary, we are done, otherwise we repeat the operation. Because the anteriority relation is acyclic, the iteration must stop at some point, which proves the lemma. ut We can now state the theorem.
Theorem 4 If ` 7! ' is the derivation relation associated with an oriented GCS where (i) anteriority is acyclic, and (2) the set of potential mothers does not contain simultaneously an element and its inverse, then, for any s 2 (V [ V ? ) , and for any t 2 ((V [ V ? ) n M ? ) , one has: s ! t ) s 7! t: 1
1
1
Proof. Suppose that s ! t, then there exists a diagram having k cells whose boundary is st?1 . We will prove the result by induction on k. In the base case k = 0, the result is immediate, because in a diagram with 0 cells of boundary st?1 , one necessarily has s = t. Let's now address the case of k > 0. From the lemma, there is a cell ? in this diagram whose mother m lies on the boundary of . Because of the condition on t, this m cannot lie in the t portion of the boundary, and so it must lie in the s portion (see Fig. 13).
22
The boundary of ? can be written in the form mw?1 for some word w 2 (V [ V ?1) , and therefore s can be expressed in the form s = s1 ms2 , with s1 and s2 subparts of s (see Fig. 14). But now one has: m 7! w by de nition of the rewriting relation. Furthermore the word s1 ws2 t?1 is on the boundary of a diagram in k ? 1 cells, obtained from by removing the cell ? (see gure); we will not explain here in detail what it means to \remove a cell", and will not justify formally why the remaining construct is still a diagram, but the way the operation works may be understood on the basis of an example (see Fig. 15). We now have: s1 ws2 ! t and, by the induction hypothesis, one has s1 ws2 7! t; and therefore s = s1 ms2 7! s1 ws2 7! t:
ut
s
B
m
C
Γ
A
D
t
Figure 13: The diagram in k cells, whose boundary is st?1 . The word s can be read clockwise from vertex A to vertex D, the word t counter-clockwise from A to D. There exists a cell ? whose mother m lies on the s portion of the boundary.
7.4 G-grammars compared to CFG's, DCG's and Categorial Grammars Rather than starting from a G-grammar and associating rewriting systems to it, one can start from a conventional grammar represented as a rewriting system 23
B
s1
m Γ
w
A
C
s2
D
t
Figure 14: Removing the cell ? by rewriting m into w. and associate to it a G-grammar. An obvious question is then whether the input-output relationship as de ned in the two models agree. Let's for instance consider the context-free grammar presented in Fig. 16, (a). The rewriting system (a) is a QCF system associated with an orientation of the G-grammar given in (a'), over the vocabulary V = U [ W, where U is the set of nonterminals fs; np; vpg and W is the set of terminals john ; walked ; often . By Theorem 2, if we can derive a string of words w1 : : :wn from s, then s ! w1 : : :wn relative to the G-grammar. However, the system (a) is not acyclic for anteriority, because of its fourth rule. The translation of this rule is the relator vp vp?1often ?1 which is equal to often ?1. This means that often ?1 ! 1, or equivalently 1 ! often . This in turn, because of the compatibility of the preorder, implies that, whenever one has s ! , then one has s ! often , for any ; 2 F(V ). In particular one has s ! often john slept , a string that cannot be derived from grammar (a). More generally, it can be shown that, relative to the GCS, the word `often' can be added freely in any string generated from s! One way of making the grammar (a) acyclic is to upgrade it to a De nite Clause Grammar [13] such as (b). With each nonterminal is associated a term representing the syntactic structure of the string spanned by this nonterminal. The crucial point here is that the structure of the mother nonterminal (for instance vp(.(often,VP))) strictly contains the structures of the daughter nonterminals (for instance vp(VP)). It is then easy to show that the system is acyclic. Then Theorem 4 holds, and the notions of derivations relative to (b) and (b') coincide. In the example, the relator vp(:(often; VP))vp(VP)?1 often ?1 does not collapse to often ?1 anymore, for vp(:(often; VP)) ? 1 vp(VP) can never cancel out. From theorem 4, we now see that (b) and (b') are equivalent. Another way to obtain an acyclic grammar is shown in (c). Here we have an in nite set of rules indexed by integers corresponding to string length. The length of a mother nonterminal is strictly larger than the length of any daughter, 24
i(s(j,l),p)
s(j,l)
j
p
l
paris
john saw
in louise
s(j,l)
j
p
l
paris
john saw
in louise
Figure 15: A cell-removal step. If one assumes that s(j,l) in p 7! john saw louise in paris, then one has i(s(j,l),p) 7! s(j,l) in p 7! john saw louise in paris. and this again ensures acyclicity.
7.4.1 G-grammars and categorial grammars
The context-free grammar (a) and its associated G-grammar (a') can be used to illustrate a crucial dierence between G-grammars and categorial grammars. Whereas, in categorial grammar, the expression vp=vp \expects" a vp on the right | and nothing else | and then \returns" a vp, when working with groups, an expression such as vp vp?1 is formally undistinguishable from 1. It is this crucial property that permits G-grammars to pro t from many standard mathematical tools.
25
(a)
(b)
(c)
s 7! np vp np 7! john vp 7! walked vp 7! often vp
(a') s vp?1 np?1 np john?1 vp walked?1 vp vp?1 often?1 = often?1
s(.(NP,VP)) 7! np(NP) vp(VP) np(john) 7! john vp(walked) 7! walked vp(.(often,VP)) 7! often vp(VP)
(b') s(.(NP,VP)) vp(VP)?1 np(NP)?1 np(john) john?1 vp(walked) walked?1 vp(.(often,VP)) vp(VP)?1 often?1
s(x+y) 7! np(x) vp(y) np(1) 7! john vp(1) 7! walked vp(x+1) 7! often vp(x)
(c') s(x+y) vp(x)?1 np(y)?1 np(1) john?1 vp(1) walked?1 vp(x+1) vp(x)?1 often?1
Figure 16: Context-free grammars, DCG's and G-grammars.
7.5 Mixing commutative and non-commutative phenomena, logic programs
We have already seen examples where commutative and non-commutative aspects are both present in a lexical entry. Thus, the presence of ; ?1 in the entry for `every' in Fig. 2 allows the expression every N X?1 to move as a block to the position where the argument X of the verb is eventually found; in this movement, it is however impossible for every and N to exchange their relative positions, and it can be shown that, for the input logical forms of Fig. 3, only the four phonological strings listed can be obtained. Let us brie y indicate how this commutative/non-commutative partnership could be used for error correction purposes. Suppose that a relator of the form Error?1 Repair ?1 Report is added to the grammar, where Error is some erroneous input (for instance it could be the word `principle' improperly used in a situation where `principal' is needed), Repair is what the input should have been (in our example, the word `principal'), and Report is a report which tells us how the error was corrected. Then, the expression Error?1 Repair can move in block to the spot where the error occurs, replacing the erroneous word by the correction and allowing normal processing to continue. The report can then be used for warning or evaluation purposes. Fully commutative structures and logic programs Suppose that one wants to have a group computation structure which is completely commutative, that is, one in which elements can move freely relative to each other. This property could be stipulated by introducing a notion of \commutative group computation" using the free commutative group FC(V ) rather than F(V ). Another possibility, which illustrates the exibility of the group computation approach, is just to add a relator scheme ?1 ?1 to R, where and are expression meta-variables. This expression is called a commutator of and because when multiplied by it yields . This single relator scheme permits to permute 26
elements in any expression, and has the same eect as using FC(V ). An example where commutative structures are useful is the case of logic programs. It can be shown that if one encodes a clause of a logic program P0 P1 : : :Pn as a relator scheme P0 Pn?1 : : :P1?1 in a commutative structure, and de nes public results to consist of a single ground predicate P, then public results in the group computation structure coincide with consequences of the program.8
7.6 Multi-relators
Theorem 4 already gives interesting equivalence for several types of GCS's, such as the ones associated with acyclic DCGs, or for a G-grammar which does not make use of expression meta-variables, such as the G-grammar obtained by eliminating the entries for every, some and that from the G-grammar of Fig. 1. However, it is not sucient for the long-dependency situations such as these where expression meta-variables do appear; in these cases, the conditions of Theorem 4 are not met, because by giving a well-chosen value to the expression meta-variable, the condition of anteriority can be easily violated. We need a stronger version of the theorem which is able to take into account the situation where expression metavariables appear in pairs ; ?1. In order to state this extended version of Theorem 4, we need to introduce some terminology. Consider a nite list w1 : : :wn of elements of (V [ V ?1 ) . Let MC(w1 ; : : :; wn) be the set:
f1w11 ?1 : : :nwn n?1 j 1 2 F(V ); : : :; n 2 F(V )g: The following lemma is straightforward. Lemma 5 . The set MC(w1; : : :; wn) has the following properties: It is a normal subset of F(V ) (that is, is self-conjugate); It is invariant up to arbitrary permutations of the list w1 : : :wn; It is invariant up to cyclic permutations of any of the words wi ; Let B be the set:
fw1 2 wn2?1 : : :nwnn ?1 j 2 2 F(V ); : : :; n 2 F(V )g; Then MC(w1 ; : : :; wn) is the set of conjugates of elements of B in F(V ). 8 This result, in contrast to the corresponding result concerning DCG's (see below) does not depend on a condition of acyclicity. This dierence is related to the fact that, in the case of logic programs, a succesful sequence of rewritings results in the empty string, whereas in the case of DCGs, it results into a string of words. The acyclicity condition for DCGs ensures that this string of words contains all words that could be produced using the GCS. For logic programs, this property is irrelevant.
27
j john?1 l louise?1 p paris?1 m man?1 w woman?1 r(A) ran?1 A?1 s(A,B) B?1 saw?1 A?1 i(E,A) A?1 in?1 E?1 t(N) N?1 the?1 ev(N,X,P[X]) P[X]?1 ; X N?1 every?1 sm(N,X,P[X]) P[X]?1 ; X N?1 some?1 tt(N,X,P[X]) P[X]?1 that?1 N?1 ; X
Figure 17: Multi-relator version of our example G-grammar. We have made free use of the transformations permitted by Lemma 5. Because of the second of these properties the set MC(w1 ; : : :; wn) is de ned as soon as the multiset (unordered list) hw1; : : :; wni, which we will call a multirelator, is given. We will call MC(w1; : : :; wn) the set of multi-conjugates of the multi-relator hw1; : : :; wni. If we are given a ( nite or in nite) collection MR of multi-relators, we can consider the group computation structure obtained by taking as set R of relators the set [ R= MC(mr); mr 2MR
We will call the GCS thus obtained the group computation structure with multi-relators GCS-MR = (V; MR).
When presenting a group computation structure with multi-relators, we will sometimes use the notation: w1; w2; : : :; wn
for listing the multi-relators. With such a notation, the presentation of our example G-grammar of Fig. 1 becomes the presentation shown in Fig. 17. The presentation of the grammar using multi-relators is more symmetrical than that of Fig. 1. Remark that expression meta-variables are not used anymore.
7.6.1 Multi-relators and diagrams
We will now state a theorem which is an extension of the the fundamental theorem of combinatorial group theory for the case of GCS's with multi-relators. We rst need the notion of multi-cell associated with a multi-relator. 28
We rst remark that we can always assume that a multi-relator
hw1 ; w2; : : :; wni is such that each wi is cyclically reduced, because taking the
cyclically reduced conjugate of wi does not change the notion of multi-conjugate. We will assume this is always the case in the sequel. If hw1 ; w2; : : :; wni is a multi-relator, and if ?i is the cell associated with wi in the manner of 6.4, then we will call the multiset of cells = f?1; : : :; ?ng the multi-cell associated with this multi-relator. Consider a nite multiset of multi-relators and take the multi-set obtained by U forming the multiset union = k k of the multi-cells associated with these multi-relators. A diagram whose cells are exactly (that is, taking account of the cell counts) those of the multiset will be said to be a diagram relative to the GCS with multi-relators under consideration. Theorem 6 . Let GCS-MR = (V; MR) be a group computation structure with multi-relators. If w is a cyclically reduced word in F(V ), then w is in the result monoid of GCS-MR i there exists a reduced diagram relative to GCS-MR having boundary w.
The proof of this theorem is a straightforward extension of the proof of Theorem 1 and is not provided.
7.6.2 Linguistic examples
These results can now be applied to a diagrammatic proof of the fact that, in our example G-grammar, one has (see 4.1): ev(m; x ; sm(w; y ; s(x ; y ))) ! every man saw some woman : In Fig. 18 we show multi-cells corresponding to the G-grammar entries relevant for this example (see Fig. 17), and in Fig. 19 a diagrammatic proof of the example; the multi-cells are lettered (1) to (5) for easy comparison between the gures.
7.6.3 Multi-relators and QCF systems
One can extend the notion of orientation of Section 7 to multi-relators in the following way. We will say that a multi-relator ⟨w_1, w_2, …, w_n⟩ is oriented at index (i, j) if one of the words w_i has been chosen and if, writing w_i = a_{i1} … a_{i m_i}, with a_{ij} ∈ V ∪ V⁻¹, an index j with 1 ≤ j ≤ m_i has also been chosen. The QCF rule scheme associated with the oriented multi-relator is then the rewriting scheme:

    a_{ij} ↦ a_{i(j-1)}⁻¹ … a_{i1}⁻¹ a_{i m_i}⁻¹ … a_{i(j+1)}⁻¹ ∏_{k≠i} α_k w_k⁻¹ α_k⁻¹ ,
where the α_k are expression meta-variables (which can therefore take arbitrary values in F(V)).
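The passage from an oriented multi-relator to its QCF rule scheme is mechanical. As an illustrative sketch only (the expression meta-variables α_k are represented by placeholder atoms, a simplification introduced just for the example), the rule can be computed as follows:

    def qcf_rule(multi_relator, i, j):
        # multi_relator: a list of words, each a list of signed atoms;
        # (i, j): 1-based indices of the distinguished word and atom.
        w = multi_relator[i - 1]
        lhs = w[j - 1]
        # inverses of the atoms to the left of a_ij, rightmost first, then
        # inverses of the atoms to its right, rightmost first
        rhs = [(a, -s) for (a, s) in reversed(w[:j - 1])]
        rhs += [(a, -s) for (a, s) in reversed(w[j:])]
        # for each remaining word w_k, a conjugated copy alpha_k w_k^-1 alpha_k^-1
        for k, wk in enumerate(multi_relator):
            if k != i - 1:
                alpha = "alpha_%d" % (k + 1)      # placeholder meta-variable
                rhs += [(alpha, 1)]
                rhs += [(a, -s) for (a, s) in reversed(wk)]
                rhs += [(alpha, -1)]
        return lhs, rhs

    # The single-word relator for "saw": s(A,B) B^-1 saw^-1 A^-1, oriented at s(A,B)
    saw = [[("s(A,B)", 1), ("B", -1), ("saw", -1), ("A", -1)]]
    print(qcf_rule(saw, 1, 1))
    # -> (("s(A,B)", 1), [("A", 1), ("saw", 1), ("B", 1)]), i.e. s(A,B) -> A saw B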
Figure 18: Multi-cells for some multi-relators in the grammar. Dotted lines indicate that two cells are part of the same multi-cell: they will appear together in a diagram or not at all.

One can extend the notion of anteriority to oriented multi-relators. We simply define the atom a_{ij} to be anterior to each of the atoms a_{kl}⁻¹ appearing in the right-hand side of the rewriting rule, that is to say, to all inverses of atoms appearing in any of the w_k, apart of course from a_{ij}⁻¹ itself, and not considering the atoms appearing in any of the α_k, α_k⁻¹ (it follows that the anteriority relation is defined between atoms, or inverses of atoms, that appear in ⟨w_1, …, w_n⟩). The notion of acyclic and noetherian QCF system is defined just as before on the basis of this anteriority relation. The following theorem is a straightforward extension of Theorem 2.
Theorem 7. If ↦ is the derivation relation associated with an oriented multi-relator GCS, then, for any s, t ∈ (V ∪ V⁻¹)*, one has: s ↦ t ⇒ s → t.
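As a sanity check on the simplest, single-word case, consider the relator s(A,B) B⁻¹ saw⁻¹ A⁻¹ of Fig. 17, oriented at s(A,B), whose associated rule is s(A,B) ↦ A saw B. Multiplying the left-hand side by the inverse of the right-hand side gives

    s(A,B) · (A saw B)⁻¹ = s(A,B) B⁻¹ saw⁻¹ A⁻¹,

which is the relator itself. In general, a_{ij} times the inverse of the right-hand side of its rule freely reduces to a product of one conjugate of each word of the multi-relator, that is, to a multi-conjugate; this is, in essence, the algebraic content of the implication s ↦ t ⇒ s → t.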
Figure 19: A diagram using multi-cells establishing a relationship between the sentence every man saw some woman and its logical form ev(m,x,sm(w,y,s(x,y))).
7.6.4 Rewriting theorem (multi-relator case)
We now state a theorem which extends Theorem 4 to the case of multi-relator GCS's. The proof is only broadly outlined.
Theorem 8. If ↦ is the derivation relation associated with an oriented multi-relator GCS where (i) anteriority is acyclic, and (ii) the set of potential mothers does not contain simultaneously an element and its inverse, then, for any s ∈ (V ∪ V⁻¹)* and for any t ∈ ((V ∪ V⁻¹) \ M⁻¹)*, one has: s → t ⇒ s ↦ t.
Proof. Rough sketch. The proof is similar to that of Theorem 4. The difference is that we must perform multi-cell removal rather than simple cell removal, and that this may involve more complex "topological surgery". An example may serve to illustrate the general method. Let us consider a GCS defined by the three multi-relators ⟨f d⁻¹ c m⁻¹ g⟩, ⟨d e g⁻¹ h⁻¹ b⟩ and ⟨e f, a c⁻¹ b⁻¹⟩. Suppose that these multi-relators are oriented by distinguishing f in the first, d in the second, and a in the third. Then the corresponding rewriting system has the rules f ↦ g⁻¹ m c⁻¹ d, d ↦ b⁻¹ h g e⁻¹ and a ↦ b c α e f α⁻¹, where α is an expression meta-variable. Anteriority is clearly acyclic. Let us now consider the diagram at the top-left of Fig. 20. This diagram contains three multi-cells corresponding to the three multi-relators, one of which, Γ, is itself formed of two
cells, and corresponds to the third multi-relator. The mother in each multi-cell is underlined. By reading the boundary of the diagram, we see that a → hm. We want to show how multi-cell removal can be used to obtain a ↦ hm. First, by a reasoning similar to the proof of Lemma 3, we know that some mother must appear on the boundary. In the example, this is a. We want to remove the corresponding multi-cell Γ from the diagram without destroying the topological property of being a diagram, and, in particular, without destroying simple connectivity. In order to do that, we have to find a "subdiagram" (that is, a topological construct which is itself a diagram) of the original diagram consisting only of the cells appearing in the multi-cell under consideration (simple reasoning shows this to always be possible). In our example this subdiagram is constituted by the two cells labelled Γ linked by the edge labelled d. The surgery now consists in "forming a tube" wherever there is a "thin" edge, that is, an edge which is not on the surface of a subdiagram cell, such as the d edge in the example, and in deleting from the diagram both the subdiagram cells and the "tubes" thus formed. What remains is a well-formed diagram (which may well have thin edges itself) in which the removed tubes leave behind a couple of parallel edges carrying the same label (second diagram in our figure, consisting of two cells). This multi-cell removal corresponds to a rewriting of the multi-cell mother into a right-hand side, where the expression meta-variables have been given the proper values (in our example α takes the value c⁻¹d) allowing for the construction of the subdiagram. The operation is repeated until all multi-cells have been deleted, in the same way as in the previous proof. In the example given in illustration, after the first multi-cell, having its mother a on the diagram boundary, has been removed, one obtains the second diagram; at this point a has been rewritten (using the rewriting rule for a) into bc c⁻¹d ef d⁻¹c = bdefd⁻¹c. The new diagram contains two multi-cells, each consisting of a single cell. Again, there exists a mother on the diagram boundary, this time f. The corresponding cell is removed, and at this point a has been rewritten (using the rewriting rule for f) into bdeg⁻¹mc⁻¹dd⁻¹c = bdeg⁻¹m. The last cell, with mother d, is finally removed, leaving the last diagram (with only edges, but no cell). At this point a has finally been rewritten (using the rewriting rule for d) into bb⁻¹hge⁻¹eg⁻¹m = hm. □
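The three rewriting steps of this example can be checked mechanically by free reduction. The following sketch (illustrative only; it reuses the list-of-signed-atoms encoding introduced earlier, with the meta-variable α already instantiated to c⁻¹d) reproduces the derivation a ↦ bdefd⁻¹c ↦ bdeg⁻¹m ↦ hm:

    def reduce_word(w):
        # free reduction: cancel adjacent pairs x x^-1
        out = []
        for x in w:
            if out and out[-1][0] == x[0] and out[-1][1] == -x[1]:
                out.pop()
            else:
                out.append(x)
        return out

    def atoms(s):
        # "b c^-1 d" -> [("b", 1), ("c", -1), ("d", 1)]
        return [(t[:-3], -1) if t.endswith("^-1") else (t, 1) for t in s.split()]

    def rewrite_once(word, atom, rhs):
        # replace the first positive occurrence of `atom` by `rhs`, then reduce
        for idx, x in enumerate(word):
            if x == (atom, 1):
                return reduce_word(word[:idx] + rhs + word[idx + 1:])
        return word

    w = atoms("a")
    w = rewrite_once(w, "a", atoms("b c c^-1 d e f d^-1 c"))  # rule for a, alpha = c^-1 d
    w = rewrite_once(w, "f", atoms("g^-1 m c^-1 d"))           # rule for f
    w = rewrite_once(w, "d", atoms("b^-1 h g e^-1"))           # rule for d
    print(w)   # [('h', 1), ('m', 1)], i.e. the word hm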
Application to G-grammars. When the G-grammar presented as a multi-relator GCS in Fig. 17 is oriented by privileging as left-hand side the "largest" logical-form atom, its associated QCF system becomes the system of Fig. 2. It can be checked that this "generation" system is acyclic and even noetherian. Theorems 7 and 8 then say that generation using the rewriting system is equivalent to generation using the G-grammar. A similar result can be shown for the "parsing-oriented" rewriting system of Fig. 5.
Figure 20: Multi-cell removal and rewriting.
8 Computability

We will not attempt here a systematic treatment of the computability properties of G-grammars, but will limit ourselves to proving certain strong decidability properties of our example G-grammar. Although a general synthesis is not proposed here, the methods show clearly the power of group morphisms to study computability in group computation structures. If we consider a grammar to be the specification of a recursively enumerable list L of pairs (sem, phon), then, using the terminology of [6], we will say that generation (resp. parsing) is finitely enumerable if the subset of L consisting of the pairs (sem, phon) with sem known (resp. with phon known) is finite and computable in finite time. If parsing and generation are both finitely enumerable, the grammar is said to be inherently reversible. We will sketch a proof of the fact that the G-grammar of Fig. 17 is inherently reversible.9

9 In [6], several types of computability of generation (resp. parsing) are defined, and it is shown there that, unless certain specific conditions are met, whatever the type of computability considered, the computability of parsing and the computability of generation bear in general no relation to each other.
8.1 Bounding of diagram complexity as a function of the input
Let us first consider a group morphism f from F(V) to the additive group of reals (R, +). Thus f is a real-valued function on F(V) which is such that, for a, b ∈ F(V), one has f(ab) = f(a) + f(b). Remark that because f takes its values in a commutative group, f is invariant under permutation (that is, f(ab) = f(ba)). We can therefore define the value of f on a multi-relator mr = ⟨r_1, …, r_m⟩ to be its value on any of its multi-conjugates, that is, to be f(r_1) + … + f(r_m). If a result s is obtained by taking the product of multi-conjugates of the multi-relators mr_1, …, mr_n, then f(s) can be computed simply by summing the values of f on all the multi-relator components r_{ij}, where mr_i = ⟨r_{i1}, …, r_{i m_i}⟩. Let us now suppose that there exists a strictly positive real ω such that, for any multi-relator mr in the group computation structure, one has:
    f(mr) ≥ ω.

If such is the case, we will then say that f is a relator-lower-bounded, or rlb, morphism. Similarly, we will say that f is a relator-upper-bounded, or rub, morphism iff there exists a strictly positive real ω such that, for any multi-relator mr in the group computation structure, one has f(mr) ≤ ω. The following property is immediate.

Lemma 9. If f is an rlb morphism from F(V) to R, and if s is a result in the GCS, then (1) f(s) ≥ 0, and (2) if a computation of s involves k multi-relators, then k ≤ f(s)/ω. Similarly, if f is an rub morphism, then, if a computation of s involves k multi-relators, then f(s) ≤ kω.

If f is rlb, and if s is an expression in F(V) for which the value of f is known, then we see that, in order to test whether s is a result, we only need to test diagrams with an a priori bounded number of multi-cells. Now consider the following two real-valued functions on F(V).

For e ∈ F(V), e reduced, f(e) is defined as the difference between the number of phonological elements ("words") which appear in e with a negative polarity and the number of such elements which appear with a positive polarity. For instance, the value of f on john⁻¹ s(j,m) in the park is 1 - 3 = -2.

Suppose that the size of a term T is defined recursively by the following equations: (1) if T is an argument identifier (such as x, y, ...) then size(T) = 0; (2) else if T is of arity 0, then size(T) = 1; (3) else if T is of arity n, of the form T = F(T_1, …, T_n), then size(T) = 1 + size(T_1) + … + size(T_n). If e ∈ F(V), e reduced, define g(e) as the sum of the
sizes of the terms of positive polarity minus the sum of the sizes of the terms of negative polarity. For instance, the value of g on the expression s(j,l) l⁻¹ saw⁻¹ j⁻¹ is 3 - 1 - 1 = 1. It is immediate that f and g are morphisms. It is also a simple observation that, for any multi-relator mr of Fig. 17, the value of both f and g on mr is uniformly lower-bounded by ω = 1 (in fact both f and g are uniformly equal to 1 in our example). For instance, if e is any expression of the form s(A,B) B⁻¹ saw⁻¹ A⁻¹, then f(e) = 1, and g(e) = 1 + size(A) + size(B) - size(B) - size(A) = 1. It is also clear that both f and g are uniformly upper-bounded. Let us now consider parsing. In this case, we are looking at all results of the form sem phon⁻¹, with phon known, and therefore with f known (it is just the length of the phonological string). We therefore have only to consider diagrams with at most length(phon) multi-cells and having a boundary of the form sem phon⁻¹. If we instead consider generation, we are looking at all results of the form sem phon⁻¹, with sem known, and therefore with g known. We therefore have only to consider diagrams with at most size(sem) multi-cells and having a boundary of the form sem phon⁻¹.
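For concreteness, the morphisms f and g and the resulting bounds can be computed directly on expressions. The sketch below is illustrative only; the split of the vocabulary into phonological words and logical-form terms is hard-coded through the set PHON, and terms are encoded as nested tuples, both encoding choices made just for the example:

    PHON = {"john", "louise", "paris", "man", "woman", "ran", "saw",
            "in", "the", "every", "some", "that"}

    def size(term):
        # terms are nested tuples, e.g. ("s", "x", "y"); bare strings are
        # argument identifiers (size 0); arity-0 constants are 1-tuples like ("m",)
        if isinstance(term, str):
            return 0
        return 1 + sum(size(t) for t in term[1:])

    def f(word):
        # (number of negative phonological atoms) - (number of positive ones)
        return sum(-s for (a, s) in word if isinstance(a, str) and a in PHON)

    def g(word):
        # signed sum of the sizes of the logical-form terms occurring in the word
        return sum(s * size(a) for (a, s) in word
                   if not (isinstance(a, str) and a in PHON))

    # The result sem phon^-1 for the example of 7.6.2:
    sem = ("ev", ("m",), "x", ("sm", ("w",), "y", ("s", "x", "y")))
    phon_inv = [(wd, -1) for wd in ["woman", "some", "saw", "man", "every"]]
    result = [(sem, 1)] + phon_inv
    print(f(result), g(result))   # 5 5: at most 5 multi-cells in parsing or generation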
8.2 Restriction of the set of multi-cells to consider
We are now very close to results of finite enumerability for parsing and generation. We know a bound for the total number of cells in any relevant diagram. Still, it could be conceivable that the repertory of relevant cells is not finite. We will now show that this cannot be the case. Let us introduce the notion of a subharmonic function on a GCS.10 Let h be a function from V ∪ V⁻¹ to R. We will say that h is subharmonic relative to the GCS if, for any diagram relative to the GCS, h can only reach its maximum on the boundary of the diagram (therefore not on an "internal" edge of the diagram). Let us define h as the function which takes a phonological element, whatever its polarity, to 0, and which takes a logical form T, as well as its inverse T⁻¹, to size(T). From the discussion of acyclicity of our G-grammar, and from Lemma 3, we see that h is subharmonic for the G-grammar. This means that any multi-cell appearing in a diagram which is a proof of sem phon⁻¹ has all its semantics (of whatever polarity) bounded by the size of sem. It is easy to check that there are only finitely many multi-cells according to the G-grammar for which the largest semantic element has a bounded size.11

10 The terminology is borrowed from analysis, where subharmonic functions are functions which reach their maximum only on the boundary of disks.

11 We are finessing the case of the identifier meta-variable X, whose domain of instantiation has not been described precisely. We will assume here that its domain of instantiation is constituted by a finite, fixed set of identifiers x, y, .... This can be assumed both for generation and for parsing on the basis of a simple argument. A more satisfying treatment is left to future research.
So, if we are in generation mode, and know sem, we have only a fixed, finite repertory of multi-cells to consider for diagrams pretending to the status of proofs of a result of the form sem phon⁻¹. Because of the fact noted above that these cells can appear conjointly only a finite number of times, we have now proven the finite enumerability of generation. For parsing, we only know phon. But it is immediate from the fact that f is rlb and g rub that sem can be bounded once phon is known. We conclude that only a finite repertory of cells has to be considered for diagrams pretending to the status of proofs of results of the form sem phon⁻¹. By the same reasoning as previously, this proves the finite enumerability of parsing.
Acknowledgements Thanks to Sylvain Pogodalla, Christian Retore and to the anonymous reviewers for their remarks and comments.
References

[1] Abrusci, V.: 1991, 'Phase semantics and sequent calculus for pure noncommutative classical linear logic'. Journal of Symbolic Logic 56(4).
[2] Chenadec, P. L.: 1995, 'A Survey of Symmetrized and Complete Group Presentations'. In: H. Comon and J. Jouannaud (eds.): Term Rewriting, Vol. 909 of LNCS. Springer-Verlag, pp. 135-153.
[3] Colmerauer, A.: 1970, 'Les systèmes-Q ou un formalisme pour analyser et synthétiser des phrases sur ordinateur'. Publication interne 43, département d'informatique, Université de Montréal, Montréal.
[4] Dalrymple, M., J. Lamping, F. Pereira, and V. Saraswat: 1995, 'Linear Logic for Meaning Assembly'. In: Proc. CLNLP. Edinburgh.
[5] Dymetman, M.: 1992, 'Transformations de grammaires logiques et réversibilité en Traduction Automatique'. Thèse d'État. Université Joseph Fourier (Grenoble 1), Grenoble, France.
[6] Dymetman, M.: 1994, 'Inherently Reversible Grammars'. In: T. Strzalkowski (ed.): Reversible Grammar in Natural Language Processing. Dordrecht, Holland: Kluwer Academic Publishers.
[7] Dymetman, M.: 1998, 'Group Theory and Linguistic Processing'. In: Proceedings of the Coling-ACL Conference. Montreal.
[8] Girard, J.: 1987, 'Linear Logic'. Theoretical Computer Science 50(1).
[9] Johnson, D.: 1997, Presentations of Groups. Cambridge University Press.
[10] Lambek, J.: 1958, 'The mathematics of sentence structure'. American Mathematical Monthly 65, 154-168.
[11] Lyndon, R. and P. Schupp: 1977, Combinatorial Group Theory. Springer-Verlag.
[12] Pentus, M.: 1993, 'Lambek grammars are context free'. In: Proceedings of the Eighth Annual IEEE Symposium on Logic in Computer Science, LICS '93.
[13] Pereira, F. C. N. and D. H. D. Warren: 1980, 'Definite Clause Grammars for Language Analysis'. Artificial Intelligence 13, 231-278.
[14] Retoré, C.: 1993, 'Réseaux et séquents ordonnés'. Ph.D. thesis, Univ. Paris 7.
[15] Rotman, J. J.: 1994, An Introduction to the Theory of Groups. Springer-Verlag, fourth edition.
[16] van Benthem, J.: 1986, Essays in Logical Semantics. Dordrecht, Holland: D. Reidel.
[17] van Kampen, E.: 1933, 'On some lemmas in the theory of groups'. American Journal of Mathematics 55, 268-273.