INFORMATION SCIENCES 6, 49-83 (1973)
On the Generative Power of Transformational Grammars*

P. STANLEY PETERS, JR.,† University of Texas, Austin, Texas

AND

R. W. RITCHIE,‡ University of Washington, Seattle, Washington

Communicated by Frank B. Cannonito
ABSTRACT

Mathematical modeling of phrase structure grammars has yielded many results of benefit to linguists in their investigation of these grammars, such as Chomsky's characterization, in terms of self-embedding, of those context-free languages which are not regular. The recent shift of focus in linguistic theory to transformational grammars has not been accompanied by a similar application of mathematical techniques to transformations. Our present purpose is to foster such studies by providing general definitions which model grammatical transformations as mappings on trees (equivalently, labeled bracketings) and by investigating questions of current linguistic interest, such as the recursiveness of languages generated by transformational grammars. The first result of our research is that, despite the linguistically motivated, complex restrictions placed on transformational grammars, every recursively enumerable set of strings is a transformational language (Theorem 5.1). We demonstrate that this power of transformational grammars to generate non-recursive languages results from their ability to cycle their rules, applying transformations an unbounded number of times (Corollary 6.6). Analysis of decision procedures for grammars with bounded cycling reveals a connection between the amount of cycling permitted by a grammar and the complexity of the recursive set it generates: if cycling is bounded by any elementary recursive function (primitive recursive function, function in 𝓔^n for n ≥ 3), then the language generated has characteristic function in the same class (Corollary 6.7). One application of these results provides empirical support for the notion that natural languages are recursively, in fact elementarily, decidable. Our results also isolate one feature which must be further restricted in a linguistically motivated way if transformational theory is to achieve its goal of delimiting precisely the natural languages.
INTRODUCTION

In Aspects of the Theory of Syntax, Chomsky presents a theory of transformational grammar. The purpose of this paper is to formalize this notion of transformational grammar and to study the expressive power of these grammars.

*This work was supported in part by the 1965 and 1968 Advanced Research Seminars in Mathematical Linguistics, sponsored by the Center for Advanced Studies in the Behavioral Sciences, Stanford, California.
†Correspondence to this author at: Dept. of Linguistics, University of Texas, Austin, Texas 78712.
‡Supported in part by National Science Foundation Grant NSF GP-1851.

© American Elsevier Publishing Company, Inc., 1973
In particular, we relate the languages generated by these grammars to classes of languages studied in recursive function theory. The paper is arranged as follows: Section 1 is an informal discussion of the nature of grammatical transformations and the manner in which they operate on phrase-markers. This material will be familiar to linguists. Section 2 merely makes precise the concepts introduced in Sec. 1, with one difference: in informal discussion, phrase-markers are represented as trees to aid the reader's intuitions, but in Sec. 2 they are represented as labeled bracketings for technical convenience in later sections. Section 3 merely recaps the definitions of phrase structure grammars, with emphasis on the manner in which they generate sets of phrase-markers. Section 4 defines a transformational grammar to contain two components: a base component (consisting of a phrase structure grammar) and a transformational component (consisting of a finite ordered set of grammatical transformations). Furthermore, transformations are defined to apply cyclically in derivations, converting step by step a phrase-marker generated by the base into a derived phrase-marker. If the latter contains no occurrences of a special sentence boundary symbol, it is a surface structure, and the phrase-marker initiating the derivation is a deep structure underlying it. A transformational grammar then generates as its language the set of all strings which have a surface phrase-marker. With these definitions as background, we prove in Sec. 5 that every recursively enumerable set of strings is the language generated by some transformational grammar. In Sec. 6 we examine the sets of languages generated by restricted types of transformational grammar and prove that the complexity of the language generated by a transformational grammar is no greater than the complexity of computation of the length of an underlying deep structure from a sentence.
Section 7 is devoted to discussing some implications of these results for natural language, in light of empirical studies linguists have made of a variety of languages. Empirical support is given for the hypothesis that natural languages are recursive. The reader whose interest is primarily in the results of Secs. 5, 6, or 7 is encouraged to proceed directly to these sections after reading Sec. 1. Sections 5 and 6, which require properties of transformational grammars detailed in Secs. 2-4, begin with summaries of the relevant properties. The properties summarized at the beginning of Sec. 5 follow immediately from the definitions, while those of Sec. 6 are deduced at the end of that section.

1. TRANSFORMATIONS: INFORMAL DEVELOPMENT

As is usual in the formal study of grammars, we consider a language to be a set of finite strings over a vocabulary of terminal symbols, i.e. given a finite
nonempty set V_T (the terminal vocabulary) we may form the set V_T* of all finite sequences of members of V_T. Then a language is any subset of V_T*. Phrase structure and transformational grammars also refer to another vocabulary of symbols, the nonterminal vocabulary V_N of phrase types or grammatical categories. These grammatical categories appear in phrase-markers of strings in V_T*, which represent their segmentation into phrases and the classification of these phrases into types. A phrase-marker may be represented as a tree in which the leaves are labeled with members of V_T and the other nodes with members of V_N. The sequence of leaves dominated by a node labeled with a nonterminal symbol A is a phrase of type A. Alternatively, the same information can be represented by a well-formed labeled bracketing (cf. Defs. 2.1 and 2.11). As an example of a phrase-marker, assume that we are given the nonterminal vocabulary V_N = {S, NP, VP, N, A} and the terminal vocabulary V_T = {they, are, flying, planes} and consider the tree (1).
(1) [tree diagram: [S [NP [N they ]N ]NP [VP are [NP [A flying ]A [N planes ]N ]NP ]VP ]S]

Phrase-marker (1) represents the information that, for example, flying planes is a member of the grammatical category NP, as is they. On the other hand, are flying is not a phrase of any type according to (1).
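The tree-versus-bracketing duality is easy to make concrete. Below is a minimal sketch (Python, with an assumed nested-tuple representation of (1); the helper names are ours, not the paper's) that recovers the phrases of a given category from a phrase-marker:

```python
# Phrase-marker (1) as a nested structure: (label, children); leaves are terminal strings.
tree = ("S", [("NP", [("N", ["they"])]),
              ("VP", ["are", ("NP", [("A", ["flying"]),
                                     ("N", ["planes"])])])])

def leaves(node):
    """Terminal string dominated by a node, left to right."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node[1]:
        out.extend(leaves(child))
    return out

def phrases(node, category):
    """Yield the terminal string of every node labeled `category`."""
    if isinstance(node, str):
        return
    label, children = node
    if label == category:
        yield " ".join(leaves(node))
    for child in children:
        yield from phrases(child, category)

# "they" and "flying planes" are phrases of type NP; "are flying" is no phrase at all.
assert list(phrases(tree, "NP")) == ["they", "flying planes"]
assert "are flying" not in {p for c in "S NP VP N A".split()
                            for p in phrases(tree, c)}
```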
Transformational rules are mappings of phrase-markers into phrase-markers (cf. Def. 2.14). Each such rule consists of two parts: a structural condition and a set of elementary transformations (cf. Defs. 2.8, 2.10, and 2.12). The structural condition of a transformation serves to determine whether or not the rule will apply to a given phrase-marker and, if so, how to factor the phrase-marker into sections to be rearranged, duplicated or deleted. These effects are achieved by application of elementary transformations to factors of the phrase-marker. In order to be a transformation, a paired structural condition and set of elementary transformations must meet conditions of compatibility, chief among them the condition of recoverability of deletions (cf. Def. 2.13). A factorization of a phrase-marker is induced by a factorization of its terminal string in the
following way.
Consider the factorization of the terminal string

they are flying planes    (2)

into the four substrings X1 = they, X2 = are, X3 = flying, and X4 = planes. This induces the division of (1) into factors as indicated in (3) (cf. Def. 2.6). The factors are given in (4).
(3) [tree diagram: (1) divided into four tree factors at they | are | flying | planes]

(4) [the four tree factors: [NP [N they ]N ]NP;  are;  [A flying ]A;  [N planes ]N]
Notice that each tree factor is chosen so as to include the highest node dominating only terminal symbols in the corresponding string factor, and that nodes which dominate two or more string factors do not appear in any tree factor. (Cf. Def. 2.5 for corresponding concepts in terms of labeled bracketings.) The factorization X1, X2X3, X4 of (2) into three terms induces the factorization of (1) indicated in (5).
(5) [tree diagram: (1) divided into three tree factors at they | are flying | planes]
The first and last factors are the same as before but the second factor is (6) which is not a subtree of (1) but a forest of adjacent subtrees.
(6) [forest of adjacent subtrees: are and [A flying ]A]
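The induced division can be sketched with spans: represent each node of (1) by its label and the interval of terminals it dominates (a representation we adopt purely for illustration; the paper works with labeled bracketings). A node lands in a tree factor only if its span lies inside a single string factor:

```python
# Nodes of (1) as (label, start, end) over "they are flying planes", end exclusive.
nodes = [("S", 0, 4), ("NP", 0, 1), ("N", 0, 1), ("VP", 1, 4),
         ("NP", 2, 4), ("A", 2, 3), ("N", 3, 4)]

def tree_factors(nodes, cuts):
    """Group nodes by the string factor containing them; a node that
    spans two or more string factors appears in no tree factor."""
    return [[(lab, i, j) for (lab, i, j) in nodes if a <= i and j <= b]
            for a, b in zip(cuts, cuts[1:])]

# (3): they | are | flying | planes
assert tree_factors(nodes, [0, 1, 2, 3, 4])[0] == [("NP", 0, 1), ("N", 0, 1)]
# (5): they | are flying | planes -- the middle factor keeps only [A flying]:
# "are" has no covering node of its own, and the NP over (2,4) straddles the cut.
assert tree_factors(nodes, [0, 1, 3, 4])[1] == [("A", 2, 3)]
```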
Such forests will arise not only as a single factor of a factorization but also as what we shall call a sequence of factors. The sequence of ith-jth factors of a tree factorization is defined to be the ith factor of the tree factorization induced by concatenating the ith through jth string factors (cf. Def. 2.7 of "contents," the corresponding notion on labeled bracketings). For example, (6) is the sequence of 2nd-3rd factors of (3) and (7) is the sequence of 2nd-4th factors of (3).
(7) [forest of adjacent subtrees: are, [A flying ]A, and [N planes ]N]
A structural condition will specify the properties a factorization of a tree must have if a transformation is to operate on it. These properties are expressed by employing three sorts of predicate: one sort specifies that a particular sequence of factors has a phrase of a certain type as its terminal string, another sort specifies that two sequences of factors are identical, and a third sort specifies that a sequence of factors possesses a certain terminal string. Each predicate is true only of factorizations with a specified number of terms and deals with particular sequences of these terms.

(a) For every nonterminal symbol A, the predicate A^n_{i-j} is true of a factorization if and only if 1 ≤ i ≤ j ≤ n, the factorization has n terms, and a node
labeled A dominates the terminal string of the sequence of ith-jth factors.

(b) The predicate h-i ≡^n j-k is true of a factorization if and only if 1 ≤ h ≤ i ≤ n, i ≤ j ≤ k ≤ n, the factorization has n terms, and the sequence of hth-ith factors is identical to the sequence of jth-kth factors.

(c) For every string x of terminal symbols, the predicate i-j =^n x is true of a factorization if and only if 1 ≤ i ≤ j ≤ n, the factorization has n terms, and x is the terminal string of the sequence of ith-jth factors. …

… (n ≥ 2) such that ψ = ψ₁ · ψ₂ · … · ψₙ. If l = 1, the result is true trivially; so, considering l > 1, assume the result for all l′ < l, and let ψ be a reduced well-formed labeled bracketing without exterior brackets whose debracketization has length l. It must be of the form ψ₁ · ψ₂ · … · ψₙ, n ≥ 2, where each ψᵢ is a reduced well-formed labeled bracketing whose debracketization is a string of length lᵢ < l. By induction, each ψᵢ has length at most 2q(2lᵢ − 1) + lᵢ (it has the form [A₁ … [Aₖ ψᵢ′ ]Aₖ … ]A₁, where k ≤ q and ψᵢ′ is without exterior brackets). Hence, the length of ψ is at most

Σ_{i=1}^{n} [2q(2lᵢ − 1) + lᵢ] = 2q(2l − n) + l ≤ 2q(2l − 2) + l,

as desired.¹ ∎
The interior of a terminal labeled bracketing will be, roughly speaking, the longest well-formed substring of the labeled bracketing which contains all the terminals in this labeled bracketing, if such a substring exists. We also speak of the residue as the (left and right) exterior. Precisely, we have:

¹We note that this bound can be achieved whenever the length of the debracketization is a power of 2. For example, if the debracketization is ab, then

[A₁ … [A_q [A₁ … [A_q a ]A_q … ]A₁ [A₁ … [A_q b ]A_q … ]A₁ ]A_q … ]A₁

is a reduced well-formed labeled bracketing of the desired length.
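The bound, and the footnote's claim that it is attained when the terminal string's length is a power of 2, can be checked numerically. A sketch (single-character labels and a token-per-bracket representation are our assumptions):

```python
def wrap(toks, labels):
    """Nest toks inside one bracket pair per label: [A1 ... [Ak toks ]Ak ... ]A1."""
    for lab in reversed(labels):
        toks = ["[" + lab] + toks + ["]" + lab]
    return toks

def densest(terminals, labels):
    """Balanced binary bracketing in which every node carries the maximal
    chain of q distinct labels (the chain is reduced: no label repeats)."""
    if len(terminals) == 1:
        return wrap(list(terminals), labels)
    mid = len(terminals) // 2
    return wrap(densest(terminals[:mid], labels) +
                densest(terminals[mid:], labels), labels)

q, l = 3, 4
psi = densest(list("abcd"), ["A", "B", "C"])
assert len(psi) == 2 * q * (2 * l - 1) + l   # the Lemma 2.4 bound, attained: l = 2^2
```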
Definition 2.5. The interior of a terminal labeled bracketing φ [written I(φ)] is the longest well-formed labeled bracketing ψ such that

(i) d(φ) = d(ψ), and
(ii) there are labeled bracketings σ, τ such that φ = σψτ,

if such a ψ exists. We shall call σ the left exterior of φ [written E_l(φ)] and τ the right exterior of φ [E_r(φ)]. If there is no such ψ, we leave I(φ), E_l(φ) and E_r(φ) undefined. For example, the interior of (10) is (10) itself. The interior of (12) is "are," but (13) has no interior. The left and right exteriors of (12) are "[VP" and "]N" respectively, while both exteriors of (10) are null.

Definition 2.6. A standard factorization into n terms, for n ≥ 1, of a terminal labeled bracketing φ is defined, if φ is a substring of a well-formed labeled bracketing, to be an (ordered) n-tuple (ψ₁, …, ψₙ) of labeled bracketings such that
(i) φ = ψ₁ … ψₙ, and
(ii) for each i = 1, …, n, the leftmost symbol of ψᵢ is not a right bracket, nor is the rightmost symbol a left bracket.

The second condition assures us that the factors have been chosen to coincide with the phrase breaks, and that each non-null factor contains terminals. The conditions are necessary for the correct assignment of derived constituent structure by the transformations defined below. As an example, let us consider the standard factorization of labeled bracketing (8) given as follows: (ψ₁, ψ₂, ψ₃, ψ₄), where
(14) ψ₁ = [S[NP[N they]N]NP, ψ₂ = [VP are, ψ₃ = [NP[A flying]A, and ψ₄ = [N planes]N]NP]VP]S.
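Condition (ii) of Def. 2.6 is a purely local check on factor boundaries. A sketch over tokenized bracketings ("[X" and "]X" are single symbols of L and R; the tokenization is our assumption):

```python
def is_standard(factors):
    """Def. 2.6(ii): no factor may begin with a right bracket
    or end with a left bracket."""
    return all(not f or (not f[0].startswith("]") and not f[-1].startswith("["))
               for f in factors)

psi14 = [["[S", "[NP", "[N", "they", "]N", "]NP"],          # psi1
         ["[VP", "are"],                                    # psi2
         ["[NP", "[A", "flying", "]A"],                     # psi3
         ["[N", "planes", "]N", "]NP", "]VP", "]S"]]        # psi4
assert is_standard(psi14)
# Cutting between ]N and ]NP instead is not standard:
assert not is_standard([["[S", "[NP", "[N", "they", "]N"],
                        ["]NP", "[VP", "are"]])
```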
Other standard factorizations are

(15) (ψ₁, ψ₂ψ₃, ψ₄), and

(16) (ψ₁, ψ₂, ψ₃ψ₄),

where the ψᵢ are as in (14).
Note that (14) is the same factorization as (3), (15) is the same as (5), and the second factor, ψ₂ψ₃, in (15) has no interior, even though the entire string ψ₁ψ₂ψ₃ψ₄ does, as does each ψᵢ individually.
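Definition 2.5 and the remark about ψ₂ψ₃ can be checked mechanically. A brute-force sketch (quadratic search over substrings; token representation as above, and we assume φ contains at least one terminal):

```python
def well_formed(toks):
    """Brackets balance and match by label; terminals are unrestricted."""
    stack = []
    for t in toks:
        if t.startswith("["):
            stack.append(t[1:])
        elif t.startswith("]"):
            if not stack or stack.pop() != t[1:]:
                return False
    return not stack

def interior(toks):
    """Longest well-formed contiguous substring containing every terminal,
    or None if no such substring exists (Def. 2.5)."""
    terms = [i for i, t in enumerate(toks) if t[0] not in "[]"]
    lo, hi = min(terms), max(terms) + 1
    best = None
    for a in range(lo, -1, -1):
        for b in range(hi, len(toks) + 1):
            if well_formed(toks[a:b]) and (best is None or b - a > best[1] - best[0]):
                best = (a, b)
    return None if best is None else toks[best[0]:best[1]]

psi2, psi3 = ["[VP", "are"], ["[NP", "[A", "flying", "]A"]
assert interior(psi2) == ["are"]            # exteriors: "[VP" and null
assert interior(psi3) == ["[A", "flying", "]A"]
assert interior(psi2 + psi3) is None        # psi2 psi3 has no interior
```

The last assertion holds because any contiguous substring of ψ₂ψ₃ covering both "are" and "flying" necessarily contains the unmatched "[NP".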
Only standard factorizations of labeled bracketings will be employed below. Therefore, we shall henceforth omit the word "standard" and call these simply "factorizations."

Definition 2.7. The contents C(φ) of a terminal labeled bracketing φ is defined, if and only if φ is a substring of a well-formed labeled bracketing, the leftmost symbol of φ is not in R and the rightmost is not in L, to be the concatenation of the interiors of the terms of the unique factorization (ψ₁, …, ψₙ) of φ such that

(i) each ψᵢ has an interior, and
(ii) for any factorization (ω₁, …, ωₖ) of φ in which each term has an interior, each ψᵢ is a product of adjacent ωⱼ's; i.e. there are p₀, …, pₙ such that 0 = p₀ < p₁ < ⋯ < pₙ = k and ψᵢ = ω_{pᵢ₋₁+1} ⋯ ω_{pᵢ}.
The reader interested in examples of transformational mappings at this point may refer to the end of this section. The "is a" relation between strings and trees or appropriate linguistic structures was used in Ref. 1, p. 84 and pp. 142-3. For us, the "is a" relation takes the form of a relation between labeled bracketings and nonterminals.

Definition 2.11. For a labeled bracketing φ and a nonterminal symbol B, we say that φ is a B if there are labeled bracketings ψ, ω, σ, τ such that⁴

⁴Chomsky considers elementary transformations to be applied in sequence rather than simultaneously. We have discussed our formulation with him and believe that the transformational mappings we allow are the same as those he desires to have available.
(i) ψ and ω are well-formed labeled bracketings, σ ∈ L*, and τ ∈ R*,
(ii) φ = σψτ, and
(iii) ψ = [B ω ]B.

By this definition each of examples (8) and (9) is an S, but (2) is not. Also, (10) is both an N and an NP. Although this definition is not the more familiar linguistic notion "is a," the usual notion is easily recaptured as follows. If x is a string of terminals, B is a nonterminal and φ is a labeled bracketing, then we may say that x is a B in φ if there are labeled bracketings ψ, σ, and τ such that φ = σψτ, x = d(ψ) and ψ is a B.

Definition 2.12. (I) For each nonterminal B and all integers h, i and n, the predicate B^n_{h-i} holds of the factorization (ψ₁, …, ψₙ) if
(i) 1 ≤ h ≤ i ≤ n, …

… (read 𝓔^n, n ≥ 3, for "elementary" or "primitive," where 𝓔^n is defined in Ref. 5.)
Proof. In Ref. 12, p. 148 it was shown that the class of elementary recursive functions of Csillag-Kalmár (see Refs. 5; 8, p. 76; or 6, Ex. 1, Sec. 57, p. 285, for a definition) can be characterized as the class of predictably computable functions, i.e. ∪_{i=0}^∞ F_i. Here F_0 is the class of functions computable on finite automata, and F_{i+1} is the class of all functions f computable on Turing machines for which there is a g in F_i such that, for each x, g(x) is an upper bound on the storage used to compute f(x). It was shown in Ref. 12 that, if f(x) is in F_i, then for any constant C, C^{f(x)} is in F_{i+1}, as is C^{f(x)} l(x). Hence, if the cycling function c_𝒢(x) for a grammar 𝒢 is bounded by a function in F_i, then the bound C^{c_𝒢(x)} l(x) on tape squares used to decide membership of x in L(𝒢) provided in Theorem 6.4 is in F_{i+2}, and thus the characteristic function of L(𝒢) is in F_{i+3} (i.e. membership in L(𝒢) is decidable in class F_{i+3}). It was implicit in Ref. 12 and noted explicitly in Ref. 4 that if the tape bound for a computation is in 𝓔^n, for any n ≥ 3, then so is the function computed; hence the result for primitive recursive functions also follows, since the primitive recursive functions are ∪_n 𝓔^n. ∎

¹⁴We shall assume that the domain of f_𝒢 has been encoded in a suitable fashion into the non-negative integers so that we may speak of recursiveness; the p-adic encoding is a natural one, for example.
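The hierarchy in this proof is driven by exponentiation: each level F_{i+1} tolerates storage bounds one exponential beyond F_i, and the elementary functions are exactly those bounded by some fixed tower of exponentials. A toy illustration (`tower` is our helper, not the paper's notation):

```python
def tower(base, height, x):
    """Iterated exponential: base ** (base ** (... ** x)), `height` exponentials."""
    for _ in range(height):
        x = base ** x
    return x

# Each extra level buys one more exponentiation in the bound:
assert tower(2, 1, 3) == 8
assert tower(2, 2, 3) == 256
assert tower(2, 3, 3) == 2 ** 256
# No fixed tower bounds all towers, which is why unbounded cycling escapes
# the elementary functions entirely:
assert tower(2, 4, 3) > tower(2, 3, 3)
```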
It is worth noting that Corollary 6.7, unlike Corollary 6.6, does not establish the equivalence of elementary recursiveness of f_𝒢 and of L(𝒢). The question of whether or not a grammar for an elementary (predictably computable) language can have very complex deep structures, with "unpredictable" nesting in the sense that the cycling function is not predictably computable, is an interesting one which the authors have not studied. A positive answer would suggest that decisions of grammaticality can be made in ways far "simpler" than the resurrection of an entire deep structure, a finding with some psycholinguistic interest.

We now conclude this section with the proofs of Lemmas 6.1, 6.2 and 6.3. It will be convenient also to state and prove explicitly the result, used in the proof of Theorem 6.4, that the alphabet on which a Turing machine operates may be modified with little effect on the storage needed.

LEMMA 6.8. Let Z be a Turing machine on the alphabet V₁, let V₂ be a subset of V₁ containing at least two symbols, and let c be the number of symbols in V₁ − V₂. There is a Turing machine Z′ on V₂ such that, for every string φ ∈ V₂*, Z′ accepts (rejects) φ if and only if Z accepts (rejects) φ, and further Z′ uses exactly c + 1 times the amount of tape used by Z on input φ.
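A block code with the properties Lemma 6.8 needs can be sketched as follows. The specific blocks below are an assumption (any injective, fixed-length-(c + 1) code over V₂ works, and `make_code` is our name, not the paper's):

```python
def make_code(v1, v2):
    """Length-(c + 1) block code for V1 over V2, where c = |V1 - V2|.
    Assumes V2 has at least two symbols (here single characters)."""
    b1, b2 = sorted(v2)[:2]
    c = len(v1) - len(v2)
    code = {a: a + b1 * c for a in v2}                 # a in V2  ->  a b1^c
    for k, s in enumerate(sorted(set(v1) - set(v2)), start=1):
        code[s] = b2 * (c - k) + b1 + b2 * k           # kth extra -> b2^(c-k) b1 b2^k
    return code, c

code, c = make_code(set("ab#$%"), set("ab"))
encoded = "".join(code[s] for s in "a#b%$")
assert len(encoded) == (c + 1) * 5                     # tape grows by exactly c + 1
assert len(set(code.values())) == len(code)            # the block code is injective
```

Because every block has the same length, Z′ can decode deterministically by reading c + 1 squares at a time, which is what keeps the storage overhead to the constant factor c + 1.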
Proof. Let b₁ and b₂ be distinct elements of V₂. We shall encode V₁* into V₂* as follows. Each string of V₁* of length l is replaced by a string of V₂* of length (c + 1)l; namely, each symbol a of V₂ is replaced by a b₁^c, and the kth symbol of V₁ − V₂ is replaced by b₂^{c−k} b₁ b₂^k. The machine Z′ operates as follows. On an input φ in V₂*, it replaces φ by its encoded version, multiplying the length of the tape by c + 1, then operates on the encoded version exactly as Z does (interpreting each string of c + 1 symbols as a single symbol of V₁). ∎

Proof of Lemma 6.1. We shall construct Z on the alphabet V_T ∪ L ∪ R ∪ {Λ} ∪ V_N ∪ {0, 1, 2, …, j} and appeal to Lemma 6.8. Upon input of a string φ, Z checks that φ is in (V_T ∪ L ∪ R)*, and if so sets up the tape as follows:
Λ φ Λ # S # Λ … Λ  1 0^{l−1} Λ  1 0^{l−1} Λ  …  1 0^{l−1} Λ
with (p + q)^l copies of 1 0^{l−1} Λ, where p and q are the cardinalities of V_T and V_N respectively and l = l(φ). In the section of the tape initially containing # S #, Z carries out a strong derivation in accordance with the rules of 𝒢. The (p + q)^l rightmost sections of the tape, each of length l, which will be referred to as "counters," are used to specify, at each step of the derivation, the rule to be applied and the position at which to apply it, and also to assure that no possible derivation is overlooked. Since without loss of generality we may require that no line of a derivation be repeated, since there are fewer than k^l distinct strings of length at most l − 2 over an alphabet of k symbols, and since the outermost two symbols (ignoring #'s) of each line but the first in our derivation are [S and ]S, we see that if there is a derivation of φ, then there must be one with fewer than (p + q)^l steps.

Specification of a derivation is accomplished as follows. The ith counter specifies the action taken at the ith step; it will contain a 0 on each of its l squares except one, say the mth, which will contain an integer n, 1 ≤ n ≤ j. This setting of the ith counter requires the nth rule of 𝒢 to be applied to the mth symbol of the string at this ith step. If that is not possible (if the string at this step is less than m symbols long, if its mth symbol is not the nonterminal rewritten by the nth rule, if the context of the nth rule is not satisfied at this position, or if the result would be a string more than l symbols long), then the string at this ith step is compared with φ. If they are equal, the computation terminates and φ is accepted; if not, then this attempted derivation fails and the next attempt is begun by restoring S on the second section of the tape and by passing to the next arrangement of the counters. If all arrangements have been tried, φ is rejected. We leave to the reader the specification of a systematic procedure for running through all the (p + q)^l · l · j possible arrangements of the counters so that every possible derivation of length (p + q)^l or less is attempted.

The tape used in this computation is actually [(p + q)^l + 2](l + 1) + 3 tape squares. Reducing this to the alphabet V_T ∪ L ∪ R ∪ {Λ} by Lemma 6.8, the Turing machine Z₁ uses (q + j + 2) times this many tape squares. Hence, setting K₁ equal to, for example, (p + q + j)² establishes the desired result. ∎

Proof of Lemma 6.2. We appeal to Lemma 6.8 and add 0 to the alphabet before constructing Z. Given (x, φ, #^l), Z rejects the input if l(φ) > l or if φ is not well-formed; otherwise it sets up the tape as follows:

Λ x Λ φ Λ  Λ^{n−1} φ 0 … 0 Λ  Λ^{n−1} 0^l Λ  …  Λ^{n−1} 0^l Λ
where k is the number of transformations of 𝒢, n is the maximum number of terms in any of these transformations, and s is the number of subsentences in φ. The leftmost two sections of this tape retain permanent copies of x and φ, while the rightmost (k + 1)s sections, which we will call "counters," are used to carry out transformational derivations from φ. Since each line in any derivation has length less than l, and since the transformation applied to any line analyzes it into at most n terms, which we can record by the insertion of n − 1 or fewer Λ's into the line, each counter can hold a line of a derivation factored into the terms appropriate to the next transformation to be applied. Since there can be only s cycles of transformations, each of k + 1 steps, the tape is sufficient to hold an arbitrary transformational derivation from φ, each line of which is to have length less than l. The ith counter is used to specify the factorization of the line obtained at the ith step, and each counter is set by inserting n − 1 Λ's into a string of l 0's. We have indicated the insertion of these n − 1 Λ's in the leftmost spaces in each counter, and the duplication of φ following these in the first counter, in our description of the tape we set up. Each time a derivation is unsuccessful, the next position of the counters is obtained and a new derivation is attempted. We leave to the reader the specification of a systematic procedure for moving the (k + 1)s(n − 1) Λ's in the counters through all possible positions so that all possible factorizations are attempted at each step of a derivation.

To describe the manner in which a derivation is carried out, let us describe an arbitrary step in such a derivation. Assume that the transformations are T₁, …, T_k, and that they are n₁-, …, n_k-term transformations respectively, nᵢ ≤ n. Consider the [(k + 1)(p − 1) + q]th step, 1 ≤ p ≤ s, 0 ≤ q ≤ k, and let the [(k + 1)(p − 1) + q]th line of the derivation be ψ.
If q = 0, simply place ψ into the spaces occupied by 0's in the next counter, skipping spaces occupied by Λ's. If q ≥ 1, let the contents of the counter be χ₁ ω₁ Λ ω₂ … ω_{n−1} Λ ωₙ χ₂, where ω = ω₁ω₂ … ω_{n−1}ωₙ is the pth subsentence of ψ, and where the remaining Λ's, if any, are in the χ's. (More precisely, χ₂ contains exactly s − p occurrences of ]S.) If (ω₁, …, ωₙ) is a proper analysis of ω for T_q, then apply T_q to (ω₁, …, ωₙ) and write χ′₁ T_q(ω₁, …, ωₙ) χ′₂ in the spaces occupied by 0's in the next counter, where χ′ᵢ is the result of deleting Λ's from χᵢ. If the result is more than l symbols long, reset the counters and try the next derivation. If (ω₁, …, ωₙ) is not a proper analysis of ω for T_q, write χ′₁ ω χ′₂ into the spaces holding 0's in the next counter.

In this process, when a result appears in the (k + 1)sth counter, its debracketization should be compared with x; if different, the counters should be reset and the next derivation tried. If equal, then we have carried out a transformational derivation of x, unless at some step in which the factorization was not a proper analysis there was some other factorization which could be a proper
analysis. Recall that our transformations are obligatory: if any analysis at the [(k + 1)(p − 1) + q]th step is proper, then the qth transformation must apply there.¹⁵ Hence if x is obtained at the end of a purported derivation, we then go back through each step at which the analysis was improper, and try all other arrangements of the n − 1 Λ's. If some arrangement yields a proper analysis, we reject this purported derivation, reset the counters and try the next derivation. If there is no proper analysis, we reset the Λ's in this one counter, and try the next counter at which the analysis was improper. If no counter holding an improper analysis can be reset in this way to yield a proper analysis, then the input tape Λ x Λ φ Λ #^l Λ is accepted. The number of tape squares used is 2[l(x) + l(φ) + (k + 1)s(l + n) + 3], and since l(x), l(φ) and s are each less than l, we can obtain the desired bound by setting K₂ equal to, for example, 4(k + 1)n. ∎

Proof of Lemma 6.3. We shall show that K₃ may be taken to be [n(4q + 1)(c + 1)]^{k+1}, where k is the number of transformations in 𝒯, n is the maximum number of terms in any of the k transformations in 𝒯, q is the number of nonterminals in 𝒯, and c is the length of the longest terminal string y mentioned in the structural condition of any transformation in 𝒯. Consider the step of the transformational derivation taking φᵢ to φᵢ₊₁. We desire an upper bound on l(φᵢ)/l(φᵢ₊₁); since these labeled bracketings are reduced, it will suffice to bound the ratio of the debracketizations. Letting l be the length of d(φᵢ₊₁), we now show that the length of d(φᵢ) can be at most n(l + c). The only elementary transformations which can shorten φᵢ are deletion and substitution, and the condition of recoverability of deletions requires that each application of these operations occur only when there is an associated condition either of the form (i) in Def. 2.13 or of the form (ii) in that definition. The greatest number of terminals that can be erased from any term of the proper analysis of φᵢ by a deletion or substitution is thus less than l + c: l if the condition is of type (i), since then a copy (of length at most l) is left in φᵢ₊₁; c if the condition is of the other form. Since at most one operation can be applied to each of the at most n terms, the length of d(φᵢ) is less than n(l + c). Since φᵢ is reduced, its length is less than 2q[2n(l + c) − 1] + n(l + c) by Lemma 2.4. Since 1 ≤ l, l(φᵢ) < n(4q + 1)(c + 1)l. Since l is less than l(φᵢ₊₁), we have l(φᵢ) < n(4q + 1)(c + 1) l(φᵢ₊₁), for each i = 1, …, t − 1, so that l(φᵢ) < [n(4q + 1)(c + 1)]^{t−i} (4q + 1) l(x), where the right factors are obtained by
¹⁵Throughout the development, we could have generalized the notion of transformation to include optional transformations, by allowing an optional/obligatory distinction to be specified for each transformation, without changing our results. In this proof, we would then check only for transformations marked obligatory that there is no proper analysis, by the method described immediately below.
Lemma 2.4 using x = d(φ_t). Finally we note that t is at most (k + 1)s, and this gives the desired result l(φ₁) < {[n(4q + 1)(c + 1)]^{k+1}}^s l(x). ∎

7. SOME CONCLUDING REMARKS

We have seen that the ability of transformational grammars to generate nonrecursive languages, nonprimitive recursive languages, nonelementary languages, etc. resides in the fact that very short sentences may have very large numbers of cycles in their derivations, and thus a great amount of deletion may take place in the transformational derivation even though it is all "recoverable." Thus Corollaries 6.6 and 6.7 show that any restriction which limits the number of subsentences in the deep phrase-markers of strings generated by a transformational grammar can be interpreted as a stronger condition of recoverability of deletions. Available transformational grammars of natural languages do not make use of the power to take enormous numbers of cycles in the derivation of very short sentences. In fact, it appears that for every transformational grammar 𝒢 written for a natural language there is a constant k such that the function k^{l(x)} bounds c_𝒢. Since k^{l(x)} is an elementary function, the language L(𝒢) is elementary by Corollary 6.7; it is even in F₃, as can be seen from the proof of this corollary, since 2^{l(x)} is in F₀. These observations suggest that an appropriate line of research for the discovery of a more adequate condition of recoverability of deletions would be to search for empirically supportable restrictions on transformational grammars which would guarantee that the cycling function of such grammars be bounded by an exponential or polynomial function. This would become especially interesting if the length of the deep phrase-marker were linear in the length of the terminal string x.
Then we would know that the languages generated by these grammars were context-sensitive, since this restriction would permit checking of base and transformational components to be done nondeterministically in linearly bounded storage.

We now relate our results to some remarks and proposals of Putnam [Ref. 11]. There he noted that every recursively enumerable language is generated by a transformational grammar, and made several suggestions for conditions which would restrict the transformational languages to being recursive. We will return to his reasons for desiring such restrictions. He suggested two conditions (Ref. 11, p. 42): (i) that the transformational rules be made "cut-free" in the sense that the output of a transformation never be shorter than its input, and (ii) that there be constants n₁ and n₂ for each transformational grammar such that at most n₁ terminals can be deleted by any transformation and at most n₂ deletion transformations can be applied in any derivation. Empirical considerations clearly rule out both of these as restrictions on the definition of a transformational grammar. Noting this, Putnam proposed that
the class of transformational grammars be defined so that they satisfy a "cut-elimination" theorem. We can interpret this rather broadly to mean that for every grammar 𝒢₁ in the class there is another grammar 𝒢₂ such that (i) L(𝒢₂) = L(𝒢₁) and (ii) there is a constant k with the property that for every x ∈ L(𝒢₂) there is a deep phrase-marker φ underlying x with respect to 𝒢₂ such that l[d(φ)] ≤ k·l(x). We now see that any grammar satisfying such a cut-elimination theorem generates a language which, more than being recursive, is context-sensitive. This is so because a nondeterministic linear bounded automaton can determine both that a labeled bracketing φ is strongly generated by a context-sensitive grammar and that it underlies a given string x, if the automaton has enough tape to write φ (since the C^{c_𝒢(x)} sections of the tape in the proof of Theorem 6.4 are used only to check deterministically all possibilities, and hence are dispensable in nondeterministic operation). However, we have no way of settling the question whether grammars of natural languages satisfy a cut-elimination theorem.

Thus, let us return to the point discussed at the end of Sec. 5, where we concerned ourselves with the question whether all natural languages are recursive. Putnam offers an argument (Ref. 11, pp. 39-41) that natural languages are recursive. His argument involves several highly debatable assumptions, and in addition is in reality an argument that the set of sentences of a natural language acceptable to a speaker under performance conditions is recursive, rather than an argument about the set of sentences specified as grammatical by the speaker's competence (Ref. 1, pp. 3-4, 10-15). We are able to circumvent these difficulties and offer a new argument based on empirical research in linguistics. There has been a great deal of work describing the competence of native speakers of a variety of natural languages by transformational grammars.
As we have noted, all these grammars seem to have exponentially bounded cycling functions. Thus, if one makes the empirically falsifiable assumptions (a) that every natural language has a descriptively adequate transformational grammar, and (b) that the languages investigated so far are typical as regards the computational complexity of their cycling functions, then it follows that the set of grammatical sentences of every natural language is recursive, in fact predictably computable and in ℰ³ at worst. There is a great deal of empirical evidence to support assumption (a), and we see no reason to doubt (b); thus we feel that this argument is empirically well supported. It provides strong justification for our feeling, expressed at the end of Sec. 5, that recoverability of deletions should restrict natural languages to being recursive. It is worthy of note that the assumptions of this argument are not philosophical but empirical in nature. Thus we can justify the intuition of virtually all linguists that natural languages are recursive. This provides motivation for the desire of transformational linguists, as seen for example in (Ref. 1, footnote 37, p. 208), to restrict deletions so that transformational languages are recursive. Although we have shown that the restrictions currently imposed on deletions do not accomplish this, our results provide guidance for research into this problem.

REFERENCES

1. Noam Chomsky, Aspects of the Theory of Syntax, M.I.T. Press, Cambridge (1965).
2. Noam Chomsky, Current Issues in Linguistic Theory, Mouton, The Hague (1964).
3. Noam Chomsky, On certain formal properties of grammars, Information and Control 2, 137-167 (1959).
4. Alan Cobham, The intrinsic computational difficulty of functions, Logic, Methodology and Philosophy of Science (Proc. 1964 Internat. Congr.), North-Holland, Amsterdam
(1965), pp. 24-30.
5. Andrzej Grzegorczyk, Some classes of recursive functions, Rozprawy Matematyczne, Warsaw (1953).
6. S. C. Kleene, Introduction to Metamathematics, Van Nostrand, Princeton, N.J. (1952).
7. S. Y. Kuroda, Classes of languages and linear-bounded automata, Information and Control 7, 207-223 (1964).
8. Rózsa Péter, Rekursive Funktionen, Akadémiai Kiadó, Budapest (1951).
9. Stanley Peters, A note on the equivalence of ordered and unordered grammars, Harvard Computation Laboratory Report to NSF, No. 17 (1966).
10. Stanley Peters and R. W. Ritchie, On restricting the base component of transformational grammars, Information and Control 18, 483-501 (1971).
11. Hilary Putnam, Some issues in the theory of grammar, The Structure of Language and Its Mathematical Aspect (Roman Jakobson, Ed.), American Mathematical Society, Providence, R.I. (1961).
12. R. W. Ritchie, Classes of predictably computable functions, Trans. Amer. Math. Soc. 106, 139-173 (1963).
13. John R. Ross, A proposed rule of tree pruning, Harvard Computation Laboratory Report to NSF, No. 17 (1966).
Received May, 1969; revised version received April, 1971