The Power of Constraint Grammars Revisited
Anssi Yli-Jyrä
University of Helsinki, Finland
[email protected]

arXiv:1707.05115v1 [cs.FL] 17 Jul 2017

Abstract

Sequential Constraint Grammar (SCG) (Karlsson, 1990) and its extensions have lacked clear connections to formal language theory. The purpose of this article is to lay a foundation for these connections by simplifying the definition of strings processed by the grammar and by showing that Nonmonotonic SCG is undecidable and that derivations similar to those of Generative Phonology exist. The current investigations propose resource bounds that restrict the generative power of SCG to a subset of the context-sensitive languages and present a strong finite-state condition for grammars as wholes. We show that a grammar is equivalent to a finite-state transducer if it is implemented with a Turing machine that runs in o(n log n) time. This condition opens new finite-state hypotheses and avenues for deeper analysis of SCG instances in a way inspired by Finite-State Phonology.

1 Introduction

Lindberg and Eineborg (1998), Lager and Nivre (2001) and Listenmaa (2016) have analyzed Sequential Constraint Grammar (SCG) (Karlsson, 1990) from the logical point of view, proposing that the rules can be expressed in first-order Horn clauses, first-order predicate logic or propositional logic. However, many first-order logical formalisms are themselves quite expressive: Horn clauses are only semi-decidable and first-order logic is undecidable, thus at least as powerful as SCG itself. Propositional logic is more restricted but does not help us to analyse the expressive power of SCGs or to prove the finite-stateness of grammars. Instead of just reducing SCG to undecidable

or otherwise powerful formalisms, we are interested in the ultimate challenge of proving that a practical grammar is actually reducible to a strictly weaker formalism. This goal is interesting because such narrowing reductions have proven extremely valuable. For example, the proof that practical grammars in Generative Phonology are actually equivalent to finite-state transducers turned out to be a game-changing result. In fact, the reduction gave birth to the influential field of Finite-State Phonology. It is noteworthy that prior efforts to analyse SCG in finite-state terms have focused on the finite-state nature of individual and parallel rules (Peltonen, 2011; Hulden, 2011; Yli-Jyrä, 2011). These efforts have mostly ignored the generative power of the grammar system as a whole and that of practical grammar instances. In this paper, we are aiming at Finite-State Syntax through reductions of practical SCGs. To set the formal framework, we have to start, however, from the total opposite: we show first that the simplified formalism for Nonmonotonic SCGs is Turing equivalent and thus similar to Generative Phonology (Chomsky and Halle, 1968; Ristad, 1990) and Transformational Grammar (Chomsky, 1965; Peters and Ritchie, 1973). This foundational result gives access to the large body of literature on bounded Turing machines and especially to Hennie machines, which run in O(n) time and are equivalent to finite-state machines. Then the Gap Theorem (Trakhtenbrot, 1964) gives us access to a looser bound, o(n log n), whose reasonable approximations are sufficient and decidable conditions for finite-state equivalence. We present some ways in which these bounds can be related to SCG parsing. The article is structured as follows. Section 2 describes the alphabets, the strings and the derivation steps in SCG parsing. In Section 3, these are used to show Turing equivalence of SCGs. In the next two sections, simple bounds are introduced and

elaborated further to obtain specific conditions for finite-state equivalence of grammars. Further links to formal language theory and two important open problems are presented in Section 6. Then the paper is concluded.

2 SCG as a "Phonological" Grammar

In the SCG literature, morphosyntactic readings of tokens are usually represented as tag strings like "<go>" "go" V PAST. The tag strings are now viewed as a compressed representation for a huge binary vector (f0, f1, f2, ..., fk, ...). The semantics of the grammar ignores some tags and considers only k tags declared in advance in the grammar. These k tags or features distinguish readings from each other and define the reading alphabet Σ = 2^k. An ambiguous token has more than one reading associated with it. The elements of the cohort alphabet P(Σ) are called cohorts. This alphabet is the powerset of the reading alphabet. Only a small subset of all possible cohorts occurs in practice.

The input of an SCG is produced by a deterministic finite-state function, Lexicon* : T* → (P(Σ))*, that maps token strings to lexical cohort strings of the same length. This function is the concatenation closure of the function Lexicon : T → P(Σ) that maps every token to a cohort. Since the image of each token is a set of strings, Lexicon is internally a nondeterministic lexical transducer (Karttunen, 1994; Chanod and Tapanainen, 1995), but the image of each token is viewed externally as a symbol in P(Σ), making Lexicon a one-valued function.

An SCG processes the lexical cohort string by iterated application of a derivation step ⇒ : (P(Σ))* → (P(Σ))* that affects one cohort at a time. The context conditions of each derivation step are normally defined using an existing SCG formalism for contextual tests. Monadic Second Order Logic (Büchi, 1960; Elgot, 1961; Trakhtenbrot, 1961) provides an alternative formalism that can express all finite-state languages over P(Σ). The parser defines the parsing strategy that resolves the conflicts between rules that could be applied simultaneously. A typical strategy always chooses the most reliable rule and the leftmost target position. When the plain contextual tests are combined with the application strategy, we obtain a total functional transducer (Skut et al., 2004). E.g., the transducer in Fig. 1 is total and replaces A by B in the first possible occurrence position.

Figure 1: A simple ⇒ relation as an FST

The semantics of an SCG grammar G is defined as the relation [[G]] = {(i, o) | i ∈ P(Σ)*, o ∈ I, i ⇒* o}, where I ⊆ P(Σ)* is defined as {x | (x, x) ∈ ⇒}. This semantics makes SCG grammars similar to grammars in Generative Phonology (Chomsky and Halle, 1968), as both relate the lexical string to some kind of output string by applying a sequence of alternation rules.
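To make the string model concrete, the following sketch encodes readings as feature sets, cohorts as sets of readings, and one SELECT-like derivation step that prunes a single cohort. It is a minimal illustration, not the paper's implementation; the toy lexicon entries and the function names are assumptions of the example.

```python
from typing import FrozenSet, Tuple

Reading = FrozenSet[str]            # a subset of the k declared features
Cohort = FrozenSet[Reading]         # an element of P(Sigma)
CohortString = Tuple[Cohort, ...]

def lexicon(tokens):
    """Hypothetical stand-in for Lexicon* : T* -> (P(Sigma))*."""
    toy = {
        "go": frozenset({frozenset({"V", "INF"}), frozenset({"N", "SG"})}),
        "went": frozenset({frozenset({"V", "PAST"})}),
    }
    return tuple(toy.get(t, frozenset({frozenset({"UNK"})})) for t in tokens)

def select(cohorts: CohortString, i: int, tag: str) -> CohortString:
    """One derivation step: keep only the readings of cohort i that contain tag."""
    kept = frozenset(r for r in cohorts[i] if tag in r)
    if not kept:                    # an SCG step never empties a cohort
        return cohorts
    return cohorts[:i] + (kept,) + cohorts[i + 1:]

print(select(lexicon(["go", "went"]), 0, "V"))   # "go" loses its noun reading
```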

3 Nonmonotonic SCG

Two recent SCG implementations (Tapanainen, 1996; Didriksen, 2017) are nonmonotonic: they do not always reduce the input but they can insert tags, readings and even cohorts. In this section, we study the expressive power of such SCGs.

3.1 Minimal Definition

For the sake of minimality, we define the Nonmonotonic SCG (NM-SCG) as a rule system that supports the following kinds of local transformation rules:

• REPLACE (old) (new) (cond)+
• ADDCOHORT (targ) (cond)+
• REMCOHORT (targ) (cond)+

The first rule template above replaces the leftmost cohort containing the reading old with a cohort that contains the reading new if the relative context condition cond is satisfied. The familiar SELECT and DELETE rules are seen as shorthands for sets of REPLACE rules. The second and the third rule templates are used to insert or remove a target cohort matching the pattern targ when the condition cond is satisfied. The plus (+) indicates that more than one condition can be present. Our simplified context conditions are of the form (d tags) or (d NOT tags), where the first tests the presence of the pattern tags in the relative cohort location d and the second is true when the location does not contain the pattern.
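As a concrete reading of the simplified condition syntax, the sketch below tests a single (d tags) or (d NOT tags) condition against a cohort string; the data layout and the treatment of out-of-bounds positions are assumptions made for illustration, not part of the formalism.

```python
def condition_holds(cohorts, target, d, tags, negated=False):
    """True iff the cohort at offset d from target (fails to) contain the pattern tags."""
    i = target + d
    if not (0 <= i < len(cohorts)):
        return negated              # positions outside the sentence satisfy only NOT tests
    present = any(tags <= reading for reading in cohorts[i])
    return (not present) if negated else present

# Example: test (-1 NOT V) relative to the cohort at position 1.
cohorts = [{frozenset({"DET"})},
           {frozenset({"N", "SG"}), frozenset({"V"})}]
print(condition_holds(cohorts, 1, -1, frozenset({"V"}), negated=True))   # True
```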

3.2 One-Tape Turing Machine

A one-tape deterministic Turing machine (TM) has a finite control unit and an infinite rewritable tape with a pointer (Fig. 2). A configuration of the machine consists of the current state q, the current pointer value and the contents of the working tape.

Figure 2: A one-tape Turing machine

The tape is divided into squares that hold a left boundary LB, a right boundary RB, or a symbol from the tape alphabet Ω. Given the input string x ∈ Ω*, the first square of the tape is pointed and the tape is initialized with the prefix LBxRB that is followed by an infinite number of right boundary symbols. The control unit is a deterministic finite automaton where each transition s →(A,B,d) t specifies the source state s, the target state t, the input symbol A, the output symbol B, and a head move d ∈ {−1, 0, 1}. On each transition, the machine overwrites the symbol A in the pointed square with the symbol B, changes its state from s to t and then moves the pointer d steps to the right. All but the leftmost square are over-writable (B = A if A = LB), but the machine never moves beyond the first right boundary without overwriting it with a tape symbol and never writes RB between two tape symbols.

The computation of the machine starts from state q0. At each step, the machine takes the next transition based on the current state and the currently pointed symbol on the memory tape. The computation continues as long as the next transition is defined and then halts by reaching a state from which there is no transition on the current input. If the halting state is among the final states F, the machine accepts the input and relates it with the string x′ ∈ Ω* stored on the memory tape. Otherwise, the machine either gets stuck in an infinite computation or gives up, leaving some ill-formed string on the memory tape.
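A minimal sketch of this machine model is given below: the transition table maps (state, symbol) pairs to (target state, written symbol, head move) triples. The table format, the lazy extension of the trailing RB squares and the step budget are assumptions of the sketch, not part of the definition.

```python
LB, RB = "LB", "RB"

def run_tm(delta, finals, x, q0="q0", max_steps=100_000):
    """delta: dict mapping (state, symbol) -> (state, written symbol, move in {-1, 0, 1})."""
    tape = [LB] + list(x) + [RB]
    q, pos = q0, 1                           # the first input square is pointed
    for _ in range(max_steps):
        if pos == len(tape):                 # materialize the infinite run of RBs lazily
            tape.append(RB)
        if (q, tape[pos]) not in delta:      # no transition defined: the machine halts
            return q in finals, "".join(tape[1:tape.index(RB, 1)])
        q, tape[pos], d = delta[(q, tape[pos])]
        pos += d
    raise RuntimeError("step budget exceeded (possibly a non-halting computation)")

# A toy machine that rewrites a's to b's and accepts in q0.
delta = {("q0", "a"): ("q0", "b", 1)}
print(run_tm(delta, {"q0"}, "aaa"))          # (True, 'bbb')
```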

3.3 Reduction to Nonmonotonic SCG

Now we show that any one-tape Turing machine can be simulated with a nonmonotonic SCG. In our simulation, each square in the initial portion of the memory tape corresponds to a cohort in

the input. Each cohort is a singleton set in P(Σ), i.e. it represents just one reading in Σ. Each reading is a collection of positive features from Φ. These features include the tape symbols Ω, the boundary symbols {LB, RB}, and the markers that we need to keep track of the computation steps. The pointed square corresponds to a cohort that contains a marker. Since SCG can change only one cohort at a time, movement of the pointer involves two temporarily marked positions and markers: the first indicates the previously pointed square and the second indicates the new pointed square. One marker represents the source state and the other represents the transition in progress.

A transition q →(A,B,d) r, RB ∉ {A, B}, corresponds to a sequence of three rule applications that change one cohort at a time. Since the set of transitions, the set of states Q and the tape alphabet Ω are finite, each step is described with a finite set of nonmonotonic SCG rules:

1. Given the state marker Qq ∈ {Qs | s ∈ Q} ⊆ Φ in cohort i and no other marked cohorts, add a transition marker T-q-A ∈ Φ to cohort i + d that previously contains a tape symbol C ∈ Ω:

REPLACE (C) (T-q-A C) (−d Qq A)

2. Given a transition marker T-q-A in cohort i + d, overwrite, in cohort i, the reading containing the tape symbol A and the state marker Qq with a reading containing the tape symbol B:

REPLACE (Qq A) (B) (d T-q-A)

3. When no state marker is present, replace the transition marker T-q-A with the marker for the target state Qr while keeping the remainder C ∈ Σ in the changed cohort:

REPLACE (T-q-A C) (Qr C) (−d NOT Qq)

A transition q →(RB,A,0) r, A ∈ Ω, corresponds to the application of the rules:

ADDCOHORT (T-q-RB A) (1 Qq RB)
REPLACE (Qq RB) (RB) (-1 T-q-RB A)
REPLACE (T-q-RB A) (Qr A) (1 NOT Qq RB)

When the previous cohort contains a tape symbol C ∈ Ω, a transition q →(A,RB,−1) r, where A ∈ Ω, corresponds to the application of the rules:

REPLACE (C) (T-q-A C) (1 Qq A) (2 RB)
REMCOHORT (Qq A) (1 T-q-A)
REPLACE (T-q-A C) (Qr C) (1 NOT Qq A)

Transitions q →(RB,A,−1) r, q →(RB,A,1) r and q →(A,RB,0) r, A ∈ Ω, reduce to a sequence of two transitions. The SCG parser halts when the tape contents do not trigger any of these rules that simulate transitions. The simulation accepts the input if some cohort contains a marker Qq such that q ∈ F.
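The rule schemata above can be generated mechanically. The following sketch expands one ordinary transition q →(A,B,d) r (with RB ∉ {A, B}) into the three REPLACE steps, instantiating the remainder symbol C over a given tape alphabet; the string-based rule format and the marker spellings are only illustrative.

```python
def compile_transition(q, r, A, B, d, omega):
    """Expand one TM transition into the three rule schemata of the simulation."""
    rules = []
    for C in omega:                 # step 1: mark the newly pointed square
        rules.append(f"REPLACE ({C}) (T-{q}-{A} {C}) ({-d} Q{q} {A})")
    # step 2: overwrite A with B in the previously pointed square
    rules.append(f"REPLACE (Q{q} {A}) ({B}) ({d} T-{q}-{A})")
    for C in omega:                 # step 3: install the marker of the target state
        rules.append(f"REPLACE (T-{q}-{A} {C}) (Q{r} {C}) ({-d} NOT Q{q})")
    return rules

for rule in compile_transition("q0", "q1", "A", "B", 1, ["A", "B", "C"]):
    print(rule)
```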

Proposition 1. NM-SCGs can simulate TMs.

Since NM-SCG is itself an algorithm, we have:

Proposition 2. There is a one-tape deterministic TM that implements the NM-SCG parser.

Proposition 3. NM-SCGs are equivalent to TMs.

4 Bounded Nonmonotonic SCGs

The undecidability of Nonmonotonic SCG creates a need to restrict the formalism in ways that ensure decidability. In this section, we propose two parameters that set important bounds on the resources available to grammars.

4.1 The O(n) Space Bound

The fertility f ∈ N ∪ {∞} of a nonmonotonic SCG grammar is the maximum number of new cohorts that the grammar inserts before each of the n cohorts in the original sentence (with RB). Note that fertility f > 0 implies nonmonotonicity.

Proposition 4. In finite-fertility SCGs, the length ℓ of the output string is linearly bounded.

The bounded length of the cohort string is an important restriction on Nonmonotonic SCGs because it ensures that any infinite loop in the computation can be detected after a bounded number of computation steps, since the number of distinct tape contents is bounded.

Proposition 5. The termination of a finite-fertility SCG is decidable.

We also know that the preconditions of each rule can be tested with a finite automaton and that the actual effect on the target cohort is a functional finite-state computation that can be implemented in space linear in the length of the cohort string.

Proposition 6. The space requirement of a finite-fertility SCG is linear in the maximum length of the cohort string during the derivation.

A deterministic linear-bounded automaton (DLBA) (Myhill, 1960) is a special case of the Turing machine with the restriction that the right boundary is fixed and cannot be overwritten. The LBA computations can be initialized so that the space available for storing the cohort string is linearly bounded by the length of the initial cohort string.

Proposition 7. A nonmonotonic SCG with finite fertility is simulated by a DLBA.

The power of DLBAs is restricted to a strict subset of the context-sensitive languages (Kuroda, 1964).

Proposition 8. The cohort language accepted by a finite-fertility SCG is context sensitive.
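The decidability argument behind Proposition 5 can be phrased operationally: with finite fertility and a finite cohort alphabet, only finitely many cohort strings are reachable, so a deterministic derivation that exceeds that number of steps must have revisited a configuration and will loop forever. The sketch below is a schematic illustration under these assumptions; the derive_step argument is a hypothetical single-step function, and the budget is astronomically large but finite.

```python
def step_budget(n, f, num_cohort_symbols):
    """Number of cohort strings of length at most (1 + f) * n over the cohort alphabet."""
    max_len = (1 + f) * n
    return sum(num_cohort_symbols ** L for L in range(1, max_len + 1))

def terminates(derive_step, cohorts, f, num_cohort_symbols):
    """Decide termination by running until a fixpoint or until the budget is exhausted."""
    budget = step_budget(len(cohorts), f, num_cohort_symbols)
    for _ in range(budget):
        nxt = derive_step(cohorts)
        if nxt == cohorts:          # no rule applies: the derivation has halted
            return True
        cohorts = nxt
    return False                    # some configuration repeated: infinite loop
```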

4.2 The O(n²) Time Bound

By studying only monotonic SCGs with the reading count r in cohorts and the sentence length n (including RB), Tapanainen (1999) has given a bound for the parsing time:

Proposition 9 (Tapanainen 1999). Any monotonic SCG performs O(nr) rule applications.

The volume v ∈ {1, 2, ...} ∪ {∞} of cohorts is a parameter that tells the maximum number of operations that can be applied to any cohort. This new notion is a nonmonotonic generalization of the maximum number of readings in one cohort. Finite volume basically turns every finite-fertility SCG into a monotonic SCG. Finite fertility helps us to generalize the above proposition to nonmonotonic SCGs.

Proposition 10. Any NM-SCG performs O((1 + f)nv) rule applications.

Assuming again that any rule of the grammar can be applied in time linear in the number of cohorts, we obtain a time complexity result:

Proposition 11. Any NM-SCG runs in O((1 + f)²n²v) time.
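The counting argument can be made explicit as a loop guard: each of the (1 + f)n possible cohort positions can be operated on at most v times, and if every application may cost one pass over the cohort string, the O((1 + f)²n²v) total of Proposition 11 follows. A trivial sketch of the budget, with hypothetical parameter values:

```python
def application_budget(n, f, v):
    """Maximum number of rule applications for sentence length n, fertility f, volume v."""
    return (1 + f) * n * v

def time_budget(n, f, v):
    """Crude time bound: each application may cost up to one pass over the cohort string."""
    return application_budget(n, f, v) * (1 + f) * n

assert application_budget(n=20, f=2, v=5) == 300
assert time_budget(n=20, f=2, v=5) == 300 * 60
```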

5 Finite-State Hypotheses

A deterministic linear-bounded automaton is a special case of one-tape deterministic Turing machines that gives us a context where many interesting conditions for finite-stateness, aka regularity, become applicable.

5.1 The o(n log n) Time Bound

Hennie (1965) showed that a deterministic one-tape TM running in O(n) time is equivalent to a finite automaton. By defining the relation between the initial and final tape contents, we can extend Hennie's result to regular relations:

Proposition 12 (Hennie 1965). A one-tape deterministic TM running in O(n) time is equivalent to a functional finite-state transducer.

The Borodin-Trakhtenbrot Gap Theorem (Trakhtenbrot, 1964) states that expanded resources do not always expand the set of computable functions. In other words, it is possible that O(n) is an unnecessarily tight time bound for finite-state equivalence. A less tight time bound is now expressed with the little-o notation: t(n) ∈ o(f(n)) means that the upper bound f(n) grows much faster than the running time t(n) when n tends to infinity: lim_{n→∞} t(n)/f(n) = 0.

Hartmanis (1968) and Trakhtenbrot (1964) showed independently that the time resource of a finite-state equivalent deterministic one-tape TM can be expanded from O(n) to o(n log n) without expanding the characterized languages. More recently, Tadaki et al. (2010) showed that the bound o(n log n) applies also to nondeterministic one-tape TMs that explore all accepting computations.

Proposition 13 (Tadaki et al. 2010). A one-tape TM running in o(n log n) time is equivalent to a finite automaton/transducer.

A sufficient condition for finite-state equivalence of a TM is satisfied if the running time of the machine is bounded by a function t(n) that is in o(n log n). For any reasonable function t(n), this sufficient condition is decidable (Gajser, 2015). However, to decide finite-state equivalence of an arbitrary TM, it would be necessary to consider all functions t(n) ∈ o(n log n).

We will assume a one-tape TM implementation for finite-fertility SCGs. The tape is initialized in such a way that f empty squares are reserved for latent cohorts at every cohort boundary. We assume that the grammar rules and the related application strategy are represented by a functional transducer such as the one in Figure 1. Its optimization via the inward deterministic bimachine constructions (Yli-Jyrä, 2011; Hulden, 2011) optimizes the tape moves between derivation steps. The parallel testing of all context conditions involves (i) an initialization step and (ii) a number of maintenance steps. The initialization step computes the validity of all context conditions at every tape square in amortised O(n) time. After this, the total amortised time needed to maintain the contexts is bounded by the total number of moves needed to perform the subsequent rule applications.

Proposition 14. The time used to maintain the context conditions is dominated by the time used to move between target cohorts.

NM-SCGs based on a one-tape TM now have a regularity condition:

Proposition 15. An NM-SCG is equivalent to a finite automaton/transducer if its one-tape TM implementation runs in o(n log n) time.

This proposition can be compared to an interesting empirical observation by Tapanainen (1999), who reports experiments with a practical SCG system (CG-2). According to the experiments, the average running time of the system follows closely the O(n log n) curve (Fig. 3).

Figure 3: Average running time of CG-2 in Financial Times (according to Tapanainen 1999) seems to follow the curve O(n log n)

On the basis of the experiments by Tapanainen, we cannot exclude the hypothesis that the asymptotic running time is actually in o(n log n). Whether the used grammar is actually equivalent to a finite-state transducer is not known.

If the given NM-SCG instance were equivalent to a finite-state transducer, there would be a possibility to carry out monotonic SCG parsing in linear time and thus improve the parser's efficiency considerably. In case the transducer is extremely large, the improvement remains solely a theoretical possibility, but the discovered regularity may still give valuable insight.
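One hedged way to probe this hypothesis on measured data is to track the ratio t(n)/(n log n): if the average running time were in o(n log n), the ratio should tend towards zero for growing sentence lengths, whereas a roughly constant ratio is consistent with Θ(n log n). The sketch below uses placeholder measurements, not Tapanainen's data.

```python
import math

def ratios_to_nlogn(measurements):
    """measurements: iterable of (sentence_length, average_time) pairs with length > 1."""
    return [(n, t / (n * math.log(n))) for n, t in measurements if n > 1]

placeholder = [(10, 4.1), (25, 12.0), (50, 27.5), (100, 60.0)]   # illustrative numbers only
for n, ratio in ratios_to_nlogn(placeholder):
    print(n, round(ratio, 3))
```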
5.2 The O(n) Time Bound

Hennie's finite-stateness condition (Hennie, 1965) for deterministic one-tape TMs and its generalization to nondeterministic one-tape TMs (Tadaki et al., 2010) are insightful and provide a method to construct the equivalent finite-state transducer when the finite-stateness condition is met.
Figure 4: A crossing sequence between squares

A Hennie machine refers to a one-tape TM whose running time is O(n). Hennie analysed the expressive power of such machines using the concept of a crossing sequence, aka schema (Rabin, 1963; Trakhtenbrot, 1964). This concept is a powerful tool in the analysis of the behaviour of two-way automata and one-tape TMs. A crossing sequence is the sequence of target states s1, s2, ... visited by a TM when its pointer crosses the boundary between a pair of adjacent tape squares. States s1, s3, ... are reached when the pointer moves forward and states s2, s4, ... are reached when the pointer moves backwards. Figure 4 shows how states are visited during a computation. The crossing sequence between the 3rd and the 4th squares is (s1, s2, s3) = (q3, q6, q9).

Every Hennie machine satisfies the property that the length of its crossing sequences is bounded by an integer k ∈ N. The finiteness of the crossing sequences of a given TM is undecidable (Průša, 2014) but if a finite upper bound k exists, this constant is computable (Kobayashi, 1985; Tadaki et al., 2010). Finiteness of crossing sequences implies that the TM is equivalent to a finite-state automaton/transducer. Furthermore, the bound lets us construct this finite-state device. Unfortunately, the size complexity of the constructed machine is large in comparison to the original TM:

Proposition 16 (Průša 2014). Each |Q|-state, |Ω|-symbol deterministic Hennie machine can be simulated by a nondeterministic finite automaton with 2^O(|Ω| log |Q|) states.
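The crossing-sequence bookkeeping can be replayed from a run trace. The sketch below records, for every boundary between adjacent squares, the states entered immediately after the head crosses that boundary; the trace format and the example run are hypothetical.

```python
from collections import defaultdict

def crossing_sequences(trace):
    """trace: successive (state, head_position) configurations of a one-tape TM run."""
    seqs = defaultdict(list)
    for (q_prev, p_prev), (q_next, p_next) in zip(trace, trace[1:]):
        if p_next != p_prev:                 # the head moved across a boundary
            boundary = min(p_prev, p_next)   # boundary between squares b and b + 1
            seqs[boundary].append(q_next)
    return dict(seqs)

# Hypothetical run fragment: the boundary between squares 3 and 4 is crossed three times.
trace = [("q0", 1), ("q1", 2), ("q2", 3), ("q3", 4), ("q6", 3), ("q9", 4)]
print(crossing_sequences(trace)[3])          # ['q3', 'q6', 'q9']
```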

Testing the finite-stateness of already constructed TMs requires more effort than designing and constructing machines that are immediately known to be Hennie machines. We will now mention a few immediate constructions.

Průša (2014)'s construction is based on a finite weight w ∈ N of the tape squares. Every time a square is visited or passed, the weight associated with the square is reduced. Once the weight is zero, further visits to the square are blocked.

Proposition 17 (Průša 2014). A weight-reducing one-tape TM is a Hennie machine.

Analogously, we can define an NM-SCG whose cohorts have a weight w that is reduced whenever the pointer of the associated TM implementation visits the corresponding square. The cohorts of such an NM-SCG obviously have a finite volume v ≤ w and can be changed at most w times.

Proposition 18. A finite-fertility NM-SCG implemented by a weight-reducing one-tape TM is equivalent to a finite-state transducer.

The second way to construct a Hennie-machine-based NM-SCG is to set the maximum distance m ∈ N ∪ {∞} between adjacent rule applications.[1] When combined with the linear bound for rule applications, we obtain the O(n) bound and finite-state equivalence:

Proposition 19. A finite-fertility NM-SCG runs in O(m(f + 1)vn) time and is equivalent to a finite-state transducer if m, f, v ∈ N.

The third way is to assume fertility f ∈ N and w = 1. Since no square can be revisited, this forces the SCG to move constantly in one direction after all rule applications. This special case resembles the rewriting rules in finite-state phonology, whose fundamental theorem (Johnson, 1972; Kaplan and Kay, 1994) states that if a phonological rule does not reapply to its own output (but instead moves on), it is regular.

The fourth way to construct a Hennie machine from an SCG is based on the number of times the context conditions for a cohort have to be updated. A monotonic SCG reduces the ambiguity of the sentence at every rule application. The reduced ambiguity causes occasional updates in the context conditions of cohorts. Depending on the context conditions, such updates at a cohort boundary may have an infinite or finite bound. Due to the functionality and inward determinism of the ⇒-transducer, the pointer moves from one cohort to another only if the context conditions of the latter have changed as a result of a rule application. Thus, the number of context updates bounds the number of moves:

Proposition 20. If the context conditions can be updated only finitely often at every cohort, then the SCG is equivalent to a finite-state transducer.

[1] This approach was pursued and developed further by the current author in an earlier manuscript (Yli-Jyrä, unpublished) that is available on request.
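The weight-reducing discipline of Propositions 17-18 can also be imposed from the outside: wrap a stepping function so that every visit to a square consumes one unit of a fixed per-square weight, and reject the run as soon as a weight is exhausted. Any machine that never trips this guard is a Hennie machine. The step interface below is an assumption of the sketch, and the head is assumed to stay on the initialized tape.

```python
def weight_reducing_run(step, q0, tape, w):
    """step(q, tape, pos) mutates tape and returns (q', pos'), or None when the machine halts."""
    weights = [w] * len(tape)
    q, pos = q0, 1
    while True:
        if weights[pos] == 0:
            raise RuntimeError("weight exhausted: the run is not weight-reducing for this w")
        weights[pos] -= 1                    # every visit to a square consumes one unit
        result = step(q, tape, pos)
        if result is None:
            return q, tape
        q, pos = result
```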

6 Open Problems

6.1 Aperiodic Context Conditions

Yli-Jyrä (2003) showed that the context conditions used in a realistic Finite-State Intersection Grammar (FSIG) are not only regular but star-free. Since context conditions of SCG rules are strictly weaker than those of FSIG (Tapanainen, 1999), we have a strong conjecture that contexts in practical SCG are also star-free. Star-free languages are definable in the monadic first-order logic of order, FO[
