The Hardness Results of Actively Predicting Simple Subclasses of Context-Free Grammars

Kouichi Hirata
Department of Artificial Intelligence, Kyushu Institute of Technology
Kawazu 680-4, Iizuka 820-8502, Japan
[email protected]

Hiroshi Sakamoto
Department of Informatics, Kyushu University
Hakozaki 6-10-1, Fukuoka 812-8581, Japan
[email protected]

Hiroki Arimura
Department of Informatics, Kyushu University
Hakozaki 6-10-1, Fukuoka 812-8581, Japan
[email protected]
Abstract
In this paper, we present hardness results for actively predicting context-free grammars that have just one nonterminal, that are sequential, that are properly sequential, or in which the number of nonterminals appearing in the right-hand side of each production is bounded by some constant.

Keywords: computational learning theory, prediction, grammatical inference, formal language
1 Introduction
The task of predicting the classification of a new example is frequently discussed in both passive and active settings. In a passive setting, the examples are all chosen independently according to a fixed but unknown probability distribution, and the learner has no control over the selection of examples [5, 6]. In an active setting, on the other hand, the learner is allowed to ask about particular examples, that is, to make membership queries, before the new example to predict is given [1, 3]. In language learning, there is a polynomial-time algorithm to predict deterministic finite automata (DFAs) in an active setting [1], while predicting DFAs in a passive setting is as hard as computing certain apparently hard cryptographic predicates [5]. Furthermore, predicting unrestricted context-free grammars (CFGs) is hard under the same cryptographic assumptions even in an active setting [3]. Here, the cryptographic assumptions are the intractability of inverting RSA encryption, recognizing quadratic residues, and factoring Blum integers.
On the other hand, by using nonterminal membership queries, we can design a polynomial-time algorithm to predict k-bounded CFGs, in which the right-hand side of each production contains at most k nonterminals [2]. Then, which subclasses of CFGs are polynomial-time predictable if we allow ordinal or nonterminal membership queries? Furthermore, are k-bounded CFGs polynomial-time predictable with ordinal membership queries?

In this paper, we first introduce the following simple subclasses of CFGs in addition to k-bounded CFGs:

• the 1-CFGs, which contain just one nonterminal,
• the sequential CFGs [4, 8], whose set of nonterminals has a partial order ≤ such that T ≤ U holds whenever T → vUw is a production for nonterminals T and U, and
• the properly sequential CFGs, which are sequential CFGs in which no nonterminal occurs in both the left- and right-hand sides of the same production.

Then, we show the results summarized in Table 1. This paper partially extends the results of [7] from the viewpoint of CFGs, instead of elementary formal systems, and implicitly improves the results of [9].
2 Preliminaries
Let Σ and N be two non-empty finite sets of symbols such that Σ ∩ N = ∅. A production A → α on Σ and N is an association from a nonterminal A ∈ N to a string α ∈ (N ∪ Σ)*. A context-free grammar (CFG, for short) is a 4-tuple (N, Σ, P, S), where S ∈ N is the distinguished start symbol and P is a finite set of productions on Σ and N. Symbols in N are called nonterminals, and symbols in Σ are called terminals.

In this paper, we deal with the following subclasses of CFGs.

• A CFG G = (N, Σ, P, S) is called a 1-CFG if N = {S}.
• A CFG G = (N, Σ, P, S) is called sequential [4, 8] if the nonterminals in N can be labeled S = T_1, ..., T_n such that, for each production T_i → w, w ∈ (Σ ∪ {T_j | i ≤ j ≤ n})*.
• A sequential CFG such that, for each production T_i → w, w ∈ (Σ ∪ {T_j | i < j ≤ n})* is called properly sequential.
• A CFG G = (N, Σ, P, S) is called k-bounded [2] if the right-hand side of each production in P has at most k nonterminals.

Table 1: The predictability of simple subclasses of CFGs with membership queries (MQ) and nonterminal membership queries (NMQ). Here, "crypt." means not polynomial-time predictable under the cryptographic assumptions, and "DNF" means not polynomial-time predictable if DNF formulas are not polynomial-time predictable with membership queries.

subclasses of CFGs          with MQ    with NMQ
1-CFGs                      DNF        DNF
sequential CFGs             DNF        DNF
properly sequential CFGs    DNF        DNF
k-bounded CFGs (k ≥ 1)      crypt.     predictable [2]

Let G = (N, Σ, P, S) be a CFG and let α and β be strings in (Σ ∪ N)*. We write α ⇒_G β if there exist α_1, α_2 ∈ (Σ ∪ N)* and a production X → γ ∈ P such that α = α_1 X α_2 and β = α_1 γ α_2. We denote by ⇒*_G the reflexive and transitive closure of ⇒_G. For a nonterminal A ∈ N, the language L_G(A) of A is the set {w ∈ Σ* | A ⇒*_G w}. The language L(G) of G is L_G(S). A language L is called a context-free language (CFL, for short) if there exists a CFG G such that L = L(G). If the CFG is a 1-, sequential, properly sequential or k-bounded CFG, then the CFL is called a 1-, sequential, properly sequential or k-bounded CFL, respectively.

Let U denote Σ*. If w is a string, |w| denotes its length. For each n > 0, U^[n] = {w ∈ U | |w| ≤ n}. A representation of concepts L is any subset of U × U. We interpret an element ⟨c, w⟩ of U × U as consisting of a concept representation c and an example w. The example w is a member of the concept c if ⟨c, w⟩ ∈ L.

To represent CFGs, we define the class L_CFG as the set of pairs ⟨c, w⟩ such that c encodes a CFG G and w ∈ L(G). We define the classes L_1-CFG, L_sqCFG, L_psqCFG and L_k-bounded-CFG similarly, for 1-, sequential, properly sequential and k-bounded CFGs, respectively.

The class L_∪DFA of finite unions of DFAs denotes the set of pairs ⟨c, w⟩ such that c encodes a finite set M_1, ..., M_r of DFAs, where w is in the concept represented by c iff at least one M_i accepts w. Angluin and Kharitonov [3] have shown that L_∪DFA is not polynomial-time predictable with membership queries under the cryptographic assumptions.
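As an illustration of the four subclass definitions given earlier in this section, the following minimal sketch tests a grammar against each of them. The encoding is an assumption of the sketch and not part of the paper: a grammar is a dictionary from each nonterminal to its list of right-hand sides, every symbol is a single character, and the names is_one_cfg, is_k_bounded and sequential_order are hypothetical. Sequentiality is checked by looking for a labelling T_1 = S, ..., T_n in which every production only refers to nonterminals with an equal or larger index.

```python
from typing import Dict, List, Optional

# Assumed encoding: nonterminal -> right-hand sides; a symbol is a
# nonterminal exactly when it is a key of the dictionary. The start
# symbol must be a key, even if it has no productions.
Grammar = Dict[str, List[str]]


def is_one_cfg(g: Grammar, start: str) -> bool:
    """N = {S}: the start symbol is the only nonterminal."""
    return set(g) == {start}


def is_k_bounded(g: Grammar, k: int) -> bool:
    """Each right-hand side has at most k nonterminal occurrences."""
    return all(sum(sym in g for sym in rhs) <= k
               for rhss in g.values() for rhs in rhss)


def sequential_order(g: Grammar, start: str,
                     proper: bool = False) -> Optional[List[str]]:
    """Return a labelling T_1 = start, ..., T_n witnessing that g is
    sequential (properly sequential if proper=True), or None."""
    uses = {a: {s for rhs in rhss for s in rhs if s in g}
            for a, rhss in g.items()}
    if proper and any(a in uses[a] for a in g):
        return None  # a nonterminal occurs on both sides of a production
    deps = {a: uses[a] - {a} for a in g}  # self-references are allowed
    indeg = {a: 0 for a in g}
    for a in g:
        for b in deps[a]:
            indeg[b] += 1
    if indeg[start] != 0:
        return None  # start cannot be labelled T_1
    # Topological sort (Kahn's algorithm) with the start symbol first.
    ready = [start] + sorted(a for a in g if indeg[a] == 0 and a != start)
    order: List[str] = []
    while ready:
        a = ready.pop(0)
        order.append(a)
        for b in sorted(deps[a]):
            indeg[b] -= 1
            if indeg[b] == 0:
                ready.append(b)
    return order if len(order) == len(g) else None


anbn = {"S": ["aSb", "ab"]}                    # 1-CFG, sequential
psq = {"S": ["TT", "0T"], "T": ["0", "1"]}     # properly sequential
cyclic = {"S": ["aT"], "T": ["bS", "c"]}       # not sequential
```

With these example grammars, anbn is a sequential 1-CFG that is 1-bounded but not properly sequential, psq is properly sequential and 2-bounded but not 1-bounded, and cyclic admits no suitable labelling.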
The class L_DNF denotes the set of pairs ⟨c, w⟩ such that c encodes a positive integer n and a DNF formula d over n Boolean variables x_1, ..., x_n, |w| = n (w = w_1 ... w_n), and the assignment x_i = w_i (1 ≤ i ≤ n) satisfies d. Angluin and Kharitonov [3] have shown that, if there exist one-way functions that cannot be inverted by polynomial-sized circuits, then L_DNF is either polynomial-time predictable without membership queries or not polynomial-time predictable with membership queries. However, it is still open which of the two statements holds.

In order to obtain the results of this paper, it is sufficient to introduce the following concept of prediction-preserving reducibility [3, 6] and to combine the above results for L_∪DFA and L_DNF with Theorem 2 below. Hence, we omit the formal definitions of prediction algorithms and predictability; see [3, 5, 6] for details.

Since we adopt an active setting, we allow the use of membership queries or nonterminal membership queries. The nonterminal membership query is specific to CFGs.

1. A membership query (also called an ordinal membership query) [1, 3] takes a string w ∈ U as input and returns "yes" if w ∈ c, and "no" otherwise.
2. A nonterminal membership query [2] takes a string w ∈ U and a nonterminal T of c as input and returns "yes" if w ∈ c_T, and "no" otherwise. Here, c_T denotes the language L_G(T) for the CFG G encoded by c and a nonterminal T of G.
Definition 1 (Angluin & Kharitonov [3]) Let L_i be a representation of concepts over domain U_i (i = 1, 2). We say that predicting L_1 reduces to predicting L_2 with membership queries (pwm-reduces, for short), denoted by L_1 ≤_pwm L_2, if there exist a function f : ℕ × ℕ × U_1 → U_2 (called an instance mapping), a function g : ℕ × ℕ × L_1 → L_2 (called a concept mapping) and a function h : ℕ × ℕ × U_2 → U_1 ∪ {⊤, ⊥} (called a membership query mapping) satisfying the following conditions:

1. for each x ∈ U_1^[n] and c ∈ L_1^[s], x ∈ c iff f(n, s, x) ∈ g(n, s, c);
2. the size complexity of g is polynomial in the size complexity of c;
3. f(n, s, x) can be computed in polynomial time;
4. for each x′ ∈ U_2 and c ∈ L_1^[s], if h(n, s, x′) = ⊤ then x′ ∈ g(n, s, c); if h(n, s, x′) = ⊥ then x′ ∉ g(n, s, c); if h(n, s, x′) = x ∈ U_1, then x′ ∈ g(n, s, c) iff x ∈ c;
5. h(n, s, x′) can be computed in polynomial time.

Theorem 2 (Angluin & Kharitonov [3]) Let L_1 and L_2 be representations of concepts, and suppose that L_1 ≤_pwm L_2. If L_1 is not polynomial-time predictable with membership queries, then neither is L_2.
3 Hardness Results for Actively Predicting CFGs

3.1 1-CFGs
The 1-CFLs and the regular languages are incomparable:

• {ww^R | w ∈ Σ*} is a 1-CFL, but not regular.
• {a^l b^m c^n | a, b, c ∈ Σ, l, m, n ≥ 1} is regular, but not a 1-CFL.

Proposition 3 The 1-CFLs are closed under reversal and Kleene star, but not closed under union, intersection, complement, or concatenation.

Proof. Let G = (N, Σ, P, S) be a 1-CFG and suppose that L = L(G). Then, we can construct the 1-CFGs G_1 = (N, Σ, {S → α^R | S → α ∈ P}, S) and G_2 = (N, Σ, {S → SS} ∪ P, S) such that L(G_1) = L^R and L(G_2) = L*.

Next, we show that the 1-CFLs are not closed under the remaining operations. In the following, assume that a, b, c ∈ Σ. Both L_1 = {a^n b^n | n ≥ 1} and L_2 = {b^n a^n | n ≥ 1} are 1-CFLs, but L_1 ∪ L_2 is not. For the 1-CFGs G_1 = ({S}, Σ, {S → aS | bSc | bc}, S) and G_2 = ({S}, Σ, {S → aSb | aSc | ab}, S), it holds that L(G_1) ∩ L(G_2) = {a^n b^k c^k | n ≥ k ≥ 0}, which is not a 1-CFL. Furthermore, we can show that there exists no 1-CFG G such that L(G) is the complement of {a^n b^n | n ≥ 1}. Finally, both L_1 = {a^i b^j | i, j ≥ 0} and L_2 = {b^k c^ℓ | k, ℓ ≥ 0} are 1-CFLs, but L_1 · L_2 = {a^i b^j c^k | i, j, k ≥ 0} is not. □

Theorem 4 L_1-CFG is not polynomial-time predictable with ordinal or nonterminal membership queries, if L_DNF is not polynomial-time predictable with membership queries.
Proof. It is sufficient to show that L_DNF ≤_pwm L_1-CFG. Let d = t_1 ∨ ··· ∨ t_m be a DNF formula over n Boolean variables x_1, ..., x_n. Let the instance mapping f be the identity function, that is, f(n, m, e) = e for e ∈ {0, 1}^n. Let the concept mapping g(n, m, d) be as follows:

\[ g(n, m, d) = (\{S\}, \{0, 1\}, \{S \to 0 \mid 1 \mid v_1^1 \cdots v_n^1 \mid \cdots \mid v_1^m \cdots v_n^m\}, S). \]

Here, v_i^j (1 ≤ i ≤ n, 1 ≤ j ≤ m) is defined as follows:

\[ v_i^j = \begin{cases} 1 & \text{if } t_j \text{ contains } x_i, \\ 0 & \text{if } t_j \text{ contains } \overline{x_i}, \\ S & \text{otherwise.} \end{cases} \]

Note that g(n, m, d) is a 1-CFG. Let the membership query mapping h be as follows:

\[ h(n, m, e') = \begin{cases} e' & \text{if } |e'| = n, \\ \bot & \text{otherwise.} \end{cases} \]

Then, the following two statements hold:

1. e satisfies d iff S ⇒*_{g(n,m,d)} f(n, m, e), for each e ∈ {0, 1}^n, and
2. S ⇒*_{g(n,m,d)} e′ iff h(n, m, e′) satisfies d, for each e′ ∈ {0, 1}* such that |e′| = n.

Hence, it holds that L_DNF ≤_pwm L_1-CFG. Finally, we remark that nonterminal membership queries for L_1-CFG coincide with ordinal ones, since S is the only nonterminal. □
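The construction in the proof of Theorem 4 can be exercised on a small instance. The sketch below builds the productions of g(n, m, d) for one formula and checks, by brute force over all strings of length n, that a string satisfies d exactly when S derives it. The encoding of terms as dictionaries, the function names, and the span-based recognizer are assumptions of this sketch and do not come from the paper; the recognizer relies on the grammar having no ε-productions.

```python
from itertools import product
from typing import Dict, List

# Hypothetical encoding: a term maps a 1-based variable index to the bit
# it requires (1 for x_i, 0 for its negation); a DNF is a list of terms.
Term = Dict[int, int]


def satisfies(e: str, d: List[Term]) -> bool:
    """True iff the assignment x_i = e[i-1] satisfies some term of d."""
    return any(all(e[i - 1] == str(bit) for i, bit in t.items()) for t in d)


def concept_mapping(n: int, d: List[Term]) -> List[str]:
    """Right-hand sides of the productions of g(n, m, d); 'S' is the
    only nonterminal."""
    rhss = ["0", "1"]
    for t in d:
        rhss.append("".join(str(t[i]) if i in t else "S"
                            for i in range(1, n + 1)))
    return rhss


def derives(rhss: List[str], w: str) -> bool:
    """Decide S =>* w for an epsilon-free grammar whose only nonterminal
    is S."""
    ok = set()  # spans (i, j) with S =>* w[i:j]

    def match(rhs: str, k: int, i: int, j: int) -> bool:
        # Can the suffix rhs[k:] derive w[i:j]?
        if k == len(rhs):
            return i == j
        if rhs[k] == "S":
            return any((i, mid) in ok and match(rhs, k + 1, mid, j)
                       for mid in range(i + 1, j + 1))
        return i < j and w[i] == rhs[k] and match(rhs, k + 1, i + 1, j)

    # Shorter spans first: each S inside a non-unit production covers a
    # strictly shorter span, and the unit production S -> S adds nothing.
    for length in range(1, len(w) + 1):
        for i in range(len(w) - length + 1):
            j = i + length
            if any(rhs != "S" and match(rhs, 0, i, j) for rhs in rhss):
                ok.add((i, j))
    return (0, len(w)) in ok


# d = (x1 AND NOT x3) OR x2 over n = 3 variables.
n, d = 3, [{1: 1, 3: 0}, {2: 1}]
rhss = concept_mapping(n, d)  # productions S -> 0 | 1 | 1S0 | S1S
agree = all(satisfies("".join(e), d) == derives(rhss, "".join(e))
            for e in product("01", repeat=n))
```

The check covers only strings of length n, matching the two statements in the proof; the dictionary encoding also rules out terms that contain both a variable and its negation.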
3.2 Sequential and Properly Sequential CFGs
The sequential CFLs properly contain the regular languages and are properly contained in the CFLs [4, 8]. In particular, the properness holds as follows [4]:

• {a^n b^n | a, b ∈ Σ, n ≥ 1} is a sequential CFL, but not regular.
• {w c w^R | w ∈ (a* b a* c a*)*, a, b, c ∈ Σ} is a CFL, but not sequential.

Since the 1-CFG constructed in the proof of Theorem 4 is sequential, the following corollary holds.

Corollary 5 L_sqCFG is not polynomial-time predictable with ordinal or nonterminal membership queries, if L_DNF is not polynomial-time predictable with membership queries.

Although every properly sequential CFL is finite, and hence regular, a hardness result similar to that for L_sqCFG holds for L_psqCFG as follows.

Theorem 6 L_psqCFG is not polynomial-time predictable with ordinal or nonterminal membership queries, if L_DNF is not polynomial-time predictable with membership queries.

Proof. Similarly to the proof of Theorem 4 and to our previous works [7, 9], we can show that L_DNF ≤_pwm L_psqCFG. For a DNF formula d = t_1 ∨ ··· ∨ t_m over n Boolean variables x_1, ..., x_n, let the concept mapping g(n, m, d) be as follows:

\[ g(n, m, d) = (\{S, T\}, \{0, 1\}, \{S \to w_1^1 \cdots w_n^1 \mid \cdots \mid w_1^m \cdots w_n^m,\ T \to 0 \mid 1\}, S). \]

Here, w_i^j (1 ≤ i ≤ n, 1 ≤ j ≤ m) is defined as follows:

\[ w_i^j = \begin{cases} 1 & \text{if } t_j \text{ contains } x_i, \\ 0 & \text{if } t_j \text{ contains } \overline{x_i}, \\ T & \text{otherwise.} \end{cases} \]

Note that g(n, m, d) is properly sequential. Furthermore, let the instance mapping and the membership query mapping be identity functions. In this pwm-reduction, whether f(n, m, e) ∈ L(g(n, m, d)) for e ∈ {0, 1}^n is determined independently of the replies to nonterminal membership queries for T, since L_g(n,m,d)(T) = {0, 1} for every d. □
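Because the grammar built in the proof of Theorem 6 is properly sequential, its language is finite and can be enumerated directly. The following minimal sketch does so for one formula and compares the result with the set of satisfying assignments. As before, the dictionary encoding of terms and the function names are assumptions of the sketch and not taken from the paper.

```python
from itertools import product
from typing import Dict, List, Set

# Hypothetical encoding: a term maps a 1-based variable index to the bit
# it requires (1 for x_i, 0 for its negation); a DNF is a list of terms.
Term = Dict[int, int]


def satisfying_assignments(n: int, d: List[Term]) -> Set[str]:
    """All e in {0,1}^n that satisfy at least one term of d."""
    return {"".join(e) for e in product("01", repeat=n)
            if any(all(e[i - 1] == str(bit) for i, bit in t.items())
                   for t in d)}


def concept_mapping(n: int, d: List[Term]) -> Dict[str, List[str]]:
    """Productions of g(n, m, d): S -> w^1 | ... | w^m and T -> 0 | 1."""
    s_rules = ["".join(str(t[i]) if i in t else "T"
                       for i in range(1, n + 1)) for t in d]
    return {"S": s_rules, "T": ["0", "1"]}


def language(g: Dict[str, List[str]]) -> Set[str]:
    """Enumerate L(g). The set is finite because S never occurs on a
    right-hand side and T derives exactly one letter."""
    words: Set[str] = set()
    for rhs in g["S"]:
        choices = [g["T"] if sym == "T" else [sym] for sym in rhs]
        words.update("".join(w) for w in product(*choices))
    return words


# d = (x1 AND NOT x3) OR x2 over n = 3 variables.
n, d = 3, [{1: 1, 3: 0}, {2: 1}]
g = concept_mapping(n, d)  # S -> 1T0 | T1T, T -> 0 | 1
```

For this instance, language(g) and satisfying_assignments(n, d) are the same five strings, which is the content of the reduction with identity instance and membership query mappings.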
3.3 k-Bounded CFGs
The 1- and 2-bounded CFLs coincide with the languages generated by linear grammars and Chomsky normal form grammars, respectively [2]. Angluin [2] has shown that L_k-bounded-CFG (k ≥ 1) is polynomial-time predictable with nonterminal membership queries. The following theorem shows that we cannot replace nonterminal membership queries with ordinal ones while preserving predictability.

Theorem 7 For each k ≥ 1, L_k-bounded-CFG is not polynomial-time predictable with membership queries under the cryptographic assumptions.

Proof. Since every 1-bounded CFG is k-bounded for each k ≥ 1, it is sufficient to show that L_∪DFA ≤_pwm L_1-bounded-CFG. Let M_1, ..., M_r be DFAs with the same alphabet Σ. For each M_i = (Q_i, Σ, δ_i, q_0^i, F_i), construct the 1-bounded CFG G_i(n, s, M_i) = (Q_i, Σ, P_i, q_0^i) such that δ_i(q, a) = q′ iff q → aq′ ∈ P_i for each q, q′ ∈ Q_i and a ∈ Σ. For S ∉ (∪_{1≤i≤r} Q_i) ∪ Σ, let the concept mapping g(n, s, M_1 ∪ ... ∪ M_r) be as follows:

g(n, s, M_1 ∪ ... ∪ M_r) = ({S} ∪ (∪_{1≤i≤r} Q_i), Σ, {S → q_0^1 | ... | q_0^r} ∪ (∪_{1≤i≤r} P_i), S).

Note that the size of g(n, s, M_1 ∪ ... ∪ M_r) is bounded by a polynomial in the total size of all the M_i's. Furthermore, let the instance mapping and the membership query mapping be identity functions. Then, it holds that L(M_1) ∪ ... ∪ L(M_r) = L(g(n, s, M_1 ∪ ... ∪ M_r)), which implies that L_∪DFA ≤_pwm L_1-bounded-CFG. □
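The construction in the proof of Theorem 7 can be sketched as follows. One point is an assumption of this sketch rather than something stated in the proof: the text lists only productions of the form q → aq′, so the code additionally adds q → ε for every accepting state q in order for derivations to terminate. States of different DFAs are kept disjoint by tagging them with the index of their DFA, and all type and function names are hypothetical.

```python
from itertools import product
from typing import Dict, Hashable, List, Set, Tuple

Symbol = Hashable
# A DFA is (Q, delta, q0, F) with a total transition function delta.
DFA = Tuple[Set[int], Dict[Tuple[int, str], int], int, Set[int]]
Grammar = Dict[Symbol, List[Tuple[Symbol, ...]]]


def dfa_accepts(m: DFA, w: str) -> bool:
    _, delta, q, finals = m
    for a in w:
        q = delta[(q, a)]
    return q in finals


def union_to_grammar(dfas: List[DFA]) -> Grammar:
    """Build the grammar g for M_1, ..., M_r; the nonterminals are 'S'
    and the tagged states (i, q). 'S' must not be a terminal."""
    g: Grammar = {"S": []}
    for i, (states, delta, q0, finals) in enumerate(dfas):
        g["S"].append(((i, q0),))            # S -> q_0^i
        for q in states:
            g[(i, q)] = []
        for (q, a), r in delta.items():
            g[(i, q)].append((a, (i, r)))    # q -> a q' iff delta_i(q, a) = q'
        for q in finals:
            g[(i, q)].append(())             # ASSUMPTION: q -> epsilon if q in F_i
    return g


def grammar_accepts(g: Grammar, w: str) -> bool:
    """Search derivations of this right-linear grammar over pairs
    (current nonterminal, number of letters of w already produced)."""
    seen, stack = set(), [("S", 0)]
    while stack:
        item = stack.pop()
        if item in seen:
            continue
        seen.add(item)
        nonterminal, pos = item
        for rhs in g[nonterminal]:
            if len(rhs) == 0:
                if pos == len(w):
                    return True
            elif len(rhs) == 1:              # unit production S -> q_0^i
                stack.append((rhs[0], pos))
            elif pos < len(w) and rhs[0] == w[pos]:
                stack.append((rhs[1], pos + 1))
    return False


# M_1: even number of a's; M_2: strings ending in b (alphabet {a, b}).
m1: DFA = ({0, 1}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
m2: DFA = ({0, 1}, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})
dfas = [m1, m2]
g = union_to_grammar(dfas)
words = ["".join(t) for k in range(6) for t in product("ab", repeat=k)]
agree = all(grammar_accepts(g, w) == any(dfa_accepts(m, w) for m in dfas)
            for w in words)
one_bounded = all(sum(s in g for s in rhs) <= 1
                  for rhss in g.values() for rhs in rhss)
```

Under the stated assumption, the grammar and the union of the two DFAs agree on every string over {a, b} of length at most 5, and every right-hand side contains at most one nonterminal.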
References

[1] D. Angluin: Learning regular sets from queries and counterexamples, Information and Computation 75, 87–106, 1987.
[2] D. Angluin: Learning k-bounded context-free grammars, Technical Report YALEU/DCS/RR-557, Yale University, 1987.
[3] D. Angluin and M. Kharitonov: When won’t membership queries help?, Journal of Computer and System Sciences 50, 336–355, 1995.
[4] S. Ginsburg: The mathematical theory of context free languages, McGraw-Hill, 1966.
[5] M. Kearns and L. Valiant: Cryptographic limitations on learning Boolean formulae and finite automata, Journal of the ACM 41, 67–95, 1994.
[6] L. Pitt and M. K. Warmuth: Prediction-preserving reducibility, Journal of Computer and System Sciences 41, 430–467, 1990.
[7] H. Sakamoto, K. Hirata and H. Arimura: Learning elementary formal systems with queries, Technical Report DOI-TR-179, Department of Informatics, Kyushu University, 2000. Also available at http://www.i.kyushu-u.ac.jp/doi-tr.html.
[8] E. Shamir: On sequential languages and two classes of regular events, Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 18, 61–69, 1965.
[9] N. Sugimoto, T. Toyoshima, S. Shimozono and K. Hirata: Constructive learning of context-free languages with a subpansive tree, Proc. 5th International Colloquium on Grammatical Inference, Lecture Notes in Artificial Intelligence 1891, 270–283, 2000.