On the Circuit Complexity of Random Generation Problems for Regular and Context-Free Languages*

Massimiliano Goldwurm1, Beatrice Palano2, and Massimo Santini1

1 Dipartimento di Scienze dell’Informazione, Università degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italia
{goldwurm,santini}@dsi.unimi.it
2 Dipartimento di Informatica, Università degli Studi di Torino, C.so Svizzera 185, 10149 Torino, Italia
[email protected]
Abstract. We study the circuit complexity of generating at random a word of length $n$ from a given language under uniform distribution. We prove that, for every language accepted in polynomial time by 1-NAuxPDA of polynomially bounded ambiguity, the problem is solvable by a logspace-uniform family of probabilistic boolean circuits of polynomial size and $O(\log^2 n)$ depth. Using a suitable notion of reducibility (similar to the $\mathrm{NC}^1$-reducibility), we also show the relationship between random generation problems for regular and context-free languages and classical computational complexity classes such as DIV, L and DET.

Keywords: uniform random generation, ambiguous context-free languages, auxiliary pushdown automata, circuit complexity.
1 Introduction

Given a formal language $L$, the uniform random generation problem for $L$ consists of computing, for an instance $n > 0$, a word of length $n$ in $L$ uniformly at random. We study the circuit complexity of this problem for several classes of languages including regular, context-free (c.f. for short) and more generally languages accepted by one-way nondeterministic auxiliary push-down automata (1-NAuxPDA).

Several sequential algorithms have been proposed for the random generation of strings in regular and context-free languages [12, 10, 9, 11]. The problem is particularly interesting in the c.f. case because these languages can encode a wide variety of combinatorial structures; moreover, sampling words from c.f. languages is naturally motivated by other applications such as testing parsers of programming languages [12] or evaluating the performance of algorithms which process DNA sequences [20, 19]. In the case of unambiguous c.f. languages the best known algorithm for random generation works in $O(n \log n)$ arithmetic time [10]; this is a special case of more general procedures for the random generation of so-called “labelled combinatorial structures”.

* This work has been partially supported by MURST Research Program “Unconventional computational models: syntactic and combinatorial methods”.
In the case of general (possibly ambiguous) c.f. languages a subexponential time algorithm is described in [11] for the (almost uniform) random generation of strings of given length. The problem is solvable in polynomial time if the language is generated by a c.f. grammar of polynomially bounded ambiguity [4]. This result also holds for languages accepted by polynomial time 1-NAuxPDA of polynomially bounded ambiguity and, under suitable hypotheses, a similar approach can be applied to the combinatorial structures that admit an ambiguous specification (in the sense that the same object may have several distinct descriptions).

In this work we give a classification of the circuit complexity of these problems which includes languages described by possibly ambiguous specifications. Our most general result states that for every language accepted by a polynomial time 1-NAuxPDA of polynomially bounded ambiguity the uniform random generation problem can be solved by a log-space uniform family of probabilistic boolean circuits of polynomial size and $O(\log^2 n)$ depth. This, in particular, emphasizes the difference between counting and random generation: indeed, for some finitely ambiguous c.f. languages the counting problem is $\#P_1$-complete [3].

Stronger results can be obtained for less general and well-known classes of languages such as regular and context-free languages. To compare the complexity of our problem for such classes, we give a natural extension of the usual $\mathrm{NC}^1$-reducibility [7]. We say that the uniform random generation problem for a language $L$ is $\mathrm{RNC}^1_g$-reducible to a class $\mathcal{C}$ of boolean functions if it can be solved by a logspace-uniform family of probabilistic boolean circuits of polynomial size and $O(\log n)$ depth using oracle nodes in $\mathcal{C}$. Using this notion we show the relationship between our problem and classical computational complexity classes such as DIV, DET and $\#\mathrm{SAC}^1$ [7, 21] (here defined in Section 2).

We show that, for every regular language, the problem of uniform random generation is $\mathrm{RNC}^1_g$-reducible to the class DIV; moreover, in the case of unambiguous c.f. languages the problem is $\mathrm{RNC}^1_g$-reducible to $\mathrm{DIV} \cup \mathrm{L}$ and, for polynomially ambiguous c.f. languages, it is $\mathrm{RNC}^1_g$-reducible to $\#\mathrm{SAC}^1$. Finally, we consider a general version of the uniform random generation problem for regular languages, where the deterministic finite automaton describing the language is part of the input; in this case, the problem is $\mathrm{RNC}^1_g$-reducible to DET. These results are obtained by combining the complexity of counting and recognition problems with the study of some reachability problems on certain random graphs arising from the design of the circuits.
2 Probabilistic Circuits for Random Generation

We assume some familiarity with (bounded fan-in) boolean circuits as defined in [7, 22]. We say that a family $\{c_n\}_{n>0}$ of boolean circuits is uniform if there exists a logspace bounded Turing machine which on input $1^n$ computes a description of $c_n$. The class $\mathrm{NC}^k$ is the set of boolean functions computable by uniform families of boolean circuits of polynomial size and $O(\log^k n)$ depth, where $n$ is the input size. A boolean function $f$ is $\mathrm{NC}^1$-reducible to a boolean function $g$ if $f$ can be computed by a uniform family of boolean circuits of polynomial size and $O(\log n)$ depth equipped with oracle nodes for computing $g$; here, the depth of any oracle node with fan-in $i$ and fan-out $o$ counts for $\lceil \log(i + o) \rceil$. Given a class $\mathcal{C}$ of boolean functions, we denote by $\mathrm{NC}^1(\mathcal{C})$ the closure of $\mathcal{C}$ under $\mathrm{NC}^1$-reducibility.

Let intdet and intdiv be the problems of computing respectively the determinant of an $n \times n$ matrix of $n$-bit integers and the division of two $n$-bit integers. As usual, we denote by L (NL) the class of languages recognized in $O(\log n)$ space by a deterministic (nondeterministic) Turing machine. Hence, the classes $\mathrm{L}^*$, $\mathrm{NL}^*$, DET and DIV are defined respectively by $\mathrm{L}^* = \mathrm{NC}^1(\mathrm{L})$, $\mathrm{NL}^* = \mathrm{NC}^1(\mathrm{NL})$, $\mathrm{DET} = \mathrm{NC}^1(\{\text{intdet}\})$ and $\mathrm{DIV} = \mathrm{NC}^1(\{\text{intdiv}\})$. The following relations are known [7]:

$$\mathrm{NC}^1 \subseteq \mathrm{L}^* \subseteq \mathrm{NL}^* \subseteq \mathrm{DET} \subseteq \mathrm{NC}^2, \qquad \mathrm{NC}^1 \subseteq \mathrm{DIV} \subseteq \mathrm{DET}.$$
Finally, by $\#\mathrm{SAC}^1$ we denote the set of functions computing the number of accepting subtrees in a uniform family of semi-unbounded circuits of polynomial size and $O(\log n)$ depth [21]; we also recall that $\#\mathrm{SAC}^1 \subseteq \mathrm{NC}^2$.

In this work we use boolean circuits to solve uniform random generation problems. To this end we use the notion of probabilistic boolean circuit as introduced in [7]. This is a boolean circuit equipped in addition with independent and identically distributed random input bits: each of them assumes a value in $\{0, 1\}$ with probability $1/2$.

Example 1. Consider the problem of generating at random an integer according to some specified distribution. Let $a_1, a_2, \ldots, a_n$ be $n$-bit positive integers; we design a probabilistic boolean circuit $c_n$ which, on input $a_1, a_2, \ldots, a_n$, outputs a $k \in \{1, 2, \ldots, n\} \cup \{\bot\}$ such that:
1. $\Pr\{k = \bot\} \le 1/4$,
2. for every $1 \le i \le n$, $\Pr\{k = i \mid k \ne \bot\} = a_i/a$, where $a = \sum_{i=1}^{n} a_i$.

First of all, the circuit computes in parallel all $s_i = \sum_{j \le i} a_j$, for $i \le n$; then it computes $\ell = \min\{i : s_n < 2^i\}$. Let now $r_1, r_2 \in \{1, 2, \ldots, 2^\ell\}$ be two random integers defined by two distinct sets of $\ell$ random input bits each. The circuit computes in parallel $k_j = \min\{i : r_j \le s_i\}$ for $j = 1, 2$ (where we assume $\min \emptyset = \bot$). Finally it outputs $k_1$ if this is different from $\bot$, else it outputs $k_2$. Clearly, the probability of giving $\bot$ as output is less than or equal to $1/4$ while, if this is not the case, the output has the required distribution. Recalling the circuit complexity of elementary arithmetic operations [22], one can conclude that the size of the circuit is polynomial and its depth is $O(\log n)$. Notice that, by taking $m = n^{O(1)}$ parallel copies of the same circuit, one can solve the problem, still in polynomial size and $O(\log n)$ depth, reducing the probability of answering $\bot$ to $1/4^m$ at most. □

We now introduce a parallel hierarchy to classify the uniform random generation problem for formal languages.
Definition 1. A uniform family of probabilistic boolean circuits $\{c_n\}_{n>0}$ is a uniform random generator (u.r.g.) for a formal language $L \subseteq \Sigma^*$, if each $c_n$, on input $1^n$, computes a value $\omega_n$ in $\Sigma^n \cup \{\bot\}$ such that, if $L \cap \Sigma^n \ne \emptyset$, then:
1. $\Pr\{\omega_n = \bot\} \le 1/4$,
2. $\Pr\{\omega_n = x \mid \omega_n \ne \bot\} = 1/\#(L \cap \Sigma^n)$, for every $x \in L \cap \Sigma^n$.

Moreover, we say that the uniform random generation problem for $L$ belongs to the class $\mathrm{RNC}^k_g$ if there exists a u.r.g. for $L$ of polynomial size and $O(\log^k n)$ depth.
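The failure convention of Definition 1 and the construction of Example 1 can be illustrated by a sequential simulation. The following is only a sketch under stated assumptions: the function name `sample_index` and the use of Python's `random.Random.getrandbits` as the source of unbiased bits are choices made here, and the code runs sequentially rather than as a circuit. It returns `None` in place of $\bot$.

```python
import random

def sample_index(weights, rng):
    """Return i in 1..n with probability weights[i-1]/sum(weights),
    or None (standing for the failure symbol) after two failed attempts."""
    # Prefix sums s_i = a_1 + ... + a_i.
    prefix, total = [], 0
    for a in weights:
        total += a
        prefix.append(total)
    # ell = min{ i : s_n < 2^i }, which is the bit length of s_n.
    ell = total.bit_length()
    for _ in range(2):                    # two independent attempts r_1, r_2
        r = rng.getrandbits(ell) + 1      # uniform in {1, ..., 2^ell}
        for i, s in enumerate(prefix, start=1):
            if r <= s:                    # k = min{ i : r <= s_i }
                return i
    return None                           # both attempts fell above s_n
```

Since $s_n \ge 2^{\ell-1}$, each attempt fails with probability at most $1/2$, so the sketch returns `None` with probability at most $1/4$, matching condition 1 of Example 1.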
Observe that this class is not the usual class $\mathrm{RNC}^k$ [7], since here we are not interested in computing a boolean function bounding the probability of a wrong answer, but we rather want to produce a random output with a given distribution, explicitly notifying the possible failure of the computation (due to the restriction to unbiased random bits).

We say that the uniform random generation problem for a language $L$ is $\mathrm{RNC}^1_g$-reducible to a class $\mathcal{C}$ of boolean functions if there exists a u.r.g. for $L$ of polynomial size and $O(\log n)$ depth which uses oracle nodes in $\mathcal{C}$ (again, the depth of any oracle node with fan-in $i$ and fan-out $o$ counts for $\lceil \log(i + o) \rceil$); we denote by $\mathrm{RNC}^1_g(\mathcal{C})$ the class of uniform random generation problems $\mathrm{RNC}^1_g$-reducible to $\mathcal{C}$.
3 Regular Languages

In this section we study the circuit complexity of the uniform random generation problem for regular languages. We show the problem to be $\mathrm{RNC}^1_g$-reducible to intdiv.

Let $\mathcal{A} = \langle \Sigma, Q, q_0, F, \delta \rangle$ be a deterministic finite automaton and define, for $q \in Q$ and $0 \le \ell \le n$, the language $L_q^\ell = \{x \in \Sigma^\ell : \delta(q, x) \in F\}$ and set $\gamma(q, \ell) = \#L_q^\ell$ (where, as usual, $\Sigma^0 = \{\epsilon\}$). We start by defining a family of (random) graphs which allows us to design the circuits for solving our problem. For every integer $n > 0$, define the (directed acyclic) labelled graph $G_n(\mathcal{A}) = \langle V_n, E_n \rangle$ such that $V_n = \{(q, \ell) : q \in Q, 0 \le \ell \le n\}$ and $E_n$ is built according to the following procedure: for every $v = (q, \ell) \in V_n$ with $\ell > 0$ pick $\sigma_v \in \Sigma$ at random such that, for every $\sigma \in \Sigma$,

$$\Pr\{\sigma_v = \sigma\} = \frac{\gamma(\delta(q, \sigma), \ell - 1)}{\gamma(q, \ell)}$$

and add to $E_n$ the edge $((q, \ell), (\delta(q, \sigma_v), \ell - 1))$ with label $\sigma_v$.

Since $G_n(\mathcal{A})$ is acyclic and all nodes $(q, \ell)$ with $\ell > 0$ have out-degree 1, for every $(q, \ell) \in V_n$ and $0 < m \le \ell$ there exists just one node reachable from $(q, \ell)$ through a path of length $m$. Let $\omega(q, \ell)$ be the word consisting of the labels along the path leaving $(q, \ell)$ of length $\ell$: i.e. $\omega(q, \ell) = \sigma_1 \cdots \sigma_\ell$, where $q_1 = q$, $q_{i+1} = \delta(q_i, \sigma_i)$ and $((q_i, \ell - i + 1), (q_{i+1}, \ell - i)) \in E_n$, for $1 \le i \le \ell$. Reasoning by induction on $\ell \le n$, one can prove that $\Pr\{\omega(q, \ell) = x\} = 1/\gamma(q, \ell)$, for every $L_q^\ell \ne \emptyset$ and every $x \in L_q^\ell$. Hence, we obtain the following

Lemma 1. For every $n > 0$ such that $L(\mathcal{A}) \cap \Sigma^n \ne \emptyset$,

$$\Pr\{\omega(q_0, n) = x\} = \frac{1}{\#(L(\mathcal{A}) \cap \Sigma^n)},$$

for every $x \in L(\mathcal{A}) \cap \Sigma^n$.
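The counting-then-walking procedure behind Lemma 1 can be sketched sequentially as follows. This is a minimal illustration, not the circuit construction itself: the names `count_suffixes` and `random_word_dfa`, the dictionary encoding of the transition function, and the use of `random.Random.choices` (which samples with the exact weights instead of simulating unbiased bits with a failure symbol) are all assumptions made for this example.

```python
import random

def count_suffixes(delta, final, states, alphabet, n):
    """gamma[l][q] = number of words of length l leading from q to a final state."""
    gamma = [{q: (1 if q in final else 0) for q in states}]
    for _ in range(n):
        prev = gamma[-1]
        gamma.append({q: sum(prev[delta[q, a]] for a in alphabet) for q in states})
    return gamma

def random_word_dfa(delta, final, states, alphabet, q0, n, rng):
    """Return a uniformly distributed accepted word of length n, or None if none exists."""
    gamma = count_suffixes(delta, final, states, alphabet, n)
    if gamma[n][q0] == 0:
        return None
    q, word = q0, []
    for l in range(n, 0, -1):
        # Symbol a is chosen with probability gamma(delta(q, a), l-1) / gamma(q, l).
        weights = [gamma[l - 1][delta[q, a]] for a in alphabet]
        a = rng.choices(alphabet, weights=weights)[0]
        word.append(a)
        q = delta[q, a]
    return "".join(word)
```

For instance, with the two-state automaton accepting words over {a, b} with an even number of a's, `count_suffixes` gives 4 accepted words of length 3, and every word returned by `random_word_dfa` is one of them.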
We now show that, if the automaton $\mathcal{A}$ is fixed, given $1^n$ and $G_n(\mathcal{A})$ as input, computing the word $\omega(q_0, n)$ belongs to $\mathrm{NC}^1$. To this aim, we need some preliminary tools. We say that an $nd \times nd$ boolean matrix $A$ is $(d, t)$-upper-diagonal if $A$ is a block matrix of the form $A = (A_{i,j})$, where all $A_{i,j}$ are $d \times d$ matrices such that $A_{i,j} \ne 0$ (the zero matrix) iff $j = i + t$ ($d, t > 0$, $i, j = 1, \ldots, n$).

Observe that, for every pair of $nd \times nd$ boolean matrices $A, B$, if $A$ is $(d, s)$-upper-diagonal and $B$ is $(d, t)$-upper-diagonal, then the product $AB$ is $(d, s + t)$-upper-diagonal:

$$(AB)_{i,j} = \begin{cases} A_{i,i+s} B_{i+s,\,i+(s+t)} & \text{if } j = i + (s + t), \\ 0 & \text{otherwise;} \end{cases}$$

moreover, $AB$ can be obtained by computing in parallel $n - (s + t)$ many products of $d \times d$ matrices. For this reason, we can prove the following
Lemma 2. Let $d > 0$ be a fixed integer. If $A$ is a $(d, s)$-upper-diagonal boolean matrix of size $nd \times nd$, then computing the boolean power $A^n$ on input $A$ belongs to $\mathrm{NC}^1$.
Proof. Observe that $A^2$ is $(d, 2s)$-upper-diagonal and can be computed by a boolean circuit of polynomial size and constant depth. So, for every $i > 0$, $A^{2^i}$ is a $(d, 2^i s)$-upper-diagonal matrix and can be computed in polynomial size and $O(i)$ depth. Then

$$A^n = \prod_{i \,:\, b_i = 1} A^{2^i},$$

where $b_i \in \{0, 1\}$, for $0 \le i \le \lfloor \log n \rfloor$, are the digits of the binary expansion of $n$, i.e. $n = \sum_i b_i 2^i$. Hence $A^n$ can be obtained by a product of a logarithmic number of upper-diagonal matrices. Such a product can be computed in polynomial size and $O(\log \log n)$ depth. □
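The decomposition of $A^n$ along the binary expansion of $n$ used in the proof can be sketched sequentially. This is only an illustration under assumptions: the helper names `bool_matmul` and `bool_power` are hypothetical, matrices are plain lists of lists of booleans, and the code does not exploit the block structure that gives the constant-depth products in the lemma.

```python
def bool_matmul(a, b):
    """Boolean product of two square matrices given as lists of lists."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def bool_power(a, n):
    """Boolean power a^n for n >= 1, as the product of a^(2^i) over the 1-bits of n."""
    result = None
    square = a                      # invariant: square == a^(2^i) at bit i
    while n > 0:
        if n & 1:
            result = square if result is None else bool_matmul(result, square)
        n >>= 1
        if n:
            square = bool_matmul(square, square)
    return result
```

On the adjacency matrix of a chain of four nodes, which is (1, 1)-upper-diagonal, the third power has a single true entry (from the first node to the last) and the fourth power is the zero matrix.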
Since all the edges of $G_n(\mathcal{A})$ are of the form $((q, \ell), (q', \ell - 1))$ for some $q, q' \in Q$ and $0 < \ell \le n$, the adjacency matrix of $G_n(\mathcal{A})$ is $(\#Q, 1)$-upper-diagonal (where each block corresponds to a set of nodes with the same second component).
Lemma 3. For a fixed automaton $\mathcal{A}$, given $G_n(\mathcal{A})$ as input, the computation of $\omega(q_0, n)$ belongs to $\mathrm{NC}^1$.
Proof. Let $M$ be the adjacency matrix of $G_n(\mathcal{A})$. Recall that for every $v = (q, \ell) \in V_n$ and $0 < m \le \ell$ there exists exactly one node that can be reached from $v$ by a path of length $m$, hence the row corresponding to $v$ in $M^m$ contains exactly one 1. Hence, for $0 \le i < n - 1$, all the nodes $(q_i, n - i)$ reachable from $(q_0, n)$ by a path of length $i$ can be computed in parallel as in Lemma 2. □
Now let us describe the probabilistic boolean circuit $c_n$ which on input $1^n$ computes a word in $L(\mathcal{A}) \cap \Sigma^n$ under uniform distribution. First the circuit computes in parallel all the coefficients $\gamma(q, \ell)$ for $q \in Q$ and $0 \le \ell \le n$. This computation belongs to DIV as proven in [2]. Then the circuit computes the graph $G_n(\mathcal{A})$ by generating in parallel all random symbols $\sigma_v$ for $v \in V_n$. As shown in Example 1, this step can be executed in $O(\log n)$ depth so that, for each $v \in V_n$, $\Pr\{\sigma_v = \bot\} \le 2^{-(2 + \lceil \log(n \#Q) \rceil)}$ and hence, the probability that $\sigma_v = \bot$ for some $v \in V_n$ is at most $1/4$. Thus, if all labels of $G_n(\mathcal{A})$ are in $\Sigma$, the circuit outputs the string $\omega(q_0, n)$ computed in $O(\log n)$ depth as shown in Lemma 3; in this case, by Lemma 1, the distribution of the output is uniform. Otherwise, if $\sigma_v = \bot$ for some $v \in V_n$, the circuit outputs $\bot$. This proves the following
Theorem 1. For every regular language, the uniform random generation problem belongs to $\mathrm{RNC}^1_g(\mathrm{DIV})$.
4 Context Free Languages

In this section we study the uniform random generation problem for context-free languages. We first show that for unambiguous c.f. languages the problem is $\mathrm{RNC}^1_g$-reducible to $\mathrm{L} \cup \mathrm{DIV}$. Then we prove that, for all inherently ambiguous c.f. languages having polynomial ambiguity degree, the problem is $\mathrm{RNC}^1_g$-reducible to $\#\mathrm{SAC}^1$ and hence belongs to $\mathrm{RNC}^2_g$.

4.1 Unambiguous Context-Free Languages
Let $\mathcal{G} = \langle N, \Sigma, S, P \rangle$ be an unambiguous c.f. grammar in Chomsky normal form without useless variables, where $N$ is the set of variables, $\Sigma$ the set of terminals, $S$ the initial variable and $P$ the set of productions. For every $A \in N$ and every $1 \le \ell \le n$, define $\gamma(A, \ell)$ as the number of derivation trees of $\mathcal{G}$ rooted at $A$ and deriving a word in $\Sigma^\ell$. Moreover, let $L_A^\ell = \{x \in \Sigma^\ell : A \Rightarrow^* x\}$; since $\mathcal{G}$ is unambiguous, $\gamma(A, \ell) = \#L_A^\ell$.

As in the regular language case, we start by defining a family of (random) graphs which allows us to design the circuits for solving our problem. For every integer $n > 0$, define the (directed acyclic) graph $G_n(\mathcal{G}) = \langle V_n, E_n \rangle$ such that $V_n = \{(A, r, s) : A \in N, 1 \le r \le s \le n\} \cup \{(\sigma, r) : \sigma \in \Sigma, 1 \le r \le n\}$ and $E_n$ is built according to the following procedure:

– for each $v = (A, r, r) \in V_n$, pick $p_v \in P$ at random such that, for every $A \to \sigma \in P$,
$$\Pr\{p_v = A \to \sigma\} = \frac{1}{\gamma(A, 1)}$$
and add to $E_n$ the edge $((A, r, r), (\sigma, r))$;
– for each $v = (A, r, s) \in V_n$ with $s > r$, pick $p_v \in P \times \{1, \ldots, s - r\}$ at random such that, for every $A \to BC \in P$ and $1 \le k \le s - r$,
$$\Pr\{p_v = (A \to BC, k)\} = \frac{\gamma(B, k)\,\gamma(C, s - r + 1 - k)}{\gamma(A, s - r + 1)}$$
and add to $E_n$ the edges $((A, r, s), (B, r, r + k - 1))$ and $((A, r, s), (C, r + k, s))$.
Clearly $G_n(\mathcal{G})$ is acyclic, all its nodes $(A, r, s) \in V_n$ with $s > r$ have out-degree 2, and the subgraph of $G_n(\mathcal{G})$ induced by the set of nodes reachable from any $(A, r, s)$ is a binary tree with $s - r + 1$ leaves of the form $(\sigma, r) \in V_n$. Let $\omega(A, r, s) = \sigma_r \cdots \sigma_s$, where the nodes $(\sigma_i, i)$, for $r \le i \le s$, are the leaves of the subtree of $G_n(\mathcal{G})$ rooted at $(A, r, s)$. Reasoning by induction on $1 \le \ell \le n$, one can prove that for every $L_A^\ell \ne \emptyset$ and every $x \in L_A^\ell$, if $1 \le r \le s \le n$ and $s - r + 1 = \ell$, then $\Pr\{\omega(A, r, s) = x\} = 1/\gamma(A, \ell)$. As a consequence, we obtain the following

Lemma 4. For every $n > 0$ such that $L(\mathcal{G}) \cap \Sigma^n \ne \emptyset$,

$$\Pr\{\omega(S, 1, n) = x\} = \frac{1}{\#(L(\mathcal{G}) \cap \Sigma^n)},$$

for every $x \in L(\mathcal{G}) \cap \Sigma^n$.
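A sequential counterpart of the procedure leading to Lemma 4 can be sketched as follows. The encoding of a Chomsky normal form grammar as two dictionaries (`terminal` for productions $A \to \sigma$, `binary` for productions $A \to BC$), the function names, and the use of `random.Random.choices` in place of unbiased random bits are assumptions of this example; the sketch expands one random production per node top-down instead of building the whole graph.

```python
import random

def count_trees(terminal, binary, n):
    """gamma[A][l] = number of derivation trees rooted at A yielding a word of length l."""
    variables = set(terminal) | set(binary)
    gamma = {A: [0] * (n + 1) for A in variables}
    for A in variables:
        gamma[A][1] = len(terminal.get(A, []))
    for l in range(2, n + 1):
        for A in variables:
            gamma[A][l] = sum(gamma[B][k] * gamma[C][l - k]
                              for (B, C) in binary.get(A, [])
                              for k in range(1, l))
    return gamma

def random_word_cfg(terminal, binary, gamma, A, l, rng):
    """Sample a word of length l derived from A; requires gamma[A][l] > 0."""
    if l == 1:
        return rng.choice(terminal[A])
    # Choose (A -> BC, k) with probability gamma(B,k) * gamma(C,l-k) / gamma(A,l).
    options = [(B, C, k) for (B, C) in binary.get(A, []) for k in range(1, l)]
    weights = [gamma[B][k] * gamma[C][l - k] for (B, C, k) in options]
    B, C, k = rng.choices(options, weights=weights)[0]
    return (random_word_cfg(terminal, binary, gamma, B, k, rng)
            + random_word_cfg(terminal, binary, gamma, C, l - k, rng))
```

With an unambiguous grammar the output is uniform over the words of the given length; with an ambiguous one, each word is produced with probability proportional to its number of derivation trees.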
We now consider the problem of computing $\omega(S, 1, n)$.
Lemma 5. Let $\mathcal{G} = \langle N, \Sigma, S, P \rangle$ be a fixed unambiguous c.f. grammar in Chomsky normal form without useless variables. Given $G_n(\mathcal{G})$ as input, the computation of $\omega(S, 1, n)$ belongs to L.
Proof. First observe that every $(A, r, s) \in V_n$ with $r < s$ has only two out-neighbours $(B, r, r + k - 1)$ and $(C, r + k, s)$, for some $1 \le k \le s - r$ and some $B, C \in N$; hence, for every $r \le i \le s$, a node $(\sigma, i)$ is reachable from $(A, r, s)$ iff it is reachable either from $(B, r, r + k - 1)$ in the case $i < r + k$, or from $(C, r + k, s)$ otherwise. Thus a log-space bounded deterministic Turing machine can be designed which tests whether a node $(\sigma, i) \in V_n$ is reachable from $(S, 1, n)$. Then the word $\omega(S, 1, n)$ can be computed by testing in parallel the reachability of $(\sigma, i)$ from $(S, 1, n)$ for all $1 \le i \le n$ and all $\sigma \in \Sigma$. □

Now, reasoning as in Section 3, a probabilistic boolean circuit can be designed which, on input $1^n$, first computes in parallel all the coefficients $\gamma(A, \ell)$, then determines the graph $G_n(\mathcal{G})$ and finally generates the string $\omega(S, 1, n)$. The first step can be done in DIV [2] while the last one is in L as shown in Lemma 5. This, together with Lemma 4, yields the following
Theorem 2. For every unambiguous context-free language, the uniform random generation problem belongs to $\mathrm{RNC}^1_g(\mathrm{DIV} \cup \mathrm{L})$.

4.2 Polynomially Ambiguous Context-Free Languages

In this section we study the uniform random generation problem for inherently ambiguous context-free languages. Let $\mathcal{G} = \langle N, \Sigma, S, P \rangle$ be a c.f. grammar in Chomsky normal form without useless variables; for every $x \in \Sigma^*$, we denote by $\mathrm{amb}_{\mathcal{G}}(x)$ the ambiguity of $x$, i.e., the number of derivation trees of $x$ in $\mathcal{G}$. We call ambiguity degree of $\mathcal{G}$ the function $d_{\mathcal{G}} : \mathbb{N} \to \mathbb{N}$ defined by $d_{\mathcal{G}}(n) = \max\{\mathrm{amb}_{\mathcal{G}}(x) : x \in \Sigma^n\}$, for every $n \in \mathbb{N}$. Then, $\mathcal{G}$ is said to be polynomially ambiguous if, for some polynomial $p(n)$, we have $d_{\mathcal{G}}(n) \le p(n)$ for every $n > 0$.
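One can easily prove that if $\mathcal{G}$ is an ambiguous c.f. grammar the circuit designed for Theorem 2, on input $1^n$, gives output $\omega_n$ such that $\Pr\{\omega_n = \bot\} \le 1/4$ and, for every $x \in \Sigma^n$,

$$\Pr\{\omega_n = x \mid \omega_n \ne \bot\} = \frac{\mathrm{amb}_{\mathcal{G}}(x)}{\sum_{y \in \Sigma^n} \mathrm{amb}_{\mathcal{G}}(y)}; \qquad (1)$$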
the main change, in this case, is that $\gamma(A, \ell)$ and $\#L_A^\ell$ may be different. In order to obtain the uniform distribution we use a “rejection method” [15], giving a parallel version of a procedure described in [4]. Assume now that $\mathcal{G}$ is polynomially ambiguous and let $p(n)$ be a polynomial such that $d_{\mathcal{G}}(n) \le p(n)$ for every $n > 0$. A probabilistic boolean circuit can be designed which on input $1^n$ first computes $m = p(n)!$ and then executes $4p(n)$ times in parallel (and independently of one another) the following computation:
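– $y = \bot$;
– generate $\omega_n$ at random in $L(\mathcal{G}) \cap \Sigma^n$ according to the distribution given by (1);
– if $\omega_n \ne \bot$, then compute $a = \mathrm{amb}_{\mathcal{G}}(\omega_n)$; generate $r$ uniformly at random in $\{1, \ldots, 2^{\lceil \log m \rceil}\}$; if $a \cdot r \le m$ then $y = \omega_n$;
– return $y$.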
Then the circuit outputs $\bot$ if all the $4p(n)$ computations return $\bot$, otherwise it outputs the first $y \ne \bot$. Reasoning as in [4], it can be proven that the probability of getting $\bot$ is at most $1/4$; otherwise, the output is distributed uniformly at random in $L(\mathcal{G}) \cap \Sigma^n$. Evaluating the complexity of the circuit, we observe that the computation of $\mathrm{amb}_{\mathcal{G}}(x)$ for all $x \in \Sigma^*$ belongs to $\#\mathrm{SAC}^1$ [21]. Hence, since both L and DIV are included in $\#\mathrm{SAC}^1$, we obtain the following
Theorem 3. For every language generated by a polynomially ambiguous context-free grammar, the uniform random generation problem belongs to $\mathrm{RNC}^1_g(\#\mathrm{SAC}^1)$.
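The acceptance test "$a \cdot r \le m$" of the rejection step can be checked numerically. The following sketch, with hypothetical helper names `accepting_values` and `rejection_step`, counts how many values of $r$ are accepted for a given ambiguity $a$: since $a \le p(n)$ divides $m = p(n)!$, exactly $m/a$ values are accepted, so a word is kept with probability proportional to $1/a$, which cancels the bias of distribution (1). The use of `random.Random.randrange` as a source of randomness is an assumption of the example.

```python
import math
import random

def accepting_values(a, m):
    """Return (number of r in {1..2^ceil(log2 m)} with a*r <= m, size of that range)."""
    size = 1 << (m - 1).bit_length()      # 2^ceil(log2 m) for m >= 1
    accepted = sum(1 for r in range(1, size + 1) if a * r <= m)
    return accepted, size

def rejection_step(word, a, m, rng):
    """Keep `word` (of ambiguity a) with probability (m // a) / 2^ceil(log2 m), else None."""
    size = 1 << (m - 1).bit_length()
    r = rng.randrange(1, size + 1)        # uniform in {1, ..., size}
    return word if a * r <= m else None
```

For example, with $p(n) = 4$ and $m = 4! = 24$, the range has 32 values and the numbers of accepted values for $a = 1, 2, 3, 4$ are 24, 12, 8 and 6.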
5 One-way Nondeterministic Auxiliary Pushdown Automata

In this section we describe a family of probabilistic boolean circuits to solve our problem in the case of languages accepted by one-way nondeterministic auxiliary pushdown automata (1-NAuxPDA, for short). These circuits are based on the computation of the ambiguity of terminal strings with respect to different c.f. grammars. For this reason we first study the problem of computing the value $\mathrm{amb}_{\mathcal{G}}(x)$ having in input a c.f. grammar $\mathcal{G}$ in Chomsky normal form and a word $x \in \Sigma^*$.
5.1 The General Ambiguity Problem

We start by recalling a result given in [18] to evaluate arithmetic circuits of size $n$ and degree $d$ in $O(\log n \log(nd))$ parallel time (see also [16]). Here, by arithmetic circuit over a semiring $R$ we mean a labelled directed acyclic graph with three kinds of vertices: input nodes of fan-in 0 with labels in $R$, addition nodes of fan-in greater than 1 labelled by $+$, and multiplication nodes of fan-in 2 labelled by $\times$; we also assume that there is no edge between two multiplication nodes. The degree of the circuit is the maximum degree of its nodes, defined by induction as follows: every input node has degree 1, the degree of every multiplication node is the sum of the degrees of its two inputs and the degree of every addition node is the maximum of the degrees of its inputs. The value of a node can be defined in the standard way: all input nodes take as value their labels, the value of an addition (multiplication) node is the sum (product) of the values of its inputs.

Proposition 1 ([18]). The values of all nodes in any arithmetic circuit over $R$ of size $n$ and degree $d$ can be computed in $O(\log n \log(nd))$ parallel time using $M(n)$ processors, where $M(n)$ is the number of processors required to multiply two $n \times n$ matrices over $R$ in $O(\log n)$ time.
Now, in order to compute $\mathrm{amb}_{\mathcal{G}}(x)$ on input $\mathcal{G} = \langle N, \Sigma, S, P \rangle$ and $x = \sigma_1 \sigma_2 \cdots \sigma_n \in \Sigma^n$ we define an arithmetic circuit $C(\mathcal{G}, x)$ on $\mathbb{N}$ implementing a counting version of the traditional CYK algorithm. The input nodes of $C(\mathcal{G}, x)$ are $(A, i, i)$, where $A \in N$, $1 \le i \le n$, and they are labelled by 1 if $A \to \sigma_i \in P$ and 0 otherwise. Addition nodes are $(A, i, j)$ with $A \in N$, $1 \le i < j \le n$, and multiplication nodes are $(B, C, i, k, j)$ with $D \to BC \in P$ for some $D \in N$, $1 \le i \le k < j \le n$. The inputs of every addition node $(A, i, j)$ are the nodes $(B, C, i, k, j)$ such that $A \to BC \in P$; the inputs of every multiplication node $(B, C, i, k, j)$ are the nodes $(B, i, k)$ and $(C, k + 1, j)$. It is easy to show that the value of node $(S, 1, n)$ is $\mathrm{amb}_{\mathcal{G}}(x)$.

Lemma 6. The problem of computing $\mathrm{amb}_{\mathcal{G}}(x)$, given as input a terminal string $x$ and a context-free grammar $\mathcal{G}$ in Chomsky normal form, can be solved by a uniform family of boolean circuits of $(nm)^{O(1)}$ size and $O((\log n + \log m)^2)$ depth, where $n = |x|$ and $m$ is the size of $\mathcal{G}$.
Proof (sketch). We observe that Proposition 1 is based on a parallel algorithm which, for an input arithmetic circuit of size $\hat{n}$ and degree $d$, executes $O(\log(\hat{n} d))$ times a cycle of operations, the most expensive one being the product of two $\hat{n} \times \hat{n}$ matrices over $R$. In our case, $\hat{n} = O(n^3 m)$, $d = n$ and the value of the nodes is bounded by $m^{3n}$. Hence, the above matrix product can be computed by a boolean circuit of polynomial size and $O(\log n + \log m)$ depth. □
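The values computed by the circuit $C(\mathcal{G}, x)$ can be reproduced by a sequential counting version of CYK. The following is a sketch under assumptions: the grammar is encoded as two dictionaries (`terminal` for productions $A \to \sigma$, `binary` for productions $A \to BC$), the function name `ambiguity` is hypothetical, and the evaluation is sequential rather than by the parallel algorithm of Proposition 1.

```python
def ambiguity(terminal, binary, start, word):
    """Number of derivation trees of `word` (non-empty) rooted at `start` in a CNF grammar."""
    n = len(word)
    variables = set(terminal) | set(binary)
    value = {}
    # Input nodes (A, i, i): 1 if A -> word[i] is a production, 0 otherwise.
    for i in range(1, n + 1):
        for A in variables:
            value[A, i, i] = 1 if word[i - 1] in terminal.get(A, []) else 0
    # Addition nodes (A, i, j) sum the multiplication nodes (B, C, i, k, j),
    # whose values are value[B, i, k] * value[C, k + 1, j].
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            for A in variables:
                value[A, i, j] = sum(value[B, i, k] * value[C, k + 1, j]
                                     for (B, C) in binary.get(A, [])
                                     for k in range(i, j))
    return value[start, 1, n]
```

For the ambiguous grammar with productions $S \to SS$ and $S \to a$, the word $a^4$ has 5 derivation trees, one for each binary bracketing of four leaves.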
Using the same approach, one can compute on input $\mathcal{G} = \langle N, \Sigma, S, P \rangle$, $A \in N$ and $\ell > 0$, the number $\gamma_{\mathcal{G}}(A, \ell)$ of derivation trees of $\mathcal{G}$ rooted at $A$ and deriving a word in $\Sigma^\ell$. It is sufficient to map all terminal symbols $\sigma \in \Sigma$ into the unique symbol $z$, so defining a new c.f. grammar $\mathcal{G}'_A = \langle N, \{z\}, A, P' \rangle$, where $P'$ is obtained from $P$ by replacing all productions $(B \to \sigma) \in P$ with $B \to z$, and labelling every input node $(B, i, i)$ of the circuit $C(\mathcal{G}'_A, z^\ell)$ with the cardinality of $\{(B \to \sigma) \in P : \sigma \in \Sigma\}$. Hence, $\gamma_{\mathcal{G}}(A, \ell) = \mathrm{amb}_{\mathcal{G}'_A}(z^\ell)$ and the computation can be carried out as in Lemma 6. This allows us to apply the approach presented in Section 4.1 to generate at random a word of length $n$, according to the distribution given in (1), assuming the grammar $\mathcal{G}$ as a part of the input.
5.2 Polynomially Ambiguous 1-NAuxPDA

We recall that a 1-NAuxPDA is a nondeterministic Turing machine having a one-way read-only input tape, a pushdown tape and a log-space bounded two-way read-write work tape [6, 5]. It is known that the class of languages accepted by 1-NAuxPDA working in polynomial time coincides with the class of decision problems reducible to context-free recognition via one-way log-space reduction [17].

Given a 1-NAuxPDA $M$, we denote by $\mathrm{amb}_M(x)$ the number of accepting computations of $M$ on input $x \in \Sigma^*$, and call ambiguity degree of $M$ the function $d_M : \mathbb{N} \to \mathbb{N}$ defined by $d_M(n) = \max\{\mathrm{amb}_M(x) : x \in \Sigma^n\}$, for every $n \in \mathbb{N}$. Then, $M$ is said to be polynomially ambiguous if, for some polynomial $p(n)$, we have $d_M(n) \le p(n)$ for every $n > 0$.

It is known that, if $M$ works in polynomial time, given an integer input $n > 0$, a c.f. grammar $\mathcal{G}_n$ in Chomsky normal form, of size polynomial in $n$, can be built such that $L(\mathcal{G}_n) \cap \Sigma^n = L(M) \cap \Sigma^n$ [6]. This construction can be refined in such a way that the ambiguity degree of $\mathcal{G}_n$ does not increase with respect to the ambiguity degree of $M$, i.e., for every $n \in \mathbb{N}$, the number of derivation trees of any word $x \in \Sigma^n$ in $\mathcal{G}_n$ is less than or equal to the number of accepting computations of $M$ on input $x$ [4]. Moreover, the problem of computing such a refined $\mathcal{G}_n$ on input $1^n$ belongs to $\mathrm{NC}^2$ as shown in [1].

Therefore, the random generation problem for the language accepted by a polynomial time $M$ is reduced to generating words of length $n$ from the grammar $\mathcal{G}_n$ uniformly at random. This can be done by a general version of the algorithm described in Subsection 4.2 where the c.f. grammar $\mathcal{G}_n$ is part of the input. Thus, if the ambiguity of $M$ is polynomial, by Lemma 6, the overall computation can be carried out in $O(\log^2 n)$ depth and polynomial size.
Theorem 4. For every language accepted by a polynomially ambiguous 1-NAuxPDA working in polynomial time, the uniform random generation problem belongs to $\mathrm{RNC}^2_g$.
6 The General Case for Regular Languages

In this section we consider the random generation problem for regular languages assuming as input both the length of the word to be generated and the deterministic finite automaton recognizing the language. Using the same notation of Section 3, we say that a family of probabilistic boolean circuits $\{c_{n,m}\}_{n,m>0}$ solves the general problem of uniform random generation for regular languages, if each $c_{n,m}$, having in input $1^n$ and a deterministic finite automaton $\mathcal{A}$ of size $m$, computes a value $\omega_{n,m}$ in $\Sigma^n \cup \{\bot\}$ such that, if $L(\mathcal{A}) \cap \Sigma^n \ne \emptyset$, then:
1. $\Pr\{\omega_{n,m} = \bot\} \le 1/4$,
2. $\Pr\{\omega_{n,m} = x \mid \omega_{n,m} \ne \bot\} = 1/\#(L(\mathcal{A}) \cap \Sigma^n)$, for every $x \in L(\mathcal{A}) \cap \Sigma^n$.
The problem can be solved by a family of circuits designed as in Section 3 to generate a word uniformly at random from a fixed regular language. Here, there are two main differences. First of all, since $\mathcal{A} = \langle \Sigma, Q, q_0, F, \delta \rangle$ is part of the input, the coefficients $\gamma(q, \ell)$ for $q \in Q$ and $0 \le \ell \le n$ can be computed in DET (rather than in DIV), because such a task is reducible to computing the $\ell$-th power of an $m \times m$ integer matrix. Second, once the graph $G_n(\mathcal{A})$ is obtained, the computation of $\omega(q_0, n)$ belongs to L (rather than $\mathrm{NC}^1$) since it is reducible to a reachability problem in a directed acyclic graph whose nodes have out-degree at most 1 [8]. Hence, we obtain the following
Theorem 5. The general problem of uniform random generation for regular languages is solved by a uniform family of probabilistic boolean circuits of polynomial size and O(log(n + m)) depth with oracle nodes in DET.
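The reduction of the counting step to integer matrix powering, mentioned before Theorem 5, can be made concrete with a small sequential sketch. The names `mat_mul` and `count_words` and the dictionary encoding of the transition function are assumptions of this example; the power is computed by plain iterated multiplication, which is enough to show the identity but is not the parallel computation the text relies on.

```python
def mat_mul(a, b):
    """Product of two square integer matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def count_words(delta, final, states, alphabet, q0, n):
    """Number of words of length n accepted from q0, read off the n-th power of
    the integer matrix T with T[q][q'] = #{a : delta(q, a) = q'}."""
    index = {q: i for i, q in enumerate(states)}
    size = len(states)
    T = [[0] * size for _ in range(size)]
    for q in states:
        for a in alphabet:
            T[index[q]][index[delta[q, a]]] += 1
    power = [[int(i == j) for j in range(size)] for i in range(size)]  # identity
    for _ in range(n):
        power = mat_mul(power, T)
    return sum(power[index[q0]][index[f]] for f in final)
```

For the automaton accepting words over {a, b} with an even number of a's, this gives 4 words of length 3 and 16 words of length 5.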
7 Concluding Remarks

In this paper we have studied the circuit complexity of the uniform random generation problem for several classical formal languages. An interesting application of the results presented here is related to counting problems, i.e. computing $\#(L \cap \Sigma^n)$ on input $n > 0$. It is well-known that random generation is related to counting and that there are cases in which exact counting is hard, while uniform random generation is easy and allows one to obtain approximation schemes for the counting problem [14, 13]. This is for instance the case for some finitely ambiguous context-free languages, as discussed in [3, 4]. In a forthcoming paper we will show that an $\mathrm{RNC}^2$ approximation scheme can be designed for the counting problem of every language accepted by a polynomial time 1-NAuxPDA of polynomially bounded ambiguity.
References

[1] E. Allender, D. Bruschi, and G. Pighizzini. The complexity of computing maximal word functions. Computational Complexity, 3:368–391, 1993.
[2] A. Bertoni, M. Goldwurm, and P. Massazza. Counting problems and algebraic formal power series in noncommuting variables. Information Processing Letters, 34(3):117–121, April 1990.
[3] A. Bertoni, M. Goldwurm, and N. Sabadini. The complexity of computing the number of strings of given length in context-free languages. Theoretical Computer Science, 86(2):325–342, 1991.
[4] A. Bertoni, M. Goldwurm, and M. Santini. Random generation and approximate counting of ambiguously described combinatorial structures. In Horst Reichel and Sophie Tison, editors, Proceedings of 17th Annual Symposium on Theoretical Aspects of Computer Science (STACS), number 1770 in Lecture Notes in Computer Science, pages 567–580. Springer, 2000.
[5] F.-J. Brandenburg. On one-way auxiliary pushdown automata. In H. Tzschach, H. Waldschmidt, and H. K.-G. Walter, editors, Proceedings of the 3rd GI Conference on Theoretical Computer Science, volume 48 of Lecture Notes in Computer Science, pages 132–144, Darmstadt, FRG, March 1977. Springer.
[6] S. A. Cook. Characterizations of pushdown machines in terms of time-bounded computers. Journal of the ACM, 18(1):4–18, January 1971.
[7] S. A. Cook. A taxonomy of problems with fast parallel algorithms. Information and Control, 64:2–22, 1985.
[8] S. A. Cook and P. McKenzie. Problems complete for deterministic logarithmic space. Journal of Algorithms, 8(3):385–394, September 1987.
[9] A. Denise. Génération aléatoire et uniforme de mots de langages rationnels. Theoretical Computer Science, 159(1):43–63, 1996.
[10] P. Flajolet, P. Zimmermann, and B. Van Cutsem. A calculus for the random generation of labelled combinatorial structures. Theoretical Computer Science, 132(1-2):1–35, 1994.
[11] V. Gore, M. Jerrum, S. Kannan, Z. Sweedyk, and S. Mahaney. A quasi-polynomial-time algorithm for sampling words from a context-free language. Information and Computation, 134(1):59–74, 10 April 1997.
[12] T. Hickey and J. Cohen. Uniform random generation of strings in a context-free language. SIAM Journal on Computing, 12(4):645–655, November 1983.
[13] M. Jerrum and A. Sinclair. Approximate counting, uniform generation and rapidly mixing Markov chains. Information and Computation, 82:93–133, 1989.
[14] M. R. Jerrum, L. G. Valiant, and V. V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43(2-3):169–188, 1986.
[15] R. M. Karp, M. Luby, and N. Madras. Monte-Carlo approximation algorithms for enumeration problems. Journal of Algorithms, 10:429–448, 1989.
[16] R. M. Karp and V. Ramachandran. Parallel algorithms for shared-memory machines. In J. van Leeuwen, editor, Handbook of Computer Science. MIT Press/Elsevier, 1992.
[17] C. Lautemann. On pushdown and small tape. In K. Wagener, editor, Dirk-Siefkes, zum 50. Geburtstag (proceedings of a meeting honoring Dirk Siefkes on his fiftieth birthday), pages 42–47. Technische Universität Berlin and Universität Augsburg, 1988.
[18] G. L. Miller, V. Ramachandran, and E. Kaltofen. Efficient parallel evaluation of straight-line code and arithmetic circuits. SIAM Journal on Computing, 17(4):687–695, August 1988.
[19] D. B. Searls. The computational linguistics of biological sequences. In Larry Hunter, editor, Artificial Intelligence and Molecular Biology, chapter 2, pages 47–120. AAAI Press, 1992.
[20] R. Smith. A finite state machine algorithm for finding restriction sites and other pattern matching applications. Comput. Appl. Biosci., 4:459–465, 1988.
[21] V. Vinay. Counting auxiliary pushdown automata and semi-unbounded arithmetic circuits. In José Balcázar, Alan Borodin, Bill Gasarch, Neil Immerman, Christos Papadimitriou, Walter Ruzzo, Paul Vitányi, and Christopher Wilson, editors, Proceedings of the 6th Annual Conference on Structure in Complexity Theory (SCTC ’91), pages 270–284, Chicago, IL, USA, June 1991. IEEE Computer Society Press.
[22] I. Wegener. The Complexity of Boolean Functions. B. G. Teubner, Stuttgart, 1987.