On the Complexity of Random Strings (Extended Abstract)
Martin Kummer
Institut für Logik, Komplexität und Deduktionssysteme, Universität Karlsruhe, D-76128 Karlsruhe, Germany.
[email protected]

Abstract. We show that the set R of Kolmogorov random strings is truth-table complete. This improves the previously known Turing completeness of R and shows how the halting problem can be encoded into the distribution of random strings rather than using the time complexity of non-random strings. As an application we obtain that Post's simple set is truth-table complete in every Kolmogorov numbering. We also show that the truth-table completeness of R cannot be generalized to size complexity with respect to arbitrary acceptable numberings. In addition we note that R is not frequency computable.

1 Introduction
In Kolmogorov complexity a string x of length n is called random if it is incompressible in the sense that each program which computes x from the empty input has length at least n, i.e., there is no better way to generate x than by table look-up. Let R denote the set of all random strings. Random strings play an instrumental role in many applications of Kolmogorov complexity, most prominently in the "incompressibility method" (see [4, 13] for further details). Time bounded versions of randomness are important for structural complexity theory; see [2, 3] for recent work involving the set R and its time bounded relatives.

In view of the many concrete applications of random strings it is at first glance somewhat disturbing that R is a highly non-constructive undecidable set. But a closer consideration reveals that this is inherent in the approach to randomness via incompressibility. The main goal of the present paper is to characterize as precisely as possible the undecidability of R (or equivalently of $\overline{R}$) in comparison with the halting problem K.

It is known that $\overline{R}$ (the set of all non-random strings) is recursively enumerable and Turing complete for the r.e. sets, i.e., the halting problem can be computed using R as an oracle, cf. [13, Ex. 2.63]. But the running time of the oracle-algorithm witnessing this fact is not bounded by any recursive function. The question naturally arises whether there is an oracle-algorithm which achieves recursive running time. Technically, this question is equivalent to the question whether R is truth-table complete. We give a positive answer and even provide an algorithm of the following type: On input x it computes a finite set S of queries such that $x \in K$ iff S does not contain a random string. Since $\overline{R}$ is a simple set (i.e., R is infinite and has no infinite recursively enumerable subset), it cannot be bounded truth-table complete, in particular it is not m-complete. Thus our result also provides a natural example (probably the first) of a set which is truth-table complete but not m-complete.

In contrast to the proof of Turing completeness we make essential use of the Invariance Theorem of Kolmogorov complexity, i.e., the fact that Kolmogorov complexity is minimal (under all algorithmic encodings of strings) up to an additive constant. While Turing completeness still holds with respect to encodings in any acceptable numbering, we show that truth-table completeness fails in some acceptable numbering.

As a further application of our proof we obtain that Post's simple set [16] is truth-table complete with respect to all additively optimal acceptable numberings. By a result of Lachlan [12] there is an acceptable numbering where Post's simple set is not truth-table complete.

Finally we show that R is not frequency computable, i.e., there is no algorithm that for some k > 1 and all $x_1, \ldots, x_k$ predicts which of the $x_i$'s are random such that at least one of its predictions is correct.

2 Notation and Definitions
The notation generally follows the book of Li and Vitányi [13]; for background in recursion theory the reader is referred to the textbooks of Odifreddi [15] and Soare [19]. The characteristic function of a set A is denoted by $\chi_A$. For $\sigma \in \{0,1\}^*$, $|\sigma|$ denotes the length of $\sigma$; $\epsilon$ is the empty string, $\omega$ is the set of all natural numbers. By $\langle \cdot, \cdot \rangle$ we denote a computable pairing function from $\omega^2$ to $\omega$. We identify numbers and binary strings by associating each string with its index in the lexicographical ordering (see [13, p. 11]). The length of $i \in \omega$ is the length of the i-th string which we denote by lg(i). Note that $\mathrm{lg}(i) = \lfloor \log_2(i+1) \rfloor$.

Let $P_n$ [$R_n$] be the set of all partial [total] recursive n-ary functions $f : \omega^n \to \omega$. Let $\varphi_i$ denote the i-th function in a standard enumeration of all partial recursive functions of one argument. If $\psi$ is partial recursive then $\psi_s(x)$ denotes the result, if any, of performing s steps in the computation of $\psi(x)$. $f(x)\downarrow$ means that f is defined on x, $f(x)\uparrow$ means that f is undefined on x. $\mathrm{dom}(f) = \{x : f(x)\downarrow\}$, $\mathrm{range}(f) = \{f(x) : x \in \mathrm{dom}(f)\}$. $W_i = \mathrm{dom}(\varphi_i)$ is the i-th recursively enumerable (r.e.) set. $K = \{e : \varphi_e(e)\downarrow\}$ is the halting problem. If A is an r.e. set then $A_s$ denotes the set of elements enumerated into A before step s in some fixed recursive enumeration of A. $D_n$ denotes the n-th finite set in a canonical enumeration of all finite sets. A set A is called simple if A is r.e., $\overline{A}$ is infinite, and $\overline{A}$ does not contain an infinite r.e. subset.

For convenience of the reader we recall the definitions of the reducibilities which are mentioned in this paper, see [15] for additional information. Turing reducibility: A [...]

[...] i. Second, at least $i \cdot 2^{n-2d-2}$ elements must have appeared in $\overline{R_\psi} \cap \{0,1\}^n$. Intuitively, the i-th sequence guesses that i is maximal such that there are infinitely many such numbers and since we have $2^{2d+2}$ many i's, one of these guesses must be correct. Let $M_{n,s} = \{x \in \{0,1\}^n : C_{\psi,s}(x) < n\}$. Note that $\overline{R_\psi} = \bigcup_{n,s \in \omega} M_{n,s}$.
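The pigeonhole remark above ("we have $2^{2d+2}$ many i's") can be illustrated for a single length n. The following is only a sketch under simplifying assumptions: it looks at one fixed n and a given count of non-random strings of that length, whereas the actual guess concerns infinitely many n. The function name `max_threshold_index` is hypothetical and does not appear in the paper.

```python
def max_threshold_index(count: int, n: int, d: int) -> int:
    """Return the largest i with i * 2**(n - 2*d - 2) <= count.

    `count` stands for the number of non-random strings of length n.
    Since at least one string of each length is random, count < 2**n,
    so the returned index always lies in the range 0 .. 2**(2*d + 2) - 1.
    """
    if n < 2 * d + 2:
        raise ValueError("the thresholds are only used for n >= 2d + 2")
    if not 0 <= count < 2 ** n:
        raise ValueError("count must satisfy 0 <= count < 2**n")
    # Integer division by 2**(n - 2d - 2), written as a right shift.
    return count >> (n - 2 * d - 2)
```

For example, with n = 6 and d = 1 the thresholds are the multiples of 4, and every admissible count falls under exactly one maximal index below 16.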
Construction:

Stage 0: All n are declared "unused". Let $S_{i,x} = \uparrow$, $m_{i,x} = \uparrow$ for all $i, x \ge 0$; let $m_{i,-1} = -1$ and $g(i) = 0$ for all i.

Stage s + 1: (1) If there is an i, $0 \le i < 2^{2d+2}$, and an $n \le s$ such that:
- n is unused, $n \ge 2d + 2$,
- $n \ne m_{j,x}$ for all $(j, x)$ with $j \ge i$,
- $i \cdot 2^{n-2d-2}$ [...]

[...] $i_0$ is chosen; but this does not happen after stage $s_0$. Let $x_0$ denote the least x such that $m_{i_0,x'}$ is defined for all $x' \ge x$.

Claim 3: For all $x \ge x_0$: If $x \in K$ then $E_{m_{i_0,x}} = S_{i_0,x}$; if $x \notin K$ then $E_{m_{i_0,x}} = \emptyset$.

Proof: Let $x \ge x_0$, then there is an s such that $m_{i_0,x}$ is defined at stage s + 1, say $m_{i_0,x} = n$. Thus, n is unused and so $E_n = \emptyset$ at the beginning of stage s + 1. Since $m_{i_0,x}$ is defined at all later stages, $E_n$ changes at some later stage only if $x \in K$, in which case $E_n = S_{i_0,x}$, by part (2) of the construction.

Claim 4: For almost all x: $x \in K \Leftrightarrow S_{i_0,x} \subseteq \overline{R_\psi}$.

Proof: If $x \ge x_0$ and $x \in K$, then $E_{m_{i_0,x}} = S_{i_0,x}$. Assume that x is large enough such that $m_{i_0,x} \ge 2d_0 + 2$. Let $n = m_{i_0,x}$. For each $z \in E_n$ there is $z' \in \{0,1\}^{n-2d_0-2}$ such that $\eta(1^{d_0}0z', \epsilon) = z$, i.e., $C_\eta(z) \le n - d_0 - 1$. Thus, by choice of $d_0$ it follows that for all $z \in E_n$: $C_\psi(z) \le C_\eta(z) + d_0 < n$, i.e., $z \in \overline{R_\psi}$.

Suppose for a contradiction that there are infinitely many $x \notin K$ such that $S_{i_0,x} \subseteq \overline{R_\psi}$. Choose such an x for which $m_{i_0,x}$ is almost always defined. At the least stage $s_1 + 1$ when $m_{i_0,x}$ was defined, we had $|M_{n,s_1}| \ge i_0 2^{n-2d-2}$ and $S_{i_0,x} \cap M_{n,s_1} = \emptyset$. By hypothesis, there is $s > \max(s_0, s_1)$ with $S_{i_0,x} \subseteq M_{n,s}$. By definition of $S_{i_0,x}$ it follows that $|M_{n,s}| \ge (i_0+1) 2^{n-2d-2}$. Since $|R_\psi \cap \{0,1\}^n| \ge 1$ we have $i_0 + 1 < 2^{2d+2}$. Thus, at stage s + 1 the pair $(i_0 + 1, m_{i_0,x})$ satisfies the condition in stage s + 1 and some $j > i_0$ is chosen, contradicting the choice of $s_0$.
Claim 5: K [...]

[...] $m_e$. Since $3 \cdot 2m_e \cdot 2^{k/3} < 2^{k-e-3}$ for all $k > m_e$, we always have enough elements of length k left to perform the required diagonalization action.

Remark: By a refinement of the above argument one can construct a polynomially optimal Gödel numbering $\psi$ such that even the more informative set $\{\langle x, p \rangle : (\exists q < p)[\psi$ [...] $=$ [...]$]\}$ is not tt-complete.
5 An Application to Post's Simple Set
Post's simple set [16, p. 298] is the classical example of a simple set and appears in most textbooks on computability theory. Its definition depends on an underlying numbering of all partial recursive functions and a recursive enumeration of their domains. Formally it is defined as follows.
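As an informal preview of Definition 10, the selection rule can be sketched on a finite prefix of the enumeration: for each index i, the first enumerated pair (i, x) with x > 2i contributes x to the set. The sketch assumes that the enumeration is handed over as already decoded pairs (i, x) rather than as values of the pairing function, and the function name `post_simple_set_prefix` is hypothetical. It only computes a finite approximation, not the set itself.

```python
from typing import Dict, Iterable, Set, Tuple


def post_simple_set_prefix(enumeration: Iterable[Tuple[int, int]]) -> Set[int]:
    """Finite approximation of Post's simple set.

    `enumeration` lists pairs (i, x), read as "x is enumerated into W_i",
    in the order in which they appear. For each i only the first pair
    with x > 2*i is selected; later pairs for the same i are ignored.
    """
    chosen: Dict[int, int] = {}  # index i -> element selected for W_i
    for i, x in enumeration:
        if i not in chosen and x > 2 * i:
            chosen[i] = x
    return set(chosen.values())


# Toy prefix: W_0 yields 0 then 3, W_1 yields 2, 5, 7, W_2 yields 4 then 9.
prefix = [(0, 0), (1, 2), (1, 5), (1, 7), (0, 3), (2, 4), (2, 9)]
approximation = post_simple_set_prefix(prefix)
```

Because the element chosen for index i exceeds 2i, at most n of the selected elements lie at or below 2n, which is the usual reason the complement stays infinite.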
Definition 10. For $\psi \in P_2$ and $f \in R_1$ such that $\mathrm{range}(f) = \mathrm{dom}(\psi)$ let
$S_\psi = \{x : (\exists s, i)[f(s) = \langle i, x \rangle \wedge x > 2i \wedge (\forall s' < s)[f(s') = \langle i, x' \rangle \Rightarrow x' \le 2i]]\}$.
$S_\psi$ is called Post's simple set with respect to the numbering $\psi$ and the enumeration f.

Lachlan [12] considered the question whether Post's simple set is tt-complete and showed that the answer depends on the underlying Gödel numbering.

Fact 11.
a.) For every Gödel numbering $\psi$ and enumeration f, the set $S_\psi$ is wtt-complete. (Ladner, see [15, p. 339])
b.) There is a Gödel numbering $\psi$ and an associated enumeration f such that $S_\psi$ is not tt-complete. (Lachlan [12])
c.) There is a Gödel numbering $\psi$ and an associated enumeration f such that $S_\psi$ is tt-complete. (Lachlan [12])

The technique suggested by Lachlan in [12] to prove Fact 11, b.) does not give a polynomially optimal Gödel numbering. However, a polynomially optimal Gödel numbering can be obtained using the game theoretic method of Theorem 9. On the other hand, by a modification of the proof of Theorem 5 we can strengthen Fact 11, c.) as follows.

Theorem 12. Let $\psi$, f be as in Definition 10. If $\psi$ is a Kolmogorov numbering, then $S_\psi$ is tt-complete.

Jockusch and Soare [8] considered Post's hypersimple set and proved that its T-completeness depends on the underlying Gödel numbering. By slightly modifying their constructions one can show that this holds even with respect to Kolmogorov numberings, i.e., here the analog of Theorem 12 fails.

6 Random Strings are not Frequency Computable
Frequency computation was introduced by Rose [17] in the early sixties and captures a natural notion of approximative computability. Many new results have been obtained recently for this notion in recursion theory as well as in complexity theory, see [1] and the literature cited therein.

Definition 13. A set A is called (m, k)-computable ($1 \le m \le k$) if there is a recursive function f such that $f(x_1, \ldots, x_k) \in \{0,1\}^k$ and $f(x_1, \ldots, x_k)$ agrees with $(\chi_A(x_1), \ldots, \chi_A(x_k))$ in at least m components, for all pairwise distinct $x_1, \ldots, x_k$. A is called frequency computable if there is some k > 1 such that A is (1, k)-computable.

The halting problem K is not frequency computable, but there exist frequency computable c-complete sets [11]. However, one would not expect that R is frequency computable. This is confirmed by our next result.

Theorem 14. Let $\psi$ be any optimal numbering and k > 1. Then $R_\psi$ is not frequency computable.
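The agreement condition of Definition 13 can be made concrete with a small finite check. The sketch below uses a decidable toy set (the even numbers) and a deliberately weak predictor, so it says nothing about R itself. All names (`agreements`, `satisfies_m_k_on`, `chi_even`, `first_exact`) are hypothetical, and testing finitely many tuples is of course only a sanity check, not a proof of (m, k)-computability.

```python
from itertools import permutations
from typing import Callable, Iterable, Sequence, Tuple

Bits = Tuple[int, ...]


def agreements(prediction: Sequence[int], truth: Sequence[int]) -> int:
    """Number of components in which prediction and truth coincide."""
    return sum(p == t for p, t in zip(prediction, truth))


def satisfies_m_k_on(
    f: Callable[[Sequence[int]], Bits],
    chi: Callable[[int], int],
    tuples: Iterable[Sequence[int]],
    m: int,
) -> bool:
    """Check that f agrees with chi in at least m components on each tuple."""
    for xs in tuples:
        truth = tuple(chi(x) for x in xs)
        if agreements(f(xs), truth) < m:
            return False
    return True


def chi_even(x: int) -> int:
    """Characteristic function of the toy set of even numbers."""
    return 1 if x % 2 == 0 else 0


def first_exact(xs: Sequence[int]) -> Bits:
    """Toy predictor: exact on the first component, constant 0 elsewhere."""
    return (chi_even(xs[0]),) + (0,) * (len(xs) - 1)


# All pairs of pairwise distinct inputs below 6, i.e. the case k = 2.
pairs = list(permutations(range(6), 2))
one_of_two = satisfies_m_k_on(first_exact, chi_even, pairs, 1)
two_of_two = satisfies_m_k_on(first_exact, chi_even, pairs, 2)
```

On these pairs the predictor always gets at least one component right but fails to get both right, for instance on the pair (0, 2).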
Proof. Suppose for a contradiction that $R_\psi$ is (1, k)-computable. By [1, Fact 2.6] there is an algorithm that for every list of strings $x_1, \ldots, x_m$ computes a set of at most p(m) strings including the characteristic string $(\chi_{R_\psi}(x_1), \ldots, \chi_{R_\psi}(x_m))$. Here p(m) is a polynomial of degree k - 1, i.e., $p(m) = O(m^{k-1})$. By a counting argument there is at least one random string of each length. The next lemma shows that there are even exponentially many. It was independently discovered by several people (including the author), but no published proof exists.

Lemma 15. For every optimal numbering $\psi$ there is a constant c such that $|R_\psi \cap \{0,1\}^n| \ge 2^n/c$ for all n.
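The counting argument mentioned before Lemma 15 can be spelled out as follows. It uses only the definition of randomness from the introduction: a string of length n is non-random if some program of length less than n computes it from the empty input, and each program computes at most one string.

\[
|\{p \in \{0,1\}^* : |p| < n\}| \;=\; \sum_{j=0}^{n-1} 2^j \;=\; 2^n - 1 \;<\; 2^n \;=\; |\{0,1\}^n| .
\]

Hence at most $2^n - 1$ strings of length n are non-random, which leaves at least one random string of each length.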
Proof. Let $\psi$ be an optimal numbering. We define a partial recursive function $\eta \in P_2$. The value $\eta(1^d 0 v, \epsilon)$ is computed as follows: Let $n = |v| + 2d + 2$ and let m be the number whose binary representation (possibly with leading zeros) is v. Enumerate the set $\overline{R_\psi}$ until $2^n - m$ distinct strings of length n appear (if this never happens then $\eta(1^d 0 v, \epsilon)$ is undefined). Determine the lexicographically least string z of length n which has not been enumerated. Let $\eta(1^d 0 v, \epsilon) = z$.

By hypothesis there is a constant $d_0$ such that $C_\psi(x) \le C_\eta(x) + d_0$ for all x. We claim that $|R_\psi \cap \{0,1\}^n| \ge 2^{n-2d_0-2}$, which proves the lemma. Suppose for a contradiction that $r(n) := |R_\psi \cap \{0,1\}^n| < 2^{n-2d_0-2}$ for some n. As $|R_\psi \cap \{0,1\}^n| \ge 1$, we have $n > 2d_0 + 2$. Since the binary representation of r(n) has length at most $n - 2d_0 - 2$, we can prepend leading zeros to obtain a string v of length $n - 2d_0 - 2$. By definition of $\eta$, the value $z := \eta(1^{d_0} 0 v, \epsilon)$ is defined and z is an element of $R_\psi \cap \{0,1\}^n$. Thus, $C_\psi(z) \le C_\eta(z) + d_0$ [...]

[...] $|L_i|/c$ and let $z_i = x_k$. Eliminate all v from $L_i$ with v[k] = 1 and in this way obtain $L_{i+1}$. Since the total number of 1's in all strings of $L_i$ is at least $(2^n/c)|L_i|$, there must be one component k which is 1 in at least $|L_i|/c$ many strings, i.e., $z_i$ exists. Note that $|L_{i+1}|$