Efficiently Extracting Randomness from Imperfect Stochastic Processes
Hongchao Zhou and Jehoshua Bruck, Fellow, IEEE
Abstract—We study the problem of extracting a prescribed number of random bits by reading the smallest possible number of symbols from non-ideal stochastic processes. The related interval algorithm proposed by Han and Hoshi has asymptotically optimal performance; however, it assumes that the distribution of the input stochastic process is known. The motivation for our work is the fact that, in practice, sources of randomness have inherent correlations and are affected by measurement noise; namely, it is hard to obtain an accurate estimation of the distribution. This challenge was addressed by the concepts of seeded and seedless extractors, which can handle general random sources with unknown distributions. However, known seeded and seedless extractors provide extraction efficiencies that are substantially smaller than Shannon's entropy limit. Our main contribution is the design of extractors that have a variable input length and a fixed output length, are efficient in the consumption of symbols from the source, are capable of generating random bits from general stochastic processes, and approach the information-theoretic upper bound on efficiency.

Index Terms—Randomness Extraction, Imperfect Stochastic Processes, Variable-Length Extractors.

Hongchao Zhou and Jehoshua Bruck are with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125, USA, e-mail: [email protected]; [email protected]. This work was supported in part by the NSF Expeditions in Computing Program under grant CCF-0832824.
I. INTRODUCTION
We study the problem of extracting a prescribed number of random bits by reading the smallest possible number of symbols from imperfect stochastic processes. For perfect stochastic processes, including processes with known accurate distributions and perfect biased coins, this problem has been well studied. It dates back to von Neumann [9], who considered the problem of generating random bits from a biased coin with unknown probability. Recently, in [30], we improved von Neumann's scheme and introduced an algorithm that generates 'random bit streams' from biased coins, uses bounded space, and runs in expected linear time. This algorithm can generate a prescribed number of random bits with asymptotically optimal efficiency. On the other hand, efficient algorithms have also been developed for extracting randomness from any known stochastic process (whose distribution is given). In [13], Knuth and Yao presented a simple procedure for generating sequences with arbitrary probability distributions from an unbiased coin (the probability of H and T is 1/2). In [1], Abrahams considered a biased coin whose distribution is an integer power of a noninteger. Han and Hoshi [10] studied the general problem and proposed an interval algorithm that generates a prescribed number of random
bits from any known stochastic process and achieves the information-theoretic upper bound on efficiency. However, in practice, sources of stochastic processes have inherent correlations and are affected by measurement noise; hence, they are not perfect. Existing algorithms for extracting randomness from perfect stochastic processes do not work for imperfect stochastic processes, where uncertainty exists.

To extract randomness from an imperfect stochastic process, one approach is to apply a seeded or seedless extractor to a sequence generated by the process that contains a sufficient amount of randomness; we call this approach a fixed-length extractor for stochastic processes, since all the possible input sequences have the same fixed length. Efficient constructions of seeded and seedless extractors have been extensively studied in the last two decades, and it has been shown that the number of random bits extracted by them can approach the source's min-entropy asymptotically [7], [12], [16], [19], [22]. Although fixed-length extractors can generate random bits of good quality from imperfect stochastic processes, their efficiencies are far from optimal. Here, we define the efficiency of an extractor for stochastic processes as the asymptotic ratio between the number of extracted random bits and the entropy of its input sequence (the entropy of the input sequence is proportional to the expected input length if the stochastic process is stationary ergodic); this ratio is upper bounded by 1, since the process of extracting randomness does not increase entropy. Based on this definition, we can conclude that the efficiency of a fixed-length extractor is upper bounded by the ratio between the min-entropy and the entropy of the input sequence, which is usually several times smaller than 1. So fixed-length extractors are not very efficient in extracting randomness from stochastic processes. The intuition is that, in order to minimize the expected number of symbols read from an imperfect stochastic process, the length of the input sequence should be adaptive, not fixed. The concepts of min-entropy and entropy are defined as follows.

Definition 1. Given a random source X on {0, 1}^n, the min-entropy of X is defined as
$$H_{\min}(X) = \min_{x \in \{0,1\}^n} \log_2 \frac{1}{P[X = x]}.$$
The entropy of X is defined as
$$H(X) = \sum_{x \in \{0,1\}^n} P[X = x] \log_2 \frac{1}{P[X = x]}.$$
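For a distribution with small support, both quantities are immediate to compute. The following minimal Python sketch (ours, for illustration; the probabilities match Example 1 below) evaluates both definitions:

```python
import math

def min_entropy(probs):
    # H_min(X) = min_x log2(1 / P[X = x]) = -log2(max_x P[X = x])
    return -math.log2(max(probs))

def entropy(probs):
    # H(X) = sum_x P[X = x] * log2(1 / P[X = x])
    return sum(-p * math.log2(p) for p in probs if p > 0)

probs = [0.9, 0.1]          # P[X = 0] = 0.9, P[X = 1] = 0.1
print(min_entropy(probs))   # ~0.152
print(entropy(probs))       # ~0.469
```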
The following example compares entropy with min-entropy for a simple random variable.
Example 1. Let X be a random variable such that P[X = 0] = 0.9 and P[X = 1] = 0.1; then H_min(X) = 0.152 and H(X) = 0.469. In this case, the entropy of X is about three times its min-entropy.

In this paper, we focus on the notion and constructions of variable-length extractors (short for variable-to-fixed length extractors), namely, extractors with variable input length and fixed output length. (Note that the interval algorithm proposed by Han and Hoshi [10] and the streaming algorithm proposed by us [30] are special cases of variable-length extractors.) Our goal is to extract a prescribed number of random bits in the sense of statistical distance while minimizing the expected input cost, measured by the entropy of the input sequence (whose length is variable). To make this precise, we let d(R, M) be the difference between two known stochastic processes R and M, defined by
$$d(R, M) = \limsup_{n \to \infty} \max_{x \in \{0,1\}^n} \frac{\log_2 \frac{P_R(x)}{P_M(x)}}{\log_2 \frac{1}{P_M(x)}},$$
where P_R(x) is the probability of generating x from R when the sequence length is |x|, and P_M(x) is the probability of generating x from M when the sequence length is |x|.

A few models of imperfect stochastic processes are introduced and investigated, including:
• Let M be a known stochastic process; we consider an arbitrary stochastic process R such that d(R, M) ≤ β for a constant β.
• We consider R as an arbitrary stochastic process such that $\min_{M \in \mathcal{G}_{s.e.}} d(R, M) \le \beta$ for a constant β, where $\mathcal{G}_{s.e.}$ denotes the set consisting of all stationary ergodic processes.

Generally, given a real slightly-unpredictable source R, it is not easy to estimate the exact value of d(R, M) for a stochastic process M, but its upper bound, i.e., β, can be easily obtained. The parameter β describes how unpredictable the real source R is, so we call it the uncertainty of R. We prove that it is impossible to construct an extractor that achieves efficiency strictly larger than 1 − β for all the possible sources R with uncertainty β. Then we introduce several constructions of variable-length extractors and show that their efficiencies can reach η ≥ 1 − β; that is, the constructions are asymptotically optimal. The proposed variable-length extractors have two benefits: (i) they generalize algorithms for perfect sources so as to address general imperfect sources, and (ii) they bridge the efficiency gap between min-entropy and entropy. The following example compares the performance of a variable-length extractor with that of a fixed-length extractor when extracting randomness from a slightly-unpredictable independent process.

Example 2. Consider an independent process x1 x2 x3 ... such that P[xi = 1] ∈ [0.9, 0.91]; then it can be obtained that β ≤ 0.0315. For this source, a variable-length extractor can generate random bits with efficiency at least 1 − β = 0.9685, which is very close to the upper bound 1. In comparison, fixed-length extractors can only reach an efficiency of at most 0.3117.

The remainder of this paper is organized as follows. Section II presents background and related results. In Section III, we demonstrate that one cannot construct a variable-length extractor with efficiency strictly larger than 1 − β when the source has uncertainty β. Then we focus on seeded constructions of variable-length extractors, namely, constructions that use a small number of additional truly random bits as the seed (catalyst). Three different constructions are provided and analyzed in Section IV, Section V and Section VI separately. All these constructions have efficiencies lower bounded by 1 − β, implying their optimality. Finally, we discuss seedless constructions of variable-length extractors for some types of random sources in Section VII, followed by the concluding remarks.

II. PRELIMINARIES

A. Statistical Distance
Statistical distance is used in computer science to measure the difference between two distributions. Let X and Y be two random sequences with range {0, 1}^m; then the statistical distance between X and Y is defined as
$$\|X - Y\| = \max_{T : \{0,1\}^m \to \{0,1\}} |P[T(X) = 1] - P[T(Y) = 1]|$$
over a boolean function T. We say that X and Y are ǫ-close if ‖X − Y‖ ≤ ǫ. According to this definition, we can also write
$$\|X - Y\| = \frac{1}{2} \sum_{x \in \{0,1\}^m} |P[X = x] - P[Y = x]|.$$
It is equivalent to the former expression.
Let U_m denote the uniform distribution on {0, 1}^m. In order for a sequence Y to be able to take the place of truly random bits in a randomized application, we let Y be ǫ-close to U_m, where ǫ is small enough. In this case, the extra probability error introduced by this replacement is at most ǫ. In this paper, we want to extract m almost-random bits that form a sequence ǫ-close to the uniform distribution U_m on {0, 1}^m for a specified small ǫ > 0, i.e., ‖Y − U_m‖ ≤ ǫ.

B. Seeded Extractors

In 1990, Zuckerman introduced a general model of weak random sources, called k-sources, namely sources whose min-entropy is at least k [32]. It was shown that given a source on {0, 1}^n with min-entropy k < n, it is impossible to devise a single function that extracts even one bit of randomness. This observation led to the introduction of seeded extractors, which use a small number of additional truly random bits as the seed (catalyst). When simulating a probabilistic algorithm, one can simply eliminate the requirement of truly random bits by enumerating all possible strings for the seed and taking a majority vote on the final results. There are a variety of very efficient constructions of seeded extractors, summarized in [7], [16], [22]. Mathematically, a seeded extractor is a function
$$E : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m,$$
such that for every distribution X on {0, 1}^n with H_min(X) ≥ k, the distribution E(X, U_d) is ǫ-close to the uniform distribution U_m. Here, d is the seed length, and we call such an extractor a (k, ǫ) extractor. Much work has focused on efficient constructions of seeded extractors. A standard application of the probabilistic method [17] shows that there exists a seeded extractor that can extract asymptotically H_min(X) random bits with log(n − H_min(X)) additional truly random bits. Recently, Guruswami, Umans and Vadhan [9] provided an explicit construction of seeded extractors whose efficiency is very close to the bound obtained from the probabilistic method. Their main result is described as follows:

Lemma 1. [9] For every constant α > 0, all positive integers n, k, and all ǫ > 0, there is an explicit construction of a (k, ǫ) extractor E : {0, 1}^n × {0, 1}^d → {0, 1}^m with d ≤ log n + O(log(k/ǫ)) and m ≥ (1 − α)k.

The above result implies that given any source X ∈ {0, 1}^n with min-entropy k, if k ≥ (1 + α)m with α > 0, we can always construct a seeded extractor that generates a random sequence Y ∈ {0, 1}^m that is ǫ-close to the uniform distribution. In this case, the seed length d ≤ log n + O(log(k/ǫ)) depends on the input length n and the parameter ǫ.

C. Seedless Extractors

In the last decade, the concept of seedless (deterministic) extractors has attracted renewed interest, motivated by the reduction of the computational complexity for simulating probabilistic algorithms as well as some requirements in cryptography [6]. Several specific classes of sources have been studied, including independent sources, which can be divided into several independent parts containing a certain amount of randomness [2], [19], [20]; bit-fixing sources, where some bits in a binary sequence are truly random and the remaining bits are fixed [4], [8], [11]; and samplable sources, where the source is generated by a process that has a bounded amount of computational resources like space [12], [25]. For example, suppose that we have multiple independent sources with the same length n. It is known how to extract from two sources when the min-entropy in each is ≥ 0.5n [20] or slightly less than 0.5n [3], and how to extract from O(1/γ) sources if the min-entropy in each is at least n^γ [18]. All these constructions have exponentially small error, and they are able to extract Θ(k) random bits.

Both the seeded and seedless extractors described above have fixed input length, fixed seed length (d = 0 for seedless extractors) and fixed output length, so we call them fixed-length extractors. To apply a fixed-length extractor to a stochastic process, one first reads a sequence of fixed length whose min-entropy is strictly larger than the number of random bits to be generated. Fixed-length extractors can generate random bits of good quality from imperfect stochastic processes, but they usually consume more incoming symbols than necessary. To increase information efficiency, we let the length of the input sequences be adaptive; hence, we have the concept of 'variable-length extractors'.
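For concreteness, a (k, ǫ) extractor can be instantiated with pairwise-independent hashing via the leftover hash lemma. The Python sketch below is our illustrative stand-in, not the construction of [9]; in particular, its seed length (about 2n bits) is far larger than the d ≤ log n + O(log(k/ǫ)) of Lemma 1.

```python
import secrets

P = 2**61 - 1  # a prime larger than any n-bit input we use (n <= 60)

def hash_extract(x, seed, m):
    """Toy seeded extractor: h_{a,b}(x) = ((a*x + b) mod P) mod 2^m.
    (a, b) is the seed; the family is (close to) pairwise independent,
    so by the leftover hash lemma it extracts m ~ k - 2*log2(1/eps)
    almost-uniform bits from any source with min-entropy k."""
    a, b = seed
    return ((a * x + b) % P) % (2**m)

def random_seed():
    # two field elements, roughly 122 truly random seed bits in total
    return (secrets.randbelow(P - 1) + 1, secrets.randbelow(P))

# usage: x is an n-bit sample from a weak source with min-entropy k >= (1+alpha)*m
seed = random_seed()
x = 0b101101110101               # placeholder weak-source sample
print(hash_extract(x, seed, 8))  # 8 almost-random bits (as an integer)
```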
D. Variable-Length Extractors

A variable-length extractor is an extractor with variable input length and fixed output length. When applied to a stochastic process, it reads incoming symbols one by one until the whole incoming sequence meets a certain criterion, and then it maps the incoming sequence into a binary sequence of fixed length as the output. Depending on the source, the construction may require a small number of additional truly random bits as the seed; hence, we have seeded variable-length extractors and seedless variable-length extractors. A seeded variable-length extractor is a function
$$VE : S_p \times \{0,1\}^d \to \{0,1\}^m,$$
such that given a real source R, the output sequence is ǫ-close to the uniform distribution U_m. Here, S_p is the set consisting of all possible input sequences, called the input set; it is complete and prefix-free. That the input set is complete means that any infinite sequence has a prefix in the set, so when reading symbols from any source, we always eventually meet a sequence in the set; then we stop reading and map this sequence into a binary sequence of length m. Being prefix-free is not strictly necessary; it ensures that all the sequences in S_p are possible to read (a short mechanical check of these two properties is given at the end of this subsection).

A general procedure for extracting randomness using variable-length extractors can be divided into three steps:
1) Determine an input set S_p such that its min-entropy with respect to the real source R is at least k, namely,
$$\min_{x \in S_p} \log_2 \frac{1}{P_R(x)} \ge k,$$
where k ≥ (1 + α)m for some α > 0.
2) Construct an injective function V : S_p → {0, 1}^n that maps the sequences in S_p into binary sequences of length n. We read symbols from the source R one by one until the current incoming sequence matches one in S_p. This incoming sequence is then mapped to a binary sequence of length n based on the function V. As a result, we get a random sequence Z of length n and min-entropy k (since V is injective).
3) Since k = (1 + α)m with α > 0, according to Lemma 1, we can always find a seeded extractor E : {0, 1}^n × {0, 1}^d → {0, 1}^m that can extract m almost-random bits from a source with min-entropy k. By applying this seeded extractor E to the sequence Z, we get a random sequence of length m that is ǫ-close to the uniform distribution U_m. Here, the seed length d ≤ log n + O(log(k/ǫ)).

We can see that the construction of a variable-length extractor is a cascade of a function V and a seeded extractor E, i.e.,
$$VE = E \circ V.$$
Note that our requirement is to extract a sequence of m almost-random bits that is ǫ-close to the uniform distribution U_m.
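Completeness and prefix-freeness of a finite binary input set are easy to check mechanically: prefix-freeness by pairwise comparison, and completeness via the Kraft equality Σ_{x∈S_p} 2^{−|x|} = 1, which holds exactly for complete prefix-free sets. A small Python sketch (ours; the sample set is the one derived later in Example 4):

```python
from fractions import Fraction

def is_prefix_free(s):
    return not any(a != b and b.startswith(a) for a in s for b in s)

def is_complete(s):
    # For a prefix-free set, completeness (every infinite sequence has a
    # prefix in the set) is equivalent to the Kraft sum being exactly 1.
    return sum(Fraction(1, 2**len(x)) for x in s) == 1

Sp = ['0', '10', '110', '1110', '11110', '111110', '1111110', '1111111']
print(is_prefix_free(Sp), is_complete(Sp))  # True True
```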
The key to constructing variable-length extractors is to find an input set S_p with min-entropy k, even when the distribution of the real source R is slightly unpredictable, such that the expected length of the sequences in S_p is minimized. For stationary ergodic processes, minimizing the expected length is asymptotically equivalent to minimizing the entropy of the sequences in S_p (this will be discussed in Section III).

For some specific types of sources, including independent sources and samplable sources, by applying the ideas in [19] and [12] we can remove the requirement of truly random bits without degrading the asymptotic performance. As a result, we have seedless variable-length extractors. For example, if the source R is an independent process, we can first apply the method in [19] to extract d almost-random bits from the first Θ(log (m/ǫ)) bits, and then use them as the seed of a seeded variable-length extractor to extract randomness from the rest of the process. The detailed discussions will be given in Section VII.

III. EFFICIENCY AND UNCERTAINTY

A. Efficiency
To consider the performance of an extractor, we define its efficiency as the asymptotic ratio between the output length and the total entropy of all its inputs. So the efficiency of an extractor can be written as
$$\eta = \lim_{m \to \infty} \frac{m}{H_R(X_m) + d},$$
such that the output sequence is ǫ-close to the uniform distribution U_m on {0, 1}^m, where ǫ is small, d is the seed length, m is the output length, and H_R(X_m) is the entropy of the input sequence X_m with range on S_p. In our constructions, d ≤ log n + O(log(m/ǫ)), which is negligible compared to H_R(X_m) as m → ∞. Hence, we can write
$$\eta = \lim_{m \to \infty} \frac{m}{H_R(X_m)}.$$
In this definition, we use the entropy of the input sequence rather than the expected input length, because the source under consideration may not be stationary ergodic. It needs to be mentioned that, in seeded constructions, the value of d is also an important parameter, although it is much smaller than m. The problem of minimizing the seed length d can be studied separately from minimizing the entropy of the input sequence, and it will be addressed in this paper.

First, we demonstrate that if a distribution is ǫ-close to the uniform distribution U_m, then the entropy of this distribution is asymptotically m for any ǫ < 1.

Lemma 2. Let X be a random sequence on {0, 1}^m that is ǫ-close to the uniform distribution U_m; then
$$m - \log_2 \frac{1}{1 - \epsilon} \le H(X) \le m.$$

Proof: Since there are in total 2^m possible assignments for X, it is easy to get H(X) ≤ m. So we only need to prove that
$$H(X) \ge m - \log_2 \frac{1}{1 - \epsilon}.$$
Let p(x) denote P[X = x] for x ∈ {0, 1}^m. Since X is ǫ-close to the uniform distribution U_m, we have
$$\frac{1}{2} \sum_{x \in \{0,1\}^m} |p(x) - 2^{-m}| \le \epsilon.$$
Then the lower bound of H(X) can be written as
$$\min_{p} \sum_{x \in \{0,1\}^m} p(x) \log_2 \frac{1}{p(x)}$$
subject to
$$p(x) \ge 0, \ \forall x \in \{0,1\}^m; \quad \sum_{x \in \{0,1\}^m} p(x) = 1; \quad \sum_{x \in \{0,1\}^m} |p(x) - 2^{-m}| \le 2\epsilon.$$
Obviously, the optimal solution of the above problem is attained when
$$\sum_{x \in \{0,1\}^m} |p(x) - 2^{-m}| = 2\epsilon.$$
To solve the problem based on Lagrange multipliers, we let
$$\lambda(p) = \sum_{x \in \{0,1\}^m} p(x) \log_2 \frac{1}{p(x)} + \lambda_1 \Big( \sum_{x \in \{0,1\}^m} p(x) - 1 \Big) + \lambda_2 \Big( \sum_{x \in \{0,1\}^m} |p(x) - 2^{-m}| - 2\epsilon \Big).$$
If p(x) ≥ 0 with x ∈ {0, 1}^m is a solution of the above problem, then
$$\frac{\partial \lambda}{\partial p(x)} = 0,$$
i.e.,
$$\begin{cases} \dfrac{\ln p(x) + 1}{\ln 2} + \lambda_1 + \lambda_2 = 0 & \text{if } 2^{-m} \le p(x) \le 1, \\[2mm] \dfrac{\ln p(x) + 1}{\ln 2} + \lambda_1 - \lambda_2 = 0 & \text{if } 0 \le p(x) \le 2^{-m}. \end{cases}$$
So there exist two constants a and b with 0 ≤ a ≤ 2^{-m} ≤ b ≤ 1 such that p(x) = b if 2^{-m} ≤ p(x) ≤ 1, and p(x) = a if 0 ≤ p(x) ≤ 2^{-m}. Assume that there are t assignments of x with p(x) = a; then there are 2^m − t assignments of x with p(x) = b. Hence, the problem is converted to one over a, b, t, i.e.,
$$\min_{a,b,t} \ t a \log_2 \frac{1}{a} + (2^m - t) b \log_2 \frac{1}{b},$$
subject to
$$0 \le t \le 2^m;$$
$$t a + (2^m - t) b = 1; \quad (1)$$
$$t(2^{-m} - a) + (2^m - t)(b - 2^{-m}) = 2\epsilon. \quad (2)$$
From Equations (1) and (2), we get
$$a = 2^{-m} - \frac{\epsilon}{t}, \quad b = 2^{-m} + \frac{\epsilon}{2^m - t}.$$
So the question becomes finding the optimal t that minimizes
$$-t \Big(2^{-m} - \frac{\epsilon}{t}\Big) \log_2 \Big(2^{-m} - \frac{\epsilon}{t}\Big) - (2^m - t) \Big(2^{-m} + \frac{\epsilon}{2^m - t}\Big) \log_2 \Big(2^{-m} + \frac{\epsilon}{2^m - t}\Big),$$
subject to
$$\epsilon 2^m \le t \le 2^m,$$
where the lower bound on t ensures a ≥ 0. The optimal solution is t* = ǫ2^m. In this case, the entropy of X is
$$H(X) = \log_2(2^m - t^*) = m - \log_2 \frac{1}{1 - \epsilon},$$
which is the lower bound. This completes the proof.

In the following lemma, we show that for any extractor, the efficiency is upper bounded by 1. The reason is that the amount of information, i.e., entropy, does not increase during the process of randomness extraction.

Lemma 3. For any extractor with seed length d and output length m, if d = o(m), its efficiency satisfies η ≤ 1.

Proof: We consider fixed-length extractors as a special case of variable-length extractors, and seedless extractors as a special case of seeded extractors with d = 0. So our proof focuses only on seeded variable-length extractors. The main observation is that for any extractor, the entropy of its output sequence is bounded by the entropy of the input sequence plus the entropy of the seed, since the process of extracting randomness cannot create new randomness. The output sequence, denoted by Y, is ǫ-close to the uniform distribution U_m. According to Lemma 2, its entropy is
$$H_R(Y) \ge m - \log_2 \frac{1}{1 - \epsilon}.$$
The total entropy of the inputs is H_R(X_m) + d. Hence, H_R(Y) ≤ H_R(X_m) + d. As a result, the efficiency of the extractor is
$$\eta = \lim_{m \to \infty} \frac{m}{H_R(X_m) + d} = \lim_{m \to \infty} \frac{H_R(Y)}{H_R(X_m) + d} \le 1.$$
This completes the proof.

If R is a stationary ergodic process, we define its entropy rate as
$$h(R) = \lim_{l \to \infty} \frac{H(X^l)}{l},$$
where X^l is a random sequence of length l generated by the source R. In this case, the entropy of the input sequence on S_p is asymptotically proportional to the expected input length.

Lemma 4. Given a stationary ergodic source R, let X_m be the input sequence of a variable-length extractor with output length m. Then
$$\lim_{m \to \infty} \frac{H_R(X_m)}{E_R[|X_m|]} = h(R),$$
where E_R[|X_m|] is the expected input length.
Proof: X_m is a random sequence from S_p based on the distribution of R. Let l_1 be the minimum length of the sequences in S_p; as m → ∞, l_1 → ∞. Now, we define l_i = l_1 + (i − 1) log l_1 for all i ≥ 1. Based on these lengths, we divide all the sequences in S_p into subsets S_i = {x | x ∈ S_p, l_i ≤ |x| ≤ l_{i+1} − 1} for i ≥ 1. Let p_i = P_R(X_m ∈ S_i); then
$$H_R(X_m) \ge \sum_i \Big[ \Big( \sum_{j > i} p_j \Big) H_R\big(X_{l_{i-1}+1}^{l_i} \,\big|\, X_1^{l_{i-1}}, |X_m| \ge l_i \big) \Big],$$
where l_0 = 0, $\sum_{j>i} p_j$ is the probability that |X_m| ≥ l_i, and $X_a^b$ is the subsequence of X_m from the a-th element to the b-th element. Since X_m is generated from a stationary ergodic process, and l_i − l_{i−1} → ∞ as m → ∞, we can get
$$H_R\big(X_{l_{i-1}+1}^{l_i} \,\big|\, X_1^{l_{i-1}}, |X_m| \ge l_i \big) \to (l_i - l_{i-1}) h(R).$$
As a result, as l_1 → ∞, we have
$$H_R(X_m) \ge (1 - \epsilon) \sum_i \Big( \sum_{j > i} p_j \Big) (l_i - l_{i-1}) h(R) = (1 - \epsilon) \sum_i p_i l_i h(R),$$
for an arbitrary ǫ > 0. Considering the other direction, we can get that as l_1 → ∞,
$$H_R(X_m) \le (1 + \epsilon) \sum_i p_i l_{i+1} h(R) = (1 + \epsilon) \sum_i p_i (l_i + \log l_1) h(R),$$
for an arbitrary ǫ > 0. For the expected input length E_R[|X_m|], it is easy to show that
$$\sum_i p_i l_i \le E_R[|X_m|] \le \sum_i p_i l_{i+1} = \sum_i p_i (l_i + \log l_1).$$
So as m → ∞, i.e., l_1 → ∞, it yields
$$\lim_{m \to \infty} \frac{H_R(X_m)}{E_R[|X_m|]} = \lim_{m \to \infty} \frac{\sum_i p_i l_i h(R)}{\sum_i p_i l_i} = h(R).$$
This completes the proof.
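Returning briefly to Lemma 2, the extremal distribution constructed in its proof (t* = ǫ2^m outcomes of probability 0, the rest uniform) is easy to check numerically. The short Python sketch below (ours; m and ǫ are arbitrary choices) confirms that it is ǫ-close to U_m and attains H(X) = m − log2(1/(1−ǫ)):

```python
import math

m, eps = 10, 0.1
N = 2**m
t = round(eps * N)                 # t* = eps * 2^m outcomes get probability 0
p = [0.0] * t + [1.0 / (N - t)] * (N - t)

dist = 0.5 * sum(abs(pi - 1.0 / N) for pi in p)
H = sum(-pi * math.log2(pi) for pi in p if pi > 0)

print(dist)                              # ~0.1, i.e., eps
print(H, m - math.log2(1 / (1 - eps)))   # both ~9.848
```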
B. Sources and Uncertainty

Given a source R, if its distribution is known, we say that this source is a known stochastic process, and its uncertainty is 0. In this paper, we mainly focus on those imperfect processes whose distributions are slightly unpredictable due to many factors, like the existence of external adversaries.
First, given two known stochastic processes R and M, we let d(R, M) be the difference between R and M. Here, we define d(R, M) as
$$d(R, M) = \limsup_{n \to \infty} \max_{x \in \{0,1\}^n} \frac{\log_2 \frac{P_R(x)}{P_M(x)}}{\log_2 \frac{1}{P_M(x)}},$$
where P_R(x) is the probability of generating x from R when the sequence length is |x|, and P_M(x) is the probability of generating x from M when the sequence length is |x|. Although there are existing ways to measure the difference between two sources, such as the normalized Kullback-Leibler divergence, with them it is not easy to estimate the uncertainty of a source, nor to analyze the performance of the constructed variable-length extractors. According to our definition, d(R, M) ≤ β if and only if
$$P_R(x) \le P_M(x)^{1-\beta}$$
for all x ∈ {0, 1}^∞ with |x| → ∞. This is a condition that is easily satisfied by many natural stochastic processes when β is small.

Lemma 5. If d(R, M) → 0, we have P_R(x) → P_M(x) for all x ∈ {0, 1}*.

In the rest of this paper, we investigate a few models of unpredictable sources; most natural sources can be well described in these ways.
1) The source R is an arbitrary stochastic process such that d(R, M) ≤ β for a constant β ∈ [0, 1] and a known stochastic process M.
2) R is an arbitrary stochastic process such that there exists a stationary ergodic process M (whose distribution is unknown) with d(R, M) ≤ β; that is, $\min_{M \in \mathcal{G}_{s.e.}} d(R, M) \le \beta$, where $\mathcal{G}_{s.e.}$ denotes the set consisting of all stationary ergodic processes.
In both models, we call β the uncertainty of the source R. In the real world, β can be easily estimated without knowing the distribution of the processes; it just reflects how unpredictable the real source R is.

To construct variable-length extractors, we only care about the possible input sequences, namely, those in S_p. Hence, for the case of finite length, d_p(R, M) is a more important parameter for us, defined by
$$d_p(R, M) = \max_{x \in S_p} \frac{\log_2 \frac{P_R(x)}{P_M(x)}}{\log_2 \frac{1}{P_M(x)}}.$$
As the number of required random bits m increases, d_p(R, M) quickly converges to d(R, M), and we can write
$$d_p(R, M) = d(R, M) + \epsilon_p$$
for a very small constant ǫ_p; as m → ∞, ǫ_p → 0. In this case, the upper bound of d_p(R, M) or $\min_{M \in \mathcal{G}_{s.e.}} d_p(R, M)$ is β_p = β + ǫ_p.

Example 3. Let x1 x2 ... ∈ {0, 1}* be a sequence generated from an independent source R such that ∀i ≥ 1, P[xi = 1] ∈ [0.8, 0.82]. If we let M be a biased coin with probability 0.8132, then
$$\beta = \max_{\text{possible } R} d(R, M) = \max\left( \frac{\log_2 \frac{0.82}{0.8132}}{\log_2 \frac{1}{0.8132}}, \ \frac{\log_2 \frac{0.2}{0.1868}}{\log_2 \frac{1}{0.1868}} \right) = 0.0405.$$

C. Efficiency and Uncertainty

In this subsection, we investigate the relation between efficiency and uncertainty. We show that given a stochastic process R with uncertainty β, as described in the previous subsection, one cannot construct a variable-length extractor with efficiency strictly larger than 1 − β for all the possibilities of R.

Let us first consider a simple example: let X be a random sequence with the uniform distribution on {0, 1}^n, and let Y be an arbitrary random sequence on {0, 1}^n such that
$$\frac{\log_2 \frac{P[Y = x]}{P[X = x]}}{\log_2 \frac{1}{P[X = x]}} \le \beta, \quad \forall x \in \{0,1\}^n.$$
Now, we show that from the source Y, one cannot construct an extractor with efficiency strictly larger than 1 − β. To see this, we consider an extractor f with output length m, and a source Y with P[Y = y] ∈ {0, 2^{−n(1−β)}} for all y ∈ {0, 1}^n. For this source Y, the entropy is H(Y) = n(1 − β). In order to make sure the output sequence of f, denoted by Z, is ǫ-close to U_m, we must have
$$\lim_{m \to \infty} \frac{m}{n(1 - \beta)} \le \lim_{m \to \infty} \frac{H(Z) + \log_2 \frac{1}{1 - \epsilon}}{H(Y)} \le 1.$$
So we cannot generate more than n(1 − β) random bits asymptotically. In this case, if we apply the extractor f to the random sequence X, which is a possibility of Y, then the efficiency is
$$\eta = \lim_{m \to \infty} \frac{m}{H(X)} = \lim_{m \to \infty} \frac{m}{n} \le 1 - \beta.$$
So there does not exist a seeded extractor that can extract randomness from an arbitrary Y with efficiency strictly larger than 1 − β. Here, β is the uncertainty of the source.

Theorem 6. Let M be a known stochastic process, and let R be an arbitrary stochastic process such that d(R, M) ≤ β; then one cannot construct a variable-length extractor whose efficiency is strictly larger than 1 − β for all possible R.

Proof: Let f be a variable-length extractor whose input sequence is a random sequence X_m on S_p and whose output sequence is a random sequence Y on {0, 1}^m. Assume that as m → ∞, f can extract from an arbitrary R such that the output sequence Y is ǫ-close to U_m.
Let h = H_M(X_m) be the entropy of the input sequence based on the distribution of M; we want to show that there exists a process R such that d(R, M) ≤ β and H_R(X_m) ≤ h(1 − β) as m → ∞. To find such a process R, we order all the elements in S_p as x_1, x_2, x_3, ... such that P_M(x_1) ≥ P_M(x_2) ≥ P_M(x_3) ≥ .... Then we divide all these elements into groups
$$\{x_1, x_2, ..., x_{i_1}\}, \{x_{i_1+1}, x_{i_1+2}, ..., x_{i_2}\}, ...,$$
such that the total probability of the elements in each group is almost the probability of its first element to the power of 1 − β, i.e.,
$$0 \le P_M(x_{i_j+1})^{1-\beta} - \sum_{k=i_j+1}^{i_{j+1}} P_M(x_k) \le P_M(x_{i_j+1}),$$
for all j ≥ 0, where i_0 = 0. Let A = {x_1, x_{i_1+1}, x_{i_2+1}, ...} be the set consisting of the first elements of all the groups. Now, we consider a possibility of R constructed in the following way: for all x ∈ A, its probability is
$$P_R(x) = \sum_{k=i_j+1}^{i_{j+1}} P_M(x_k), \quad \text{if } x = x_{i_j+1};$$
for all x ∈ S_p \ A = S_p \ {x_1, x_{i_1+1}, x_{i_2+1}, ...}, its probability is P_R(x) = 0. Note that for this R, P_R(x_{i_j+1}) ≤ P_M(x_{i_j+1})^{1−β}, so d(R, M) ≤ β. For this source R, the entropy of the input sequence is
$$H_R(X_m) = \sum_{x \in S_p} P_R(x) \log_2 \frac{1}{P_R(x)}.$$
As m → ∞, we have
$$\begin{aligned} H_R(X_m) &= \sum_{x \in A} P_R(x) \log_2 \frac{1}{P_R(x)} \\ &\to (1 - \beta) \sum_{x \in A} P_R(x) \log_2 \frac{1}{P_M(x)} \\ &= (1 - \beta) \sum_{j \ge 0} \sum_{k=i_j+1}^{i_{j+1}} P_M(x_k) \log_2 \frac{1}{P_M(x_{i_j+1})} \\ &\le (1 - \beta) \sum_{j \ge 0} \sum_{k=i_j+1}^{i_{j+1}} P_M(x_k) \log_2 \frac{1}{P_M(x_k)} \\ &= (1 - \beta) H_M(X_m) = (1 - \beta) h. \end{aligned}$$
According to Lemma 2, as m → ∞, $\frac{m}{H_R(Y)} \to 1$. Furthermore, we can get
$$\lim_{m \to \infty} \frac{H_R(Y)}{H_R(X_m)} \le 1,$$
which implies that
$$\lim_{m \to \infty} \frac{m}{(1 - \beta) h} \le 1;$$
otherwise, the output sequence cannot be ǫ-close to the uniform distribution U_m. If we apply the extractor f to the source M, which is also a possibility for R, then its efficiency is
$$\eta = \lim_{m \to \infty} \frac{m}{h} \le 1 - \beta.$$
So it is impossible to construct a variable-length extractor with efficiency strictly larger than 1 − β for all the possibilities of the source R. This completes the proof.

With the same proof, we can also get the following theorem.

Theorem 7. Let R be an arbitrary stochastic process such that d(R, M) ≤ β for a stationary ergodic process M with unknown distribution; then one cannot construct a variable-length extractor whose efficiency is strictly larger than 1 − β for all possible R.

The above theorems show that one cannot construct an extractor whose efficiency is strictly larger than 1 − β for all possible sources R. Here, β is an important parameter that measures the uncertainty of a real source R, either with respect to a known process or to the nearest stationary ergodic process. In the next few sections, we present a few constructions for efficiently extracting randomness from the sources described in this section. We show that their efficiency η satisfies
$$1 - \beta \le \eta \le 1.$$
That means the bound 1 − β is actually achievable, and the constructions proposed in this paper are asymptotically optimal in efficiency.

IV. CONSTRUCTION I: APPROXIMATED BY KNOWN PROCESSES

In this section, we consider sources that can be approximated by a known stochastic process M, namely, an arbitrary process R with d(R, M) ≤ β for a known process M. We say that a stochastic process M is known if its distribution is given, i.e., P_M(x) can be easily calculated for any x ∈ {0, 1}*. Note that this process M is not necessarily stationary or ergodic. For instance, M can be an independent process z1 z2 ... ∈ {0, 1}* such that ∀i ≥ 1,
$$P_M(z_i = 1) = \frac{1 + \sin(i/10)}{2}.$$

A. Construction
Our goal is to extract randomness from an imperfect random source R. The problem is that we do not know the exact distribution of R, but we know that it can be approximated by a known process M. So we can use the distribution of M to estimate the distribution of R. As a result, we have the following procedure to extract m almost-random bits. The idea of the procedure is to first produce a random sequence of length n and min-entropy k = m(1 + α) with α > 0, from which we can further obtain a sequence ǫ-close to the uniform distribution U_m by applying a (k, ǫ) seeded extractor. According to the results on seeded extractors, this constant α > 0 can be arbitrarily small.
Construction 1. Assume the real source R is an arbitrary stochastic process such that d(R, M) ≤ β for a known process M. Then we extract m almost-random bits from R based on the following procedure.
1) Read input bits one by one from R until we get an input sequence x ∈ {0, 1}* such that
$$\log_2 \frac{1}{P_M(x)} \ge \frac{k}{1 - \beta_p},$$
where β_p = β + ǫ_p with ǫ_p > 0 and k = m(1 + α) with α > 0. The small constant ǫ_p has a value depending on the input set S_p; as m → ∞, ǫ_p → 0. The constant α can be arbitrarily small.
2) Let n be the maximum length of all the possible input sequences, i.e.,
$$n = \min\Big\{l \in \mathbb{N} \ \Big|\ \forall y \in \{0,1\}^l, \ \log_2 \frac{1}{P_M(y)} \ge \frac{k}{1 - \beta_p}\Big\}.$$
If |x| < n, we extend the length of x to n by adding n − |x| trivial zeros at the end. Since x is randomly generated, from the above procedure we get a random sequence Z of length n, and it can be proved that this random sequence has min-entropy k.
3) Applying a (k, ǫ) extractor to Z yields a binary sequence of length m that is ǫ-close to the uniform distribution U_m.
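The first two steps are easy to express in code. The Python sketch below (ours, for illustration) assumes for simplicity that M is an i.i.d. biased coin, although Construction 1 allows any known process; the final seeded-extractor step is left as a stub (e.g., the hash-based toy extractor sketched in Section II-B).

```python
import math

def read_until_threshold(next_bit, log2_pm_bit, k, beta_p):
    """Step 1: read bits until log2(1/P_M(x)) >= k/(1-beta_p).
    next_bit: callable returning the next source bit.
    log2_pm_bit(b): log2(1/P_M(bit = b)) under the known model M (i.i.d. here)."""
    threshold = k / (1 - beta_p)
    x, surprisal = [], 0.0
    while surprisal < threshold:
        b = next_bit()
        x.append(b)
        surprisal += log2_pm_bit(b)
    return x

def pad_to_n(x, k, beta_p, log2_pm_min):
    # Step 2: n is reached when even the most likely length-n string
    # satisfies the threshold; pad x with trivial zeros up to n.
    n = math.ceil(k / ((1 - beta_p) * log2_pm_min))
    return x + [0] * (n - len(x))

# Example 4's setting: M is a biased coin with P_M(1) = 0.8, k/(1-beta_p) = 2.
log2_pm = lambda b: -math.log2(0.8 if b == 1 else 0.2)
bits = iter([1, 1, 1, 0])
x = read_until_threshold(lambda: next(bits), log2_pm, k=1.6, beta_p=0.2)
print(pad_to_n(x, 1.6, 0.2, math.log2(1 / 0.8)))  # [1, 1, 1, 0, 0, 0, 0]
# ... feed the padded sequence into any (k, eps) seeded extractor for step 3.
```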
Theorem 8. Construction 1 generates a random sequence of length m that is ǫ-close to U_m.

Proof: We only need to prove that given a source R and a model M with d_p(R, M) ≤ β_p, the procedure yields a random sequence Z with min-entropy at least k. According to the definition of d_p(R, M), for all x ∈ S_p,
$$\frac{\log_2 \frac{P_R(x)}{P_M(x)}}{\log_2 \frac{1}{P_M(x)}} \le \beta_p.$$
Based on the construction, for all x ∈ S_p,
$$\log_2 \frac{1}{P_M(x)} \ge \frac{k}{1 - \beta_p}.$$
The two inequalities above yield
$$\log_2 \frac{1}{P_R(x)} \ge k$$
for all x ∈ S_p. The second step, i.e., adding trivial zeros, does not reduce the min-entropy of S_p. As a result, we get a random sequence Z of length n with min-entropy at least k. Since k = m(1 + α) with α > 0, according to Lemma 1, we can construct a seeded extractor that applies to the sequence Z and generates a binary sequence ǫ-close to the uniform distribution U_m. This completes the proof.

The following example is provided for comparing this construction with fixed-length constructions.

Example 4. Let M be a biased coin with probability 0.8 (of being 1). If k/(1 − β_p) = 2, then we can get the input set
$$S_p = \{0, 10, 110, 1110, 11110, 111110, 1111110, 1111111\}.$$
In this case, the expected input length is strictly smaller than 7. For fixed-length constructions, to get a random sequence with min-entropy at least 2, we have to read 7 input bits independent of the context. It is less efficient than the former method.

B. Efficiency Analysis

Now, we study the efficiency of Construction 1. According to our definition, given a construction, its efficiency is
$$\eta = \lim_{m \to \infty} \frac{m}{H_R(X_m)}.$$

Theorem 9. Given a real source R and a known process M such that d(R, M) ≤ β, the efficiency of Construction 1 is 1 − β ≤ η ≤ 1.

Proof: Since η is always upper bounded by 1, we only need to show that η ≥ 1 − β. According to Lemma 1, as m → ∞, we have
$$\lim_{m \to \infty} \frac{k}{m} = 1.$$
Now, let us consider the number of elements in S_p, i.e., |S_p|. To calculate |S_p|, we let
$$S_p' = \{x[1 : |x| - 1] \mid x \in S_p\},$$
where x[1 : |x| − 1] is the prefix of x of length |x| − 1; then for all y ∈ S_p',
$$\log_2 \frac{1}{P_M(y)} \le \frac{k}{1 - \beta_p}.$$
Hence,
$$\log_2 |S_p'| \le \frac{k}{1 - \beta_p}.$$
It is easy to see that |S_p| ≤ 2|S_p'|, so
$$\log_2 |S_p| \le \frac{k}{1 - \beta_p} + 1.$$
Let X_m be the input sequence; then
$$\lim_{k \to \infty} \frac{H_R(X_m)}{k} \le \lim_{k \to \infty} \frac{\log_2 |S_p|}{k} \le \lim_{k \to \infty} \frac{1}{1 - \beta_p} = \frac{1}{1 - \beta}.$$
Finally, it yields
$$\eta = \lim_{m \to \infty} \frac{m}{H_R(X_m)} \ge 1 - \beta.$$
This completes the proof.

We see that the efficiency of the above construction is between 1 − β and 1. As shown in Theorem 6, the gap β, introduced by the uncertainty of the real source R, cannot be made smaller. Our construction is asymptotically optimal in the sense that one cannot find a variable-length extractor whose efficiency is always strictly larger than 1 − β.

Corollary 10. Given a real source R and a known process M such that d(R, M) ≤ β, as β → 0, the efficiency of Construction 1 satisfies η → 1.

In this case, the efficiency of the construction achieves Shannon's limit. If R is a stationary ergodic process, we can also get the following result.

Corollary 11. Given a stationary ergodic process R and a known process M such that d(R, M) ≤ β, the expected input length of Construction 1 satisfies
$$\frac{1}{h(R)} \le \lim_{m \to \infty} \frac{E[|X_m|]}{m} \le \frac{1}{(1 - \beta) h(R)},$$
where h(R) is the entropy rate of the source R.

Proof: This conclusion follows immediately from Lemma 4 and Theorem 9.
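The input set of Example 4 can be enumerated directly from the stopping rule of Construction 1. The Python sketch below (ours) also evaluates the expected input length when the real source is exactly M:

```python
import math

def input_set(p1, threshold):
    """Enumerate S_p for a biased-coin model M with P_M(1) = p1:
    expand prefixes until log2(1/P_M(x)) >= threshold."""
    done, frontier = [], ['']
    while frontier:
        x = frontier.pop()
        surprisal = sum(-math.log2(p1 if b == '1' else 1 - p1) for b in x)
        if x and surprisal >= threshold:
            done.append(x)
        else:
            frontier += [x + '0', x + '1']
    return sorted(done, key=len)

Sp = input_set(0.8, 2)
print(Sp)       # ['0', '10', '110', ..., '1111110', '1111111']
exp_len = sum(len(x) * math.prod(0.8 if b == '1' else 0.2 for b in x) for x in Sp)
print(exp_len)  # ~3.95 < 7, matching Example 4
```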
V. CONSTRUCTION II: APPROXIMATELY BIASED COINS

In this section, we use a general ideal model, such as a biased coin or a Markov chain, to approximate the real source R. Here, we do not care about the specific parameters of the ideal model. The reason is that, in some cases, the source R is very close to an ideal source but we cannot (or do not want to) estimate the parameters accurately. As a result, we introduce a construction that exploits the characteristics of biased coins or Markov chains. For simplicity, we only discuss the case in which the ideal model is a biased coin; the same idea can be generalized to the case in which the ideal model is a Markov chain. Specifically, let $\mathcal{G}_{b.c.}$ denote the set consisting of all the models of biased coins with different probabilities, and we consider R as an arbitrary stochastic process such that
$$\min_{M \in \mathcal{G}_{b.c.}} d(R, M) \le \beta.$$

A. Construction

The idea of the procedure is to first produce a random sequence of length n and min-entropy k = m(1 + α) with α > 0, from which we can further obtain a sequence ǫ-close to the uniform distribution U_m by applying a (k, ǫ) seeded extractor.

Construction 2. Assume the real source R is an arbitrary stochastic process such that
$$\min_{M \in \mathcal{G}_{b.c.}} d(R, M) \le \beta$$
for a constant β. Then we extract m almost-random bits from R based on the following procedure.
1) Read input bits one by one from R until we get an input sequence x ∈ {0, 1}* such that
$$\log_2 \binom{k_0 + k_1}{\max(1, \min(k_0, k_1))} \ge \frac{k}{1 - \beta_p},$$
where k_0 is the number of zeros in x, k_1 is the number of ones in x, β_p = β + ǫ_p with ǫ_p > 0, and k = m(1 + α) with α > 0. The small constant ǫ_p has a value depending on the input set S_p; as m → ∞, ǫ_p → 0. The constant α can be arbitrarily small.
2) Since the input sequence x can be very long, we map it into a sequence z of fixed length n such that
$$z = [I_{(k_0 \ge k_1)}, \min(k_0, k_1), r(x)],$$
where $I_{(k_0 \ge k_1)} = 1$ if and only if k_0 ≥ k_1, and r(x) is the rank of x among all the permutations of x with respect to the lexicographic order. Since x is randomly generated, the above procedure leads us to a random sequence Z of length n.
3) Applying a (k, ǫ) extractor to Z yields a random sequence of length m that is ǫ-close to U_m.

To see that the construction above works, we need to show that the random sequence Z obtained after the second step has min-entropy at least k, and that its length n is well bounded.

Lemma 12. Given a source R with $\min_{M \in \mathcal{G}_{b.c.}} d(R, M) \le \beta$, Construction 2 yields a random sequence Z with length
$$n \le 1 + \Big\lceil \log_2 \Big( \frac{k}{1 - \beta_p} + 1 \Big) \Big\rceil + \Big\lceil \frac{2k}{1 - \beta_p} \Big\rceil.$$

Proof:
1) $I_{(k_0 \ge k_1)}$ can be represented as 1 bit.
2) Without loss of generality, we assume k_0 ≤ k_1. According to our construction,
$$\log_2 \binom{k_0 + k_1 - 1}{k_0 - 1} < \frac{k}{1 - \beta_p} \quad \text{for } k_0 > 1,$$
and
$$\log_2 \binom{k_1}{1} < \frac{k}{1 - \beta_p} \quad \text{for } k_0 = 0 \text{ or } k_0 = 1.$$
Then
$$k_0 - 1 \le \log_2 \binom{2k_0 - 1}{k_0 - 1} \le \log_2 \binom{k_0 + k_1 - 1}{k_0 - 1} \le \frac{k}{1 - \beta_p}.$$
So min(k_0, k_1) can be represented as $\lceil \log_2(\frac{k}{1 - \beta_p} + 1) \rceil$ bits.
3) Let us consider the number of permutations of x, denoted by N(x). If k_0 > 1, then
$$\log_2 N(x) = \log_2 \binom{k_0 + k_1}{k_0} \le \log_2 \binom{k_0 + k_1 - 1}{k_0 - 1} + \log_2 \frac{k_0 + k_1}{k_0} \le \frac{k}{1 - \beta_p} + \log_2 \frac{k_0 + k_1}{k_0}.$$
If k_0 = 1, then
$$\log_2 N(x) \le \log_2 \binom{k_1}{1} + \log_2 \frac{k_1 + 1}{k_1}.$$
If k_0 = 0, then log_2 N(x) = 0. Based on the analysis above, we can get
$$\log_2 N(x) \le \frac{2k}{1 - \beta_p}.$$
Hence, r(x) can be represented as $\lceil \frac{2k}{1 - \beta_p} \rceil$ bits. This completes the proof.

Let 1^a denote the all-one vector of length a; then we get the following result.

Theorem 13. Construction 2 generates a random sequence of length m that is ǫ-close to U_m if $P_R(1^a) \le 2^{-k}$ and $P_R(0^a) \le 2^{-k}$ for $a = 2^{\lfloor k/(1 - \beta_p) \rfloor}$.

Proof: Since the mapping in the second step is injective, it does not affect the min-entropy; we only need to prove that the input sequence has min-entropy k, i.e.,
$$\log_2 \frac{1}{P_R(x)} \ge k, \quad \forall x \in S_p,$$
where S_p is the set consisting of all the possible input sequences. It is not hard to see that if min(k_0, k_1) ≥ 1,
$$P_M(x) \le \frac{1}{\binom{k_0 + k_1}{k_0}},$$
which leads to
$$\log_2 \frac{1}{P_M(x)} \ge \frac{k}{1 - \beta_p}.$$
Furthermore, based on the definition of d_p(R, M), we can get that if min(k_0, k_1) ≥ 1,
$$\log_2 \frac{1}{P_R(x)} \ge k.$$
If min(k_0, k_1) = 0, according to the condition in the theorem, we also have the same result. Since k = m(1 + α) with α > 0, according to Lemma 1, we can construct a seeded extractor that applies to the sequence Z and generates a binary sequence ǫ-close to the uniform distribution U_m. This completes the proof.

Actually, the idea above can be easily generalized to the case in which M is a Markov chain that best approximates the real source R. The idea follows the main lemma in [29], which shows how to generate random bits with optimal efficiency from an arbitrary Markov chain.

B. Efficiency Analysis

For the efficiency of the construction, we can get the same bounds as for Construction 1.

Theorem 14. Given an arbitrary source R such that $\min_{M \in \mathcal{G}_{b.c.}} d(R, M) \le \beta$, if there exists a model $M \in \mathcal{G}_{b.c.}$ with probability p ≤ 1/2 of being 1 or 0 and
$$p > \sqrt{d(R, M) \log_2 \frac{1}{p} \cdot \frac{\ln 2}{2}},$$
then the efficiency of Construction 2 is 1 − β ≤ η ≤ 1.

Proof: Let $N_{k_0,k_1}$ denote the number of input sequences with k_0 zeros and k_1 ones in S_p, and let $p_{k_0,k_1}$ be the probability, based on R, of generating such a sequence. Let us define A = {(k_0, k_1) | N_{k_0,k_1} > 0}; then we can get
$$H_R(X_m) \le H(\{p_{k_0,k_1} \mid (k_0,k_1) \in A\}) + \sum_{(k_0,k_1) \in A} p_{k_0,k_1} \log_2 N_{k_0,k_1}.$$
According to the proof of the above theorem, $\min(k_0, k_1) \le \frac{k}{1 - \beta_p} + 1$, so there are in total at most $2(\frac{k}{1 - \beta_p} + 1)$ available pairs of (k_0, k_1). Hence
$$H(\{p_{k_0,k_1} \mid (k_0,k_1) \in A\}) \le \log_2 \Big( 2 \Big( \frac{k}{1 - \beta_p} + 1 \Big) \Big) = o(k).$$
Now, we write n = k_0 + k_1. According to our method, if min(k_0, k_1) ≥ 1,
$$\binom{k_0 + k_1}{\min(k_0, k_1)} \ge 2^{\frac{k}{1 - \beta_p}}, \quad \binom{k_0 + k_1 - 1}{\min(k_0, k_1) - 1} < 2^{\frac{k}{1 - \beta_p}}.$$
Hence, given n, we get an upper bound for min(k_0, k_1), which is
$$t_n = \max\Big\{ i \in \{0, 1, ..., n\} \ \Big|\ \binom{n - 1}{i - 1} < 2^{\frac{k}{1 - \beta_p}} \Big\}. \quad (3)$$
Note that if $\binom{n-1}{\frac{n}{2}-1} \ge 2^{\frac{k}{1 - \beta_p}}$, then t_n is a nondecreasing function of n. Using the Stirling bounds on factorials yields
$$\lim_{n \to \infty} \frac{1}{n} \log_2 \binom{n}{\rho n} = H(\rho),$$
where H is the binary entropy function. Hence, following (3), we can get
$$\lim_{n \to \infty} H\Big(\frac{t_n}{n}\Big) = \lim_{n \to \infty} \frac{k}{(1 - \beta_p) n}. \quad (4)$$
Let P_n denote the probability of having an input sequence of length at least n based on the distribution of R; P_n is a nonincreasing function of n. Let Q_n denote the probability of having an input sequence of length at least n based on the distribution of the model $M \in \mathcal{G}_{b.c.}$ whose probability is p ≤ 1/2. Since for every binary sequence x ∈ {0, 1}^n
$$\log_2 \frac{1}{P_M(x)} \le n \log_2 \frac{1}{p},$$
we can get
$$\log_2 \frac{P_R(x)}{P_M(x)} \le d n \log_2 \frac{1}{p},$$
where d = d_p(R, M). Since $P_n = \sum_{x \in S} P_R(x)$ and $Q_n = \sum_{x \in S} P_M(x)$ for some S ⊂ {0, 1}^n, it is not hard to prove that
$$\log_2 \frac{P_n}{Q_n} \le d n \log_2 \frac{1}{p}. \quad (5)$$
According to Hoeffding's inequality, we can get
$$Q_n \le 2 P[k_1 \le t_n] \le 2 P\Big[\frac{k_1}{n} - p \le \frac{t_n}{n} - p\Big] \le 2 e^{-2n(p - \frac{t_n}{n})^2}. \quad (6)$$
From these inequalities, we see that P_n → 0 as n → ∞ if
$$-d \log_2 p \cdot \ln 2 - 2\Big(p - \frac{t_n}{n}\Big)^2 < 0. \quad (7)$$
Based on (4) and (7), we can get that P_n → 0 as n → ∞ if n ≥ ak, where
$$a = \frac{1 + \epsilon}{(1 - \beta_p) H\Big(p - \sqrt{d \log_2 \frac{1}{p} \cdot \frac{\ln 2}{2}}\Big)}$$
with ǫ > 0. We can then write
$$H_R(X_m) \le o(k) + \sum_{k_0,k_1 : k_0 + k_1 \ge ak} p_{k_0,k_1} \log_2 N_{k_0,k_1} + \sum_{k_0,k_1 : k_0 + k_1 < ak} p_{k_0,k_1} \log_2 N_{k_0,k_1}.$$
For the second sum, (3) and (4) give $\log_2 N_{k_0,k_1} \le \log_2 \binom{n}{t_n} \le \frac{k}{1 - \beta_p}(1 + o(1))$; for the first sum, since P_n converges to zero exponentially fast for n ≥ ak, its contribution is o(k). Hence $H_R(X_m) \le \frac{k}{1 - \beta_p} + o(k)$, and
$$\eta = \lim_{m \to \infty} \frac{m}{H_R(X_m)} \ge 1 - \beta.$$
This completes the proof.

Corollary 16. Given a stationary ergodic source R such that $\min_{M \in \mathcal{G}_{b.c.}} d(R, M) \le \beta$, if there exists a model $M \in \mathcal{G}_{b.c.}$ with probability p ≤ 1/2 of being 1 or 0 and
$$p > \sqrt{d(R, M) \log_2 \frac{1}{p} \cdot \frac{\ln 2}{2}},$$
then for the expected input length of Construction 2, we have
$$\frac{1}{h(R)} \le \lim_{m \to \infty} \frac{E[|X_m|]}{m} \le \frac{1}{(1 - \beta) h(R)},$$
where h(R) is the entropy rate of R.

VI. CONSTRUCTION III: APPROXIMATELY STATIONARY ERGODIC PROCESSES

In this section, we consider imperfect sources that are approximately stationary and ergodic. Here, we let R be an arbitrary stochastic process such that d(R, M) ≤ β for a stationary ergodic process M. For these sources, universal data compression can be used to 'purify' input sequences, i.e., to shorten their lengths while maintaining their entropies. In [27], Visweswariah, Kulkarni and Verdú showed that optimal variable-length source codes asymptotically achieve optimal variable-length random bit generation in the sense of normalized divergence. Although their work only considered ideal stationary ergodic processes and generates 'weaker' random bits, it motivates us to combine universal compression with fixed-length extractors for efficiently generating random bits from noisy stochastic processes. In this section, we first introduce the Lempel-Ziv code and then present its application in constructing variable-length extractors.

A. Construction

The Lempel-Ziv code is a universal data compression scheme introduced by Ziv and Lempel [31], which is simple to implement and achieves the asymptotically optimal rate for stationary ergodic sources. The idea of the Lempel-Ziv code is to parse the source sequence into strings that have not appeared so far, as demonstrated by the following example.

Example 5. Assume the input is 010111001110000...; then we parse it as the strings 0, 1, 01, 11, 00, 111, 000, ..., where each string is the shortest string that has never appeared before, i.e., all its proper prefixes have occurred earlier.

Let c(n) be the number of strings obtained by parsing a sequence of length n. For each string, we describe its location with log c(n) bits. Given a string of length l, it can be described
by (1) the location of its prefix of length l − 1, and (2) its last bit. Hence, the code for the above sequence is
(000, 0), (000, 1), (001, 1), (010, 1), (001, 0), (100, 1), (101, 0), ...,
where the first number in each pair indicates the prefix location and the second number is the last bit of the string.

Typically, the Lempel-Ziv code is applied to an input sequence of fixed length. Here, we are interested in a Lempel-Ziv code with fixed output length and variable input length. As a result, we can apply a single fixed-length extractor to the output of the Lempel-Ziv code for extracting randomness. In our algorithm, we read raw bits one by one from an imperfect source until the length of the output of the Lempel-Ziv code reaches a certain length; in other words, the number of strings after parsing is a predetermined number c. For example, if the source is 1011010100010... and c = 4, then after reading 6 bits, we can parse them into 1, 0, 11, 01. Now, we get an output sequence (000, 1), (000, 0), (001, 1), (010, 1), which can be used as the input of a fixed-length extractor. We call this Lempel-Ziv code a variable-length Lempel-Ziv code.

Let Z be a random sequence obtained from the variable-length Lempel-Ziv code such that its length is
$$|Z| = (\log c + 1) c$$
for a predetermined c. Then Z is very close to truly random bits in terms of min-entropy if the source R is stationary ergodic. As a result, we have the following construction for variable-length extractors.

Construction 3. Assume the real source is R and there exists a stationary ergodic process M such that d(R, M) ≤ β. Then we extract m almost-random bits from R based on the following procedure.
1) Read input bits one by one based on the variable-length Lempel-Ziv code until we get an output sequence Z whose length reaches
$$n = \frac{k}{1 - \beta_p} (1 + \varepsilon),$$
where ε > 0 is a small constant indicating the performance gap between the finite-length case and the infinite-length case of the Lempel-Ziv code; as m → ∞, we have ε → 0. As above, β_p = β + ǫ_p with ǫ_p > 0 and k = m(1 + α) with α > 0. The small constant ǫ_p has a value depending on the input set S_p; as m → ∞, ǫ_p → 0. The constant α can be arbitrarily small. Then we get a random sequence Z of length n with min-entropy k.
2) Applying a (k, ǫ) extractor to Z yields a random sequence of length m that is ǫ-close to U_m.

We show that the min-entropy of Z is at least k as m → ∞. If m is not very large, by adjusting the parameter ε, we can make the min-entropy of Z be at least k. So we can continue to apply an efficient fixed-length extractor to 'purify' the resulting sequence. Finally, we get m random bits that satisfy our requirements on quality in the sense of statistical distance.

Theorem 17. When m → ∞, Construction 3 generates a random sequence of length m that is ǫ-close to U_m.

Proof: Let x be an input sequence. According to Theorem 12.10.1 in [5], for the stationary ergodic process M we can get
$$\frac{1}{|x|} \log_2 \frac{1}{P_M(x)} \ge \frac{c}{|x|} \log_2 c - \frac{c}{|x|} H(U, V),$$
where $\frac{c}{|x|} H(U, V) \to 0$ as |x| → ∞. As a result, if k = Θ(n),
$$\lim_{k \to \infty} \frac{1}{k} \log_2 \frac{1}{P_R(x)} \ge \lim_{k \to \infty} (1 - \beta_p) \frac{1}{k} \log_2 \frac{1}{P_M(x)} \ge \lim_{k \to \infty} \frac{(1 - \beta_p) c \log_2 c}{k} = \lim_{k \to \infty} \frac{(1 - \beta_p) n}{k} = \lim_{k \to \infty} (1 + \varepsilon) = 1.$$
Finally, we can get that
$$\lim_{k \to \infty} \frac{H_{\min}(Z)}{k} = \lim_{k \to \infty} \frac{H_{\min}(X_m)}{k} \ge 1.$$
This implies that as m → ∞, i.e., k → ∞, the min-entropy of Z is at least k. Since k = m(1 + α) for an α > 0, we can continue to apply a (k, ǫ) extractor to extract m almost-random bits from Z.
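The variable-length Lempel-Ziv parsing in step 1 is straightforward to implement; a minimal Python sketch (ours), reproducing the c = 4 example above:

```python
def vl_lempel_ziv(next_bit, c):
    """Parse bits from the source into c distinct phrases (LZ78-style) and
    return the list of (prefix_index, last_bit) pairs; index 0 = empty prefix."""
    phrases = {'': 0}
    out = []
    while len(out) < c:
        cur = ''
        while cur in phrases:          # extend until the phrase is new
            cur += str(next_bit())
        phrases[cur] = len(phrases)
        out.append((phrases[cur[:-1]], int(cur[-1])))
    return out

bits = iter('1011010100010')
pairs = vl_lempel_ziv(lambda: int(next(bits)), 4)
print(pairs)  # [(0, 1), (0, 0), (1, 1), (2, 1)] -> (000,1),(000,0),(001,1),(010,1)
```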
B. Efficiency Analysis

Now, we study the efficiency of the construction based on variable-length Lempel-Ziv codes.

Theorem 18. Given a real source R such that there exists a stationary ergodic process M with d(R, M) ≤ β, the efficiency of Construction 3 is 1 − β ≤ η ≤ 1.

Proof: As above, we only need to prove that η ≥ 1 − β. Since there are at most $2^{c(\log_2 c + 1)} = 2^n$ distinct input sequences, their entropy satisfies H_R(X_m) ≤ c(log_2 c + 1) = n. According to the proof of Theorem 17, the random sequence Z has min-entropy at least k, and it satisfies
$$\lim_{m \to \infty} \frac{n}{k} = \frac{1}{1 - \beta}.$$
Based on the construction of seeded extractors, we can also get
$$\lim_{m \to \infty} \frac{m}{k} = 1.$$
As a result,
$$\eta = \lim_{m \to \infty} \frac{m}{H_R(X_m)} \ge 1 - \beta.$$
This completes the proof.

Although Construction 3 has the same efficiency as the other constructions, when m is not large it is less efficient than the other constructions, because the Lempel-Ziv code does not always perform well when the input sequence is not long. Its advantage is that it can manage more general sources without accurate estimations. In the above theorem, the gap β represents how far the source R is from being stationary ergodic. In general, the efficiency loss introduced by the uncertainty of the source is a part that cannot be avoided.

Corollary 19. Given a real source R such that there exists a stationary ergodic model M with d(R, M) ≤ β, as β → 0, the efficiency of Construction 3 satisfies η → 1.

It shows that as β → 0, Construction 3 reaches Shannon's limit on efficiency.

Corollary 20. Given a stationary ergodic source R (assuming we do not know that it is stationary ergodic), the expected input length of Construction 3 satisfies
$$\frac{1}{h(R)} \le \lim_{m \to \infty} \frac{E[|X_m|]}{m} \le \frac{1}{(1 - \beta) h(R)},$$
where h(R) is the entropy rate of R.

VII. SEEDLESS CONSTRUCTIONS

To simulate seeded constructions of variable-length extractors in randomized applications, we have to enumerate all possible assignments of the seed; hence, the computational complexity is increased significantly. In real applications, we prefer seedless constructions to seeded constructions. This motivates us to study seedless constructions of variable-length extractors in this section.
A. An Independent Source

Let us first consider the simple independent source described in the introduction. This type of source has been widely studied in seedless constructions of fixed-length extractors.

Example 6. Let x1 x2 ... ∈ {0, 1}* be an independent sequence generated from a source R such that P[xi = 1] ∈ [0.9, 0.91] for all i ≥ 1.

We see that the existing methods for generating random bits from ideal sources (like biased coins or Markov chains) cannot be applied here, since the probability of each bit is slightly unpredictable. Some seedless extractors have been developed for extracting randomness from such sources. In particular, there exist seedless extractors that are able to extract as many as H_min(X) random bits from an independent random sequence X asymptotically. In order to extract m random bits in the above example, one needs to read $\frac{m}{\log_2 \frac{1}{0.91}}$ input bits as m → ∞. In this case, the entropy of the input sequence is in
$$\Big[ H(0.91) \frac{m}{\log_2 \frac{1}{0.91}}, \ H(0.9) \frac{m}{\log_2 \frac{1}{0.91}} \Big],$$
from which we can get the efficiency of an optimal fixed-length extractor, which is
$$\eta_{fixed} \in [0.2901, 0.3117];$$
i.e., only about 0.3 of the input entropy is used for generating random bits, which is far from optimal.

In the above example, we let M be a biased coin model with probability p = 0.9072. In this case, β ≤ d(R, M) = 0.0315. According to the constructions in the previous sections, there exist seeded variable-length extractors whose efficiencies are
$$\eta_{variable} \in [1 - \beta, 1] \subseteq [0.9685, 1],$$
which is near Shannon's limit. Based on the fact that the source is independent, we can eliminate the requirement of truly random bits as the seed; hence, we have seedless variable-length extractors. To construct a seedless variable-length extractor, we first apply a seedless fixed-length extractor E_1 (which may not be very efficient) to extract a random sequence of length d from input bits. Using this random sequence as the seed, we continue to apply a seeded variable-length extractor E_2 to extract m almost-random bits from extra input bits. So seedless variable-length extractors can be constructed as cascades of seedless fixed-length extractors and seeded variable-length extractors. Since the input length of E_1 is much shorter than (in fact, negligible compared to) the input length of E_2, the efficiency of the resulting seedless extractor, i.e., $E = E_2 \otimes E_1$, is dominated by the efficiency of E_2. So the efficiency of the seedless extractor E is in [0.9685, 1], which is very close to optimal. This example demonstrates a simple construction of seedless variable-length extractors for independent sources, and it shows the significant performance gain of variable-length extractors compared to fixed-length extractors.
(8)
where Xab = xa xa+1 ...xb . In some sense, the source X that we consider is a hidden Markov process, but the number of hidden states can be infinite (λi can be discrete or continuous). Example 7. One example of the above sources is the one studied in [12], called a space s source. A space s source is basically a source generated by a width 2s branching program. At each step, the state of the process generating the source is
14
in one of 2s states, and the bit generated is a function of the current state. Unlike perfect Markov chains, the transition probabilities can be different at each step. In this example, the system status λi is the content of space s at time i, that is, one of the 2s states, and xi ∈ {0, 1} is a function of λi . Space s sources are very general in that most other classes of sources that have been considered previously can be computed with a small amount of space [12]. The model that we consider, as described by (8), is a natural generalization of space s sources. This model has a very nice feature: from such a source, we can get a group of sequences conditionally independent of each other. Namely, given system statues at some time points [λ(1) , λ(2) , ..., λ(γ) ] = [λa , λ2a , ..., λγa ], the subsequences [X (1) , X (2) , ..., X (γ) , X (γ+1) ] γa−1 2a−1 ∞ = [X1a−1 , Xa+1 , Xγa+1 , ..., X(γ−1)a+1 ]
Let Xm be the input sequence of E2 that read from X (γ+1) , then given λ(γ) , we have kE2 (Xm , Ud ) − Um k ≤ ǫ2 . From the two inequalities above, given λ(1) , λ(2) , ..., λ(γ) , we have kE2 (Xm , D) − Um k ≤ ǫ1 + ǫ2 . Since it is true for any assignments of λ(1) , λ(2) , ..., λ(γ) , we can get =
kE2 (Xm , D) − Um k X P [λ(1) , λ(2) , ..., λ(γ) ](ǫ1 + ǫ2 ) λ(1) ,λ(2) ,...,λ(γ)
≤
ǫ1 + ǫ2 .
Hence, the m almost-random bits extracted by E is also (ǫ1 + ǫ2 )-close to Um . In the following theorem, we show that the seedless variable-length extractor E has the efficiency as the seeded variable-length extractor E2 .
are conditionally independent of each other. Based on this condition, we have the following seedless construction of variable-length extractors.
Theorem 22. In Construction 4, suppose that
Construction 4. Given a source R described by (8), we can construct a seedless variable-length extractor E in the following way: 1) Suppose that
Let ηE denote the efficiency of the resulting seedless variablelength extractor E, and let ηE2 denote the efficiency of the E2 , then ηE = ηE2 .
Hmin (X (i) |λ(i) , λ(i+1) ) ≥ kd , ∀0 ≤ i ≤ γ. We construct a γ-source extractor [19] E1 : [{0, 1}a−1]γ → {0, 1}d such that if each source has minentropy kd , it can extract d almost-random bits which are ǫ1 -close to the uniform distribution on {0, 1}d. 2) We construct a seeded variable-length extractor E2 : Sp ×{0, 1}d → {0, 1}m such that with condition on λ(γ) , it can extract m almost-random bits from X (γ+1) and these m almost-random bits are ǫ2 -close to the uniform distribution on {0, 1}m if the seed is truly random. 3) The seedless variable-length extractor E is a cascade of E1 and E2 : Let D = E1 (X (1) , X (2) , ..., X (γ) ), then we apply D as the seed of E2 to generate m almostrandom bits from X (γ+1) ; that is, E(X) = E2 (X (γ+1) , E1 (X (1) , X (2) , ..., X (γ) )). For this construction, we have the following theorems. Theorem 21. In Construction 4, the m almost-random bits generated by the seedless variable-length extractor E are (ǫ1 + ǫ2 )-close to the uniform distribution on {0, 1}m. Proof: According to the construction, we can let the parameter a = |X (i) | + 1 with 1 ≤ i ≤ γ be large enough, so given λ(1) , λ(2) , ..., λ(γ) , kD − Ud k ≤ ǫ1 .
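To make the data flow of Construction 4 concrete, here is a minimal Python sketch of the cascade $E(X) = E_2(X^{(\gamma+1)}, E_1(X^{(1)}, \ldots, X^{(\gamma)}))$. Both component extractors are toy stand-ins of our own (a hash-and-XOR placeholder for the $\gamma$-source extractor of [19], and an LCG-based placeholder for $E_2$), so the sketch shows only how the pieces connect, not the extraction guarantees.

```python
import hashlib

def multi_source_extractor(blocks, d):
    """Toy stand-in for the gamma-source extractor E1 of Construction 4.
    A real E1 (e.g., from [19]) carries provable guarantees; here we
    hash each conditionally independent block and XOR the digests,
    only to make the data flow of the cascade concrete."""
    seed = 0
    for block in blocks:
        digest = hashlib.sha256(bytes(block)).digest()
        seed ^= int.from_bytes(digest[: (d + 7) // 8], "big")
    return seed & ((1 << d) - 1)

def seeded_variable_length_extractor(stream, seed, m):
    """Toy stand-in for the seeded variable-length extractor E2: reads
    symbols from `stream` and outputs m bits. The mixing rule below is
    a placeholder, not a real extractor."""
    state, out = seed, []
    for x in stream:
        state = (state * 6364136223846793005 + x + 1) % (1 << 64)
        out.append((state >> 63) ^ x)
        if len(out) == m:
            break
    return out

def cascade_extractor(blocks, tail, d, m):
    """Construction 4: E(X) = E2(X^(gamma+1), E1(X^(1),...,X^(gamma)))."""
    seed = multi_source_extractor(blocks, d)
    return seeded_variable_length_extractor(tail, seed, m)

# Example wiring with toy data: gamma = 5 blocks seed E2, which then
# reads from the tail block X^(gamma+1).
blocks = [[(i * j) % 2 for i in range(99)] for j in range(1, 6)]
tail = iter([1, 0, 1, 1] * 50)
print(cascade_extractor(blocks, tail, d=32, m=16))
```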
In the following theorem, we show that the seedless variable-length extractor $E$ has the same efficiency as the seeded variable-length extractor $E_2$.

Theorem 22. In Construction 4, suppose that
$$H_{\min}(X^{(i)} \mid \lambda^{(i)}, \lambda^{(i+1)}) = \Theta(|X^{(i)}|), \quad \forall\, 0 \leq i \leq \gamma.$$
Let $\eta_E$ denote the efficiency of the resulting seedless variable-length extractor $E$, and let $\eta_{E_2}$ denote the efficiency of $E_2$; then $\eta_E = \eta_{E_2}$.

Proof: According to the construction of $E_1$, we can get that $d = \Theta(a)$, where $a = |X^{(i)}| + 1$ for $1 \leq i \leq \gamma$. If $\epsilon_2$ is a constant, then $d = O(\log m) = o(m)$. As a result,
$$\lim_{m \rightarrow \infty} \frac{a\gamma}{m} = 0.$$
Let $H$ denote the entropy of the input sequence of $E_2$; then $\eta_{E_2} = \lim_{m \rightarrow \infty} \frac{m}{H}$, and
$$\lim_{m \rightarrow \infty} \frac{m}{a\gamma + H} \leq \eta_E \leq \lim_{m \rightarrow \infty} \frac{m}{H}.$$
Hence, $\eta_E = \eta_{E_2}$.
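A quick numerical illustration of this accounting (all constants below are arbitrary choices of ours, not values from the construction): with $a = \Theta(\log m)$, the seed blocks consume $a\gamma = O(\gamma \log m)$ symbols, which is negligible next to the $\Theta(m)$ symbols read by $E_2$, so the two bounds on $\eta_E$ pinch together.

```python
import math

gamma = 10            # number of seed blocks (arbitrary)
c = 4.0               # a ~ c * log2(m)  (arbitrary constant)
eta_E2 = 0.45         # assumed efficiency of E2, so H ~ m / eta_E2

for m in (10**3, 10**6, 10**9):
    a = c * math.log2(m)
    H = m / eta_E2
    low, high = m / (a * gamma + H), m / H
    print(f"m=10^{round(math.log10(m))}: a*gamma/m = {a*gamma/m:.1e}, "
          f"eta_E in [{low:.4f}, {high:.4f}]")
# The interval collapses onto eta_E2 = 0.45 as m grows.
```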
The theorem above shows that the efficiency of seedless variable-length extractors can be very close to optimal. For many sources, such as biased coins with noise or Markov chains with noise, the existing algorithms for ideal sources (e.g., perfect biased coins or perfect Markov chains) cannot generate high-quality random bits. At the same time, the traditional approach of fixed-length extractors is not very efficient: the gap between its efficiency and optimality is determined by the bias of the source. Seedless variable-length extractors take advantage of both approaches; as a result, they can approach the information-theoretic upper bound on efficiency while remaining able to combat noise in the sources.
VIII. CONCLUSION AND DISCUSSION
In this paper, we introduced the concept of variable-length extractors, namely, extractors with variable input length and fixed output length. Variable-length extractors generalize the existing algorithms for ideal sources so that they can manage general stochastic processes; they also improve on traditional fixed-length extractors by closing the efficiency gap between the min-entropy and the entropy of the source. The key idea in constructing variable-length extractors is to approximate the real source by a simple model, which can be a known process, a biased coin, or a stationary ergodic process. Depending on the model selected, we proposed and analyzed three seeded constructions of variable-length extractors. Their efficiency is lower bounded by $1 - \beta$ and upper bounded by 1 (optimality), where $\beta$ ($0 \leq \beta \leq 1$) indicates the uncertainty of the real source. We also showed that our constructions are asymptotically optimal, in the sense that one cannot find a construction whose efficiency is always strictly larger than $1 - \beta$. In addition, we demonstrated how to construct seedless variable-length extractors by cascading seeded variable-length extractors with seedless fixed-length extractors; these work for many (but not all) natural sources, such as those based on noise signals.

There are certain connections between variable-length extractors and the family of variable-to-fixed length source codes [14], [15], [21], [23], [24], [26], [28]. With a variable-to-fixed length code, an infinite sequence is parsed into variable-length phrases, chosen from some finite set $D$ of phrases. Each phrase is then coded into a binary sequence of fixed length $m$. The set $D$ of phrases is complete, i.e., every infinite sequence has a prefix in $D$. The key to constructing a good variable-to-fixed length source code is to find the best set $D$ that consists of at least $2^m$ prefix-free phrases and maximizes the expected phrase length. In comparison, the key to constructing a variable-length extractor is to find the best input set $S_p$ that consists of sequences each with probability at most $2^{-k}$ and minimizes the expected sequence length. Although their goals are different, some common ideas can be used to construct both the phrase set $D$ and the input set $S_p$. For example, in [28], Visweswariah et al. defined the phrase set $D$ by $x^* \in D$ if $P(x^*) \leq c$ and no prefix of $x^*$ satisfies this property; the same idea is applied in our Construction I. In [14], [24], the phrase set $D$ is determined by the number of ones and zeros in the phrase, as is our Construction II. In some sense, an optimal variable-to-fixed length code results in a fixed-length binary sequence whose min-entropy is close to its length. However, variable-to-fixed length source codes do not always work well for constructing variable-length extractors, because (1) their design criteria are different, which may degrade performance; (2) variable-to-fixed length source codes take both encoding and decoding into consideration, and hence are computationally more complex than what we require for constructing variable-length extractors (decoding is unnecessary); and (3) the sources we consider for variable-length extractors are unpredictable and thus more general than those considered for variable-to-fixed length source codes.
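As an illustration of this shared stopping rule, the following Python sketch (a minimal sketch of ours; the known-bias coin and the helper names are illustrative, not taken from [28] or from Construction I) reads symbols until the probability of the prefix read so far drops to the threshold $c = 2^{-k}$. Since no proper prefix meets the bound, the set of sequences produced this way is prefix-free.

```python
import random

def read_phrase(next_bit, p_one, k):
    """Shared stopping rule of Construction I and the phrase sets of
    [28]: keep reading until the probability of the current prefix is
    at most c = 2^{-k}. No proper prefix of the result satisfies the
    bound, so the induced set of sequences is prefix-free."""
    threshold = 2.0 ** (-k)
    prefix, prob = [], 1.0
    while prob > threshold:
        b = next_bit()
        prefix.append(b)
        prob *= p_one if b == 1 else 1.0 - p_one
    return prefix, prob

# Toy usage with a biased coin of known p = 0.9: high-probability
# prefixes (long runs of ones) take many symbols to cross the
# threshold, while unlikely prefixes terminate quickly.
rng = random.Random(3)
coin = lambda: 1 if rng.random() < 0.9 else 0
phrase, prob = read_phrase(coin, p_one=0.9, k=10)
print(len(phrase), prob)
```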
REFERENCES

[1] J. Abrahams, "Generation of discrete distributions from biased coins," IEEE Trans. Inform. Theory, vol. 42, pp. 1541–1546, 1996.
[2] B. Barak, R. Impagliazzo, and A. Wigderson, "Extracting randomness using few independent sources," SIAM J. Comput., vol. 36, pp. 1095–1118, 2006.
[3] J. Bourgain, "More on the sum-product phenomenon in prime fields and its applications," International Journal of Number Theory, vol. 1, pp. 1–32, 2005.
[4] A. Cohen and A. Wigderson, "Dispersers, deterministic amplification, and weak random sources," in Proc. Annual IEEE Symposium on Foundations of Computer Science (FOCS), 1989.
[5] T. M. Cover and J. A. Thomas, Elements of Information Theory, Second Edition, Wiley, July 2006.
[6] Y. Dodis, "Impossibility of black-box reduction from non-adaptively to adaptively secure coin-flipping," Technical Report 039, Electronic Colloquium on Computational Complexity, 2000.
[7] Z. Dvir and A. Wigderson, "Kakeya sets, new mergers and older extractors," in Proc. IEEE Symposium on Foundations of Computer Science (FOCS), 2008.
[8] A. Gabizon, R. Raz, and R. Shaltiel, "Deterministic extractors for bit-fixing sources by obtaining an independent seed," SIAM J. Comput., vol. 36, pp. 1072–1094, 2006.
[9] V. Guruswami, C. Umans, and S. Vadhan, "Unbalanced expanders and randomness extractors from Parvaresh–Vardy codes," in Proc. IEEE Conference on Computational Complexity (CCC), pp. 96–108, 2007.
[10] T. S. Han and M. Hoshi, "Interval algorithm for random number generation," IEEE Trans. Inform. Theory, vol. 43, no. 2, pp. 599–611, 1997.
[11] J. Kamp and D. Zuckerman, "Deterministic extractors for bit-fixing sources and exposure-resilient cryptography," SIAM J. Comput., vol. 36, pp. 1231–1247, 2006.
[12] J. Kamp, A. Rao, S. Vadhan, and D. Zuckerman, "Deterministic extractors for small-space sources," Journal of Computer and System Sciences, vol. 77, pp. 191–220, 2011.
[13] D. E. Knuth and A. Yao, "The complexity of nonuniform random number generation," Algorithms and Complexity: New Directions and Recent Results, pp. 357–428, 1976.
[14] J. C. Lawrence, "A new universal coding scheme for the binary memoryless source," IEEE Trans. Inform. Theory, vol. 23, pp. 466–472, 1977.
[15] N. Merhav and D. L. Neuhoff, "Variable-to-fixed length codes provide better large deviations performance than fixed-to-variable length codes," IEEE Trans. Inform. Theory, vol. 38, pp. 135–140, Jan. 1992.
[16] N. Nisan, "Extracting randomness: how and why. A survey," in Proc. IEEE Conference on Computational Complexity, pp. 44–58, 1996.
[17] J. Radhakrishnan and A. Ta-Shma, "Bounds for dispersers, extractors, and depth-two superconcentrators," SIAM Journal on Discrete Mathematics, vol. 13, no. 1, pp. 2–24, 2000.
[18] A. Rao, "Extractors for a constant number of polynomially small min-entropy independent sources," in Proc. ACM Symposium on Theory of Computing (STOC), 2006.
[19] A. Rao, "Randomness extractors for independent sources and applications," Ph.D. thesis, Department of Computer Science, University of Texas at Austin, 2007.
[20] R. Raz, "Extractors with weak random seeds," in Proc. Annual ACM Symposium on Theory of Computing (STOC), pp. 11–20, 2005.
[21] S. A. Savari and R. G. Gallager, "Generalized Tunstall codes for sources with memory," IEEE Trans. Inform. Theory, vol. 43, pp. 658–668, Mar. 1997.
[22] R. Shaltiel, "Recent developments in explicit constructions of extractors," in Current Trends in Theoretical Computer Science: The Challenge of the New Century, vol. 1: Algorithms and Complexity, 2004.
[23] T. J. Tjalkens and F. M. J. Willems, "Variable to fixed-length codes for Markov sources," IEEE Trans. Inform. Theory, vol. 33, pp. 246–257, Mar. 1987.
[24] T. J. Tjalkens and F. M. J. Willems, "A universal variable-to-fixed length source code based on Lawrence's algorithm," IEEE Trans. Inform. Theory, vol. 38, pp. 247–253, Mar. 1992.
[25] L. Trevisan and S. P. Vadhan, "Extracting randomness from samplable distributions," in Proc. IEEE Symposium on Foundations of Computer Science (FOCS), pp. 32–42, 2000.
[26] B. P. Tunstall, Synthesis of Noiseless Compression Codes, Ph.D. dissertation, Georgia Inst. Technol., Atlanta, GA, Sept. 1967.
[27] K. Visweswariah, S. R. Kulkarni, and S. Verdú, "Source codes as random number generators," IEEE Trans. Inform. Theory, vol. 44, no. 2, pp. 462–471, 1998.
[28] K. Visweswariah, S. R. Kulkarni, and S. Verdú, "Universal variable-to-fixed length source codes," IEEE Trans. Inform. Theory, vol. 47, no. 4, pp. 1461–1472, 2001.
[29] H. Zhou and J. Bruck, "Efficiently generating random bits from finite state Markov chains," IEEE Trans. Inform. Theory, vol. 58, pp. 2490–2506, 2012.
[30] H. Zhou and J. Bruck, "Streaming algorithms for optimal generation of random bits," Technical Report, Electrical Engineering, California Institute of Technology, 2012.
[31] J. Ziv and A. Lempel, "Compression of individual sequences via variable rate coding," IEEE Trans. Inform. Theory, vol. 24, pp. 530–536, 1978.
[32] D. Zuckerman, "General weak random sources," in Proc. IEEE Symposium on Foundations of Computer Science (FOCS), pp. 534–543, 1990.