Source Coding Algorithms Using the Randomness ... - Semantic Scholar

0 downloads 0 Views 393KB Size Report
Apr 4, 2005 - source coding algorithms that use the randomness of a past sequence and analyze the encoding rate and decoding error rate. The proposed ...
IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.4 APRIL 2005

1063

PAPER

Source Coding Algorithms Using the Randomness of a Past Sequence Jun MURAMATSU†a) , Member

SUMMARY We propose source coding algorithms that use the randomness of a past sequence. The proposed algorithms solve the problems of multi-terminal source coding, rate-distortion source coding, and source coding with partial side information at the decoder. We analyze the encoding rate and the decoding error rate in terms of almost-sure convergence. key words: bits-back coding, lossy source coding, multiterminal source coding, random coding, simulated random coding algorithm, source coding with partial side information at the decoder

1.

Introduction

The aim of this paper is providing an explicit constraction of deterministic mapping of source codes. The algorithm does not need any auxiliary random source. We propose source coding algorithms that use the randomness of a past sequence and analyze the encoding rate and decoding error rate. The proposed algorithms solve the problems of multiterminal source coding, rate-distortion source coding, and source coding with partial side information at the decoder. Slepian and Wolf [46] introduced the source coding problem for multiple access networks (Fig. 1) and determined the achievable rate region for the lossless source coding. We call this problem the multi-terminal source coding problem. The problem is generalized by Cover [6], and Miyake and Kanaya [29]. Csisz´ar [8] proved that there is an optimal linear code for this problem. According to Witsenhausen [54], there is no truly lossless code achieving the boundary of the Slepian-Wolf rate region. Therefore, we should focus on finding good lossless codes that satisfy either of the following two criteria. • The decoding error probability should converge to zero as the length of a string tends to infinity. We call these types of codes source codes with arbitrary small decoding error probability. Uyematsu [50] constructed codes using a combination of concatenated codes and linear codes. • The decoding error rate, which is the number of error letters divided by the length of a string, should converge to zero as the length of the string tends to infinity. This criterion is familiar in the context of convolutional codes. We call these types of codes source codes with Manuscript received May 12, 2003. Manuscript revised June 1, 2004. Final manuscript received January 4, 2005. † The author is with the NTT Communication Science Laboratories, NTT Corporation, Kyoto-fu, 619–0237 Japan. a) E-mail: [email protected] DOI: 10.1093/ietfec/e88–a.4.1063

- fX

RX ≥ H(X|Z)

-

Z - fZ

RZ ≥ H(Z|X)

-

X

g

- XZ

RX + RZ ≥ H(XZ) Fig. 1

Y

Multi-terminal source coding.

- f Fig. 2

I(Y; Z) EYZ ρ(Y, Z)

- g

- Z

Lossy source coding.

arbitrary small decoding error rate. This criterion was introduced by Zhao and Effros [67] in the context of source coding problems for multiple access networks. The main part of this paper discusses the performance of the algorithm in the sense of the second criterion. The first criterion is discussed in the appendix. The lossy source coding problem (Fig. 2), which is also called the rate-distortion problem, was introduced by Shannon [41] and the achievable rate region was determined by Shannon [42], and Berger [3]. Detailed surveys are given in [4] and [24]. The construction of the universally optimal code was proposed by Ornstein and Shields [37]. This algorithm collects typical sequences and another code construction by Neuhoff and Shields [34] takes the same approach. Lossy string matching algorithms, which are analogous with the Lempel-Ziv ’77 algorithm [68], have been actively studied. Steinberg and Gutman [47], and Kanaya and Muramatsu [22] studied a case where a database sequence is a past sequence. Koga and Arimoto [25], Yang and Kieffer [62], and Łuczak and Szpankowski [27] studied a case where a database sequence has an ideal distribution. The constructions of a universal database were proposed by Muramatsu and Kanaya [32], and Kontoyiannis [26]. Zhang and Wei [66] proposed a universal code using the ‘goldwashing’ algorithm which renews the database every time the source block is encoded. Other algorithms, which renew a database at every encoding of blocks, were proposed by Sadeh [40], and Zamir and Rose [65]. The algorithms derived from lossless data compression algorithms were introduced by Yang and Shen [63] with the KolmogorovChaitin complexity, Yang and Kieffer [61] with the LempelZiv ’78 algorithm, Muramatsu and Kanaya [31][33], and Yang, Zhang, and Berger [64] with arbitrary lossless data compression algorithms. Some of the above algorithms are

c 2005 The Institute of Electronics, Information and Communication Engineers Copyright 

IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.4 APRIL 2005

1064

X

- fX

H(X|Z)

g

Y

- fY

Fig. 3

I(Y; Z)

-

-

X

Source coding with partial side information at the decoder.

erator, an encoding map, and a lossless source encoder. The decoder is constructed of a buffer, a random number generator, a decoding map generator, a decoding map, a lossless source decoder, and a block coupler, where the buffer and the random number generator are the same as in the encoder. The buffers store part of the past sequence, which has been transmitted by using the noiseless source code. Each sub-block is encoded and decoded by the following procedures. Encoding Algorithm Step 1: A source output is separated by the block separator into a code generation block and a compressible block. Step 2: The random number generator transforms the data in the buffer into random numbers, which are used by the encoding map generator to renew the encoding map. Step 3: The compressible block is encoded into a compressible block codeword by the encoding map. Step 4: The code generation block is encoded into a code generation block codeword by the lossless source encoder. The buffer is renewed by storing the code generation block and discarding the oldest code generation block. Decoding Algorithm

Fig. 4

Configuration of simulated random coding algorithm.

universally optimal, where we do not need to know the probability distribution of a source. In this paper, it is assumed that both the encoder and decoder have the knowledge of the probability distribution of a source. The problem of source coding with partial side information at the decoder (Fig. 3) was introduced by Wyner and Ziv [58], and Wyner [55]. The achievable rate region of this problem was determined by Ahlswede and K¨orner [1], Gray and Wyner [19], Wyner [56][57], and Miyake and Kanaya [29]. This paper describes its construction. The proposed algorithms can be called ‘simulated random coding algorithms,’ because they simulate the random generation of a codebook, which appears in ‘random coding arguments.’ The algorithms are outlined below. The sequence is chunked into sub-blocks, which are encoded and decoded separately. The algorithm for coding one sub-block is explained. Figure 4 illustrates the configuration of the encoder and decoder, where arrows shows the data flow. In this figure, it is assumed that there is only one encoder. The encoder is constructed of a block separator, a buffer, a random number generator, an encoding map gen-

Step 1: The random number generator transforms the data in the buffer into random numbers, which are used by the decoding map generator to renew the decoding map. Step 2: The decoder receives the compressible block codeword and decodes it with the decoding map into the compressible block reproduction. Step 3: The decoder receives the code generation block codeword and the lossless decoder decodes it into the code generation block. The buffer is renewed by storing the code generation block and discarding the oldest code generation block. Step 4: The block coupler concatenates the compressible block reproduction and the code generation block, which outputs the reproduction. The idea of the proposed algorithm is related to ‘bitsback coding,’ which was introduced by Frey and Hinton [14][15] and originated with Wallace [52], and Hinton and Zemel [20]. When a source has a latent variable, the bits-back algorithm uses the auxiliary random data to encodes the source with an average encoding rate with respect to the latent variable. Specifically, the bits-back coding with feedback [13] uses the past sequence as the auxiliary random data. The novel idea described in this paper is that the proposed algorithms assume that a codebook has a latent variable, while the bits-back coding assumes that a source has a latent variable. With the proposed algorithm a randomly generated codebook is regarded as a latent variable and a part of past sequence is used in the generation of the codebook. Intuitively, by renewing the codebook after encoding the source blocks, the decoding error rate achieves

MURAMATSU: SOURCE CODING ALGORITHMS USING THE RANDOMNESS OF A PAST SEQUENCE

1065

the average error probability of random coding arguments, which tends to zero as the length of a string tends to infinity. The contribution made by this work is to provide a solution to the problems described below and thus realize these algorithms. • In the bits-back coding, both the encoder and decoder should have a synchronized random sequence. However, with the above source coding settings, there is inevitable distortion between the source output and the reproduction. It is unclear how to obtain synchronized random sequences with the above settings. • In random coding arguments, we need random numbers with ideal distribution. It is unclear how the error of random number generation effects the decoding error rate. To solve the first problem, the source output is separated into the synchronized random sequence part and the optimally compressed sequence part. Random sequence synchronization is obtained by encoding with a truly lossless source code. To solve the second problem, close random coding arguments are presented that consider the effect of the difference between the available randomness and the ideal randomness. Oohama [35][36] evaluated the precision of random number generation measured by the variational distance. However, their results are not used directly in our analysis. Universal lossless source coding algorithms based on string matching, which is originated from Ziv and Lempel [68], can be treated as a variant of the proposed algorithm, and this topic is discussed in Sect. 6. The construction of a lossy source coding algorithm is related to algorithms proposed in [25], [62], and [27], where ideal distributions of random numbers are required. The gold-washing lossy source coding algorithm [66] also uses random numbers, where the construction of the database is different from the proposed algorithm and ideal distributions of random numbers are required. Lossy source coding algorithms presented in [47], [22], [40], and [65] use the sequence in the past, where the construction of the database is different from that of the algorithm proposed here. This paper is organized as follows. Section 3, 4, and 5 describe the algorithm for the multi-terminal source coding, the lossy source coding, and the source coding with partial side information, respectively. Section 6 discusses the relation between the proposed algorithm and universal lossless source coding algorithms based on the string matching. The proofs of the theorems are presented in Sect. 7. In Appendix A, a code with arbitrary small decoding error probability is constructed from a code with arbitrary small decoding error rate. In Appendix B, the achievable rate regions are presented for the multi-terminal source coding and the source coding with partial side information at the decoder, where the criterion of lossless codes is given by arbitrary small decoding error rate.

2.

Preliminary

This section presents notations that appear in subsequent sections. We denote the complement of a set A by [A]c , the difference set when a set G is subtracted from a set F by F −G, and the cardinality of a set A by |A|. Let X, Y, and Z be a n finite set. We define B ≡ {0, 1} and B∗ ≡ ∪∞ i=1 B . The concatenation of sequences is denoted by ∗. Let xij ≡ xi ∗ · · · ∗ x j and [xn ]i ≡ xi for xn ≡ x1 ∗ · · · ∗ xn . The number of error letters d(xn ), xn ∈ Xn for an enn n coder ϕn : Xn → B∗ and a decoder ϕ−1 n : φn (X ) → X is defined by n n d(xn ) ≡ |{1 ≤ i ≤ n : [ϕ−1 n (ϕn (x ))]i  [x ]i }.

Let BX (x) ∈ B log2 |X| be the binary representation of x ∈ X. Similarly, we define the binary representation of k xk ∈ Xk by BX (xk ) ∈ B log2 |X | . Let X, Y, and Z be random variables. Let µX be the probability distribution of random variable X and µX n (xn ) ≡ n i=1 µX (xi ). The entropy of X is defined by  H(X) ≡ − µX (x) log2 µX (x). x∈X

We define the entropy of joint random variable XY by H(XY), the conditional entropy of X given Y by H(X|Y) ≡ H(XY)− H(Y), and the mutual information between X and Y by I(X; Y) ≡ H(X)+H(Y)−H(XY). The divergence D(ν µ) between distributions µ and ν is defined by D(ν µ) ≡



ν(x) log2

x∈X

ν(x) . µ(x)

The empirical distribution  ν xn of xn ∈ Xn is defined by  ν xn (a) ≡

N(a|xn ) n

for a ∈ X, where N(a|xn ) denotes the number of a in xn . The conditional empirical distribution  ν xn |yn of (xn , yn ) ∈ Xn × Yn is defined by a distribution that satisfies N(a, b|xn , yn ) =  ν xn |yn (a|b)N(b|yn ) for all (a, b) ∈ X × Y. The functions ζ(α, γ) and λ(α, k) are defined for 0 < α < ∞, γ > 0 and k by   2γ , ζ(α, γ) ≡ γ − 2γ log2 α α log2 [k + 1] λ(α, k) ≡ . k Finally, the reminder of i divided by n is denoted by i%n, 0 ≤ i%n ≤ n − 1, where i is allowed to be represented by the binary notation.

IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.4 APRIL 2005

1066 i[k+2]+2 , x(i) ≡ xi[k+2]+1

3.

Multi-Terminal Source Coding Algorithm

[i+1][k+2] x(i) ≡ xi[k+2]+3 .

In this section, we construct an algorithm for the multiterminal source coding of correlated sources X and Z (Fig. 1). We assume that (RX , RZ ) satisfies 0 < H(X|Z) ≤ RX < H(X), 0 < H(Z|X) ≤ RZ < H(Z), H(XZ) ≤ RX + RZ .

(1) (2) (3)

Similarly, we define z(i) ∈ Z2 and z(i) ∈ Zk , i ∈ I. We assume that both the encoder of X and the decoder have a buffer s(i) ≡ i−1 j=i−mk |Xk | [x( j)]1 for each i ∈ I, where k x( j) is a null-string for j < 0. s :  Xmk |X | × Xk → We define

k Xmk and r : Xmk |X | × Xk → 0, . . . , 2k[RX +δk ] by



∗j:

s(s, x) ≡

sj

(5)

1≤ j≤mk |X | j%|Xk |=BX (x) k

Let



C γk δk mk m k

 H(X|Z) H(Z|X) ≡ min , , H(X) H(Z) 4 λ(|X||Z|, Ck) 3 log2 k + + , ≡ C Ck Ck 3 ≡ 6ζ(|X||Z|, γk ) + + γk ,   k k[RX + δk ] ≡ , H(X)   k[RZ + δk ] ≡ , H(Z)

r(s, x) ≡ πk (s(s, x))

for s ∈ Xmk |X | , x ∈ Xk . It should be noted here that x∈Xk s(s, x) is a permutation of letters in s. Similarly, both the encoder of Z and the decoder have s (i) ≡

i−1

m k |Zk | × Zk → Zmk . It j=i−m k |Zk | [z( j)]2 , i ∈ I and s : Z should be noted here that we use [x(i)]1 to renew s(i) and [z(i)]2 to renew s (i) in order to generate the codebooks independently. Algorithm for Encoding X k





Mk ≡ max{mk |Xk |, m k |Zk |}, qk ≡ [k + 1]Mk , I ≡ {−Mk , . . . , qk − Mk − 1}. mk We define TX,γ ⊂ Xmk by k



mk TX,γ ≡ xmk : D( ν xmk µX ) ≤ γk . k mk mk Let ξX,k : TX,γ → 0, . . . , |TX,γ | − 1 be an arbitrary onek k to-one mapping. Performance analysis does not depend on the choice of ξX,k . For example, the arithmetic coding al−1 . Let a gorithm [39] can be used to construct ξX,k and ξX,k

 mk k[RX +δk ] be defined by function πk : X → 0, . . . , 2

  mk m   2k[RX +δk ] , if xmk ∈ TX,γ , ξ X,k (x k )% k  πk (x ) ≡  mk   2k[RX +δk ] , . if xmk  TX,γ k mk

(4)

The function πk is constructed as follows. Roughly speakmk is devided into 2k[RX +δk ] bins each of which have ing, TX,γ   k T mk  2−k[RX +δk ] elements in T mk . Then πk maps an elX,γk

X,γk

(6)

mk ement in TX,γ into the index of the bin that has the elk  m c k into the another ement and maps an element in TX,γ k   m k m index. Functions ξZ,k : TZ,γk → 0, . . . , |TZ,γk k | − 1 and



π k : Zmk → 0, . . . , 2k[RZ +δk ] are also defined similarly to ξX,k and πk , respectively. We divide xqk [k+2] = (x−Mk [k+2]+1 , . . . , xkMk [k+2] ) and qk [k+2] = (z−Mk [k+2]+1 , . . . , zkMk [k+2] ) into non-overlapping qk z sub-blocks with length k + 2. The components x(i) ∈ X2 and x(i) ∈ Xk , i ∈ I of a sub-block are defined by

Step E1: Let i ← −Mk . Step E2: Transmit BX ([x(i)]1 ) ∗ BX ([x(i)]2 ) ∗ BX (x(i)) to the decoder. Step E3: Let i ← i + 1. Step E4: If i = 0, then go to Step E5. Otherwise, go to Step E2. Step E5: Transmit BX ([x(i)]1 ) ∗ BX ([x(i)]2 ) to the decoder. Step E6: r(s(i),

Transmit   x(i))  in log2 2k[RX +δk ] + 1 bits to the decoder. Step E7: Let i ← i + 1. Step E8: If i = qk − Mk , then the encoding is completed. Otherwise, go to Step E5. Similarly, we define an algirithm for encoding Z. Decoding Algorithm k ⊂ Xk × Zk be defined by Let TXZ,γ k k TXZ,γ ≡ (xk , zk ) : D( ν(xk ,zk ) µXZ ) ≤ γk . k k We assume that the

decoder  has bins D(s, r) ⊂ X , s ∈ mk |Xk | k[RX +δk ] , r ∈ 0, . . . , 2 X defined by D(s, r) ≡ xk : πk (s(s, xk )) = r . (7)

k m |Z | Similarly, the

decoder  has bin D (s , r ) ⊂ Z , s ∈ Z k ,

k[RZ +δk ] . r ∈ 0, . . . , 2

k

Step D1: Let i ← −Mk . Step D2: Receive BX ([x(i)]1 ) ∗ BX ([x(i)]2 ) ∗ BX (x(i)) and reproduce x(i) ∗ x(i). Similarly, receive BZ ([z(i)]1 ) ∗ BZ ([z(i)]2 ) ∗ BZ (z(i)) and reproduce z(i) ∗ z(i). They are losslessly decoded. Step D3: Let i ← i + 1.

MURAMATSU: SOURCE CODING ALGORITHMS USING THE RANDOMNESS OF A PAST SEQUENCE

1067

Step D4: If i = 0, then go to Step D5. Otherwise, go to Step D2. Step D5: Receive the codeword encoded in Step E5 from both encoders and reproduce x(i) and z(i), which are losslessly decoded. Step D6: Receive the codeword encoded in Step E6 from both encoders and let r and r be the reproduced ink has a dexes. If [D(s(i), r) × D (s (i), r )] ∩ TXZ,2γ k unique element then let this element be the reproduction of (x(i), z(i)). Otherwise, let an arbitrary ( xˆk , zˆk ) ∈ D(s(i), r)×D (s (i), r ) be the reproduction of (x(i), z(i)). It should be noted here that (x(i), z(i)) may be incompletely decoded. Step D7: Let i ← i + 1. Step D8: If i = qk − Mk , then the decoding is completed. Otherwise, go to Step D6. For a pair of source outputs (xqk [k+2] , zqk [k+2] ), let ) and LZ (zqk [k+2] ) be the total encoding length of LX (x encoders for X and Z, respectively. We denote the number of decoding error letters by d(xqk [k+2] , zqk [k+2] ). Then we have the following theorem. qk [k+2]

Theorem 1. Let X and Z be the correlated stationary memoryless sources that satisfy (1)–(3). Then 1 lim LX (xqk [k+2] ) = RX , k→∞ qk [k + 2] 1 LZ (zqk [k+2] ) = RZ , lim k→∞ qk [k + 2] 1 lim d(xqk [k+2] , zqk [k+2] ) = 0, k→∞ qk [k + 2]

Mk ≡ m k |Jk |, qk ≡ [k + 1]Mk , I ≡ {−Mk , . . . , qk − Mk − 1}. m

k We define TY,γkk and TZ,γ by k

 

m νym k µY ) ≤ γk , TY,γkk ≡ ymk ∈ Ymk : D( k TZ,γ ≡ zk ∈ Zk : D( νzk pZ ) ≤ γk . k m

m

k Let ξY,k : TY,γkk → {0, . . . , |TY,γkk | − 1} and ξZ,k : TZ,γ → k k {0, . . . , |TZ,γk | − 1} be arbitrary one-to-one mapping. We de fine σk : Ymk → Zk by     m k  −1 m k  m   ξZ,k ξY,k (y k )% TZ,γk  , if y k ∈ TY,γk , m k σk (y ) ≡  (11) 

m   zk , if ymk  T k , Y,γk



c

k where  zk ∈ TZ,γ . The function σk is constructed as folk m

(8)

lows. Roughly speaking, the set TY,γkk is partitioned by bins    −1 m k  each of which have TY,γkk  TZ,γ  elements. Then σk maps k

(9)

k by looking for an element in TY,γkk into a element in TZ,γ k

m

m

(10)

almost surely. We prove the theorem in Sect. 7.1. Remark 1. The constructed code can be extended to a code with arbitrary small decoding error probability. This topic is described in Appendix A. Remark 2. In Appendix B, we determine the rate region of the multi-terminal source coding with arbitrary small decoding error. 4.

m ζ(|Y|, γk ) H(Y) + + γk , δ k ≡ ζ(|Y||Z|2 , γk ) + k k k



Jk ≡ 1, . . . , 2k[RY +δk ] ,

Lossy Source Coding Algorithm

In this section, we discuss the lossy source coding problem (Fig. 2). We assume that a distortion function ρ : Y × Z → [0, ∞)satisfies ρmax ≡ max(y,z)∈Y×Z ρ(y, z) < ∞. Let ρn (yn , zn ) ≡ ni=1 ρ(yi , zi ). We fix a joint distribution pYZ (y, z), y ∈ Y, z ∈ Z of random variables Y and Z that satisfies I(Y; Z) > 0. We assume that both the encoder and decoder know pYZ (y, z). We encode the output of Y at rate RY ≡ I(Y; Z). Let 3 log2 k 1 + , γk ≡ λ(|Y|, k) + k k   kH(Z) m k ≡ , H(Y)

−1 to obtain a bin that has the element in TY,γkk and using ξZ,k k the element in TZ,γk from the index of the bin. The ele c m ment in TY,γkk is mapped into some pre-defined element  c k  zk ∈ TZ,γ . k We divide yqk [k+1] = (y−Mk [k+1]+1 , . . . , ykMk [k+1] ) into non-overlapping qk sub-blocks of length k + 1. The components y(i) ∈ Y and y(i) ∈ Yk , i ∈ I of sub-block are defined by

y(i) ≡ yi[k+1]+1 , [i+1][k+1] y(i) ≡ yi[k+1]+2 .

Finally, we assume that both the encoder and the decoder have a buffer u(i) ≡ i−1 j=i−m k |Jk | y( j) for each i ∈ I. We

m k |Jk | define u : Y × Jk → Ymk and z : Ymk |Jk | × Jk → Zk by



u(u, j) ≡



j: 1≤j≤m k |Jk | j%|Jk |= j

uj ,

z(u, j) ≡ σk (u(u, j))

(12)

(13)

for u ∈ Ymk |Y | , j ∈ Jk . It should be noted here that j∈Jk u(u, j) is a permutation of letters in u.



k

IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.4 APRIL 2005

1068

Encoding Algorithm Step E1: Step E2: Step E3: Step E4: E2. Step E5: Step E6:

Let i ← −Mk . Transmit BY (y(i)) ∗ BY (y(i)) to the decoder. Let i ← i + 1. If i = 0, then go to Step E5. Otherwise, go to Step

Theorem 3. When pYZ is fixed such that (I(Y; Z), EYZ [ρ(Y, Z)]) is on the rate-distortion curve, the proposed algorithm is asymptotically optimal.

Transmit BY (y(i)) to the decoder. Let ψk (y(i)) ∈ Jk defined by ψk (y(i)) ≡ arg min ρk (y(i), z(u(i), j)). j∈Jk



Transmit ψk (y(i)) in log2 2k[RY +δk ] bits to the decoder. Step E7: Let i ← i + 1. Step E8: If i = qk − Mk , then the encoding is completed. Otherwise, go to Step E5. Decoding Algorithm Step D1: Let i ← −Mk . Step D2: Receive BY (y(i)) ∗ BY (y(i)) and reproduce y(i) ∗ y(i), which is losslessly decoded. Step D3: Let i ← i + 1. Step D4: If i = 0, then go to Step D5. Otherwise, go to Step D2. Step D5: Receive the codeword encoded in Step E5 and reproduce y(i), which is losslessly decoded. Step D6: Receive the codeword encoded in Step E6, and let j ∈ Jk be the reproduced index. Let z(u(i), j) be the reproduction of y(i). Step D7: Let i ← i + 1. Step D8: If i = qk − Mk , then the decoding is completed. Otherwise, go to Step D5. Let LY (yqk [k+1] ) be the total number of bits needed to encode yqk [k+1] . We denote the distortion between yqk [k+1] and the reproduction by d(yqk [k+1] ). Then, we have the following theorem. Theorem 2. For any probability distribution pYZ which satisfies I(Y; Z) > 0, 1 LY (yqk [k+1] ) = I(Y; Z), qk [k + 1] 1 d(yq[k+1] ) ≤ EYZ [ρ(Y, Z)], lim sup k→∞ qk [k + 1] lim

k→∞

(14) (15)

almost surely. We prove the theorem in Sect. 7.2. According to the rate-distortion theory, there is a tradeoff between encoding rate and distortion. The optimal pair (R, D) of rate and distortion is on the curve defined by the following functions   inf E pYZ ρ(Y, Z) , DY (R) ≡ pYZ :I(Y;Z)≤R

RY (D) ≡

inf

pYZ :E pYZ [ρ(Y,Z)]≤D

I(Y; Z).

The above two functions are the inverse of the other function. We can use the Arimoto-Blahut algorithm [2][5] to calculate pYZ such that (I(Y; Z), EYZ [ρ(Y, Z)]) is on the ratedistortion curve. We have the following theorem from the above theorem.

Remark 3. When {z(u, j)} j∈Jk is generated by a random number generator with distribution pZ the modified algorithm is the one proposed by Koga and Arimoto [25], Yang and Kieffer [62], and Łuczak and Szpankowski [27]. In our construction, {z(u, j)} j∈Jk is generated using the past sequence and we prove that the overhead costs can be negligible when we use a random number generator with distribution pY instead of pZ . That is the difference between our and their results. 5.

Source Coding Algorithm with Partial Side Information at the Decoder

In this section, we construct the source coding algorithm, which encodes the output of a source Y as partial side information at late RY , encodes the output of a source X at RX , and reproduces X (Fig. 3). We fix a joint probability distribution pXYZ (x, y, z), x ∈ X, y ∈ Y, z ∈ Z of random variables X, Y, and Z that satisfies the following conditions. • |Z| = |Y| + 2,  • pXYZ (x, y, z) = µXY (x, y), z∈Z

for all (x, y) ∈ X × Y, µXY (x, y)pYZ (y, z) • pXYZ (x, y, z) = , µY (y) for all (x, y, z) ∈ X × Y × Z s.t. µY (y)  0,

(16) (17)

(18)

• 0 < H(X|Z) < H(X) and I(Y; Z) > 0, (19)  where pYZ (y, z) ≡ ≡ x∈X pXYZ (x, y, z) and µY (y)  p (x, y, z). (x,z)∈X×Z XYZ We define RX ≡ H(X|Z), RY ≡ I(Y; Z),

! H(X|Z) H(Z) , , H(X) H(Y) 4 λ(|X||Y||Z|, Ck) 3 log2 k + + , ≡ C Ck Ck   kH(Z) ≡ H(Y) 2m k ζ(|Y|, γk ) 4 ≡ 6ζ(|X||Y||Z|, 3γk ) + + + γk , k k

2mk ζ(|Y|, γk ) H(Y) 2 + + γk , ≡ 2ζ(|Y||Z| , γk ) + k k

C ≡ min γk m k δk δ k

MURAMATSU: SOURCE CODING ALGORITHMS USING THE RANDOMNESS OF A PAST SEQUENCE

1069



 k[RX + δk ] , H(X)

 Jk ≡ 1, . . . , 2k[RY +δk ] , mk ≡

Mk ≡ max{mk |Xk |, m k |Jk |}, qk ≡ [k + 1]Mk , I ≡ {−Mk , . . . , qk − Mk − 1}. We divide xqk [k+2] = (x−Mk [k+2]+1 , . . . , xkMk [k+2] ) and = (y−Mk [k+2]+1 , . . . , ykMk [k+2] ) into non-overlapping qk y sub-blocks of length k + 2. For each i ∈ I, the components x(i) ∈ X2 , x(i) ∈ Xk , y(i) ∈ Y2 , and y(i) ∈ Yk of sub-block are defined by qk [k+2]

i[k+2]+2 x(i) ≡ xi[k+2]+1 , [i+1][k+2] x(i) ≡ xi[k+2]+3 ,

Step E3: Let i ← i + 1. Step E4: If i = 0, then go to Step E5. Otherwise, go to Step E2. Step E5: Transmit BY ([y(i)]2 ) to the decoder. Step E6: If there is j ∈ Jk such that (y(i), z(u(i), j)) ∈ k , let ψk (y(i)) be the least of such j ∈ Jk . OtherTYZ,2γ k j ∈  Jk is arbitrary. Then wise, let ψk (y(i)) ≡ j , where

k[RY +δk ] bits to the decoder. transmit ψk (y(i)) in log2 2 Step E7: Let i ← i + 1. Step E8: If i = qk − Mk , then the encoding is completed. Otherwise, go to Step E5. Decoding Algorithm k Let TXZ,3γ ⊂ Xk × Zk be defined by k k k k k ,zk ) p XZ ) ≤ 3γk . TXZ,3γ ≡ (x , z ) : D( ν (x k k We assume that the

decoder  has bins D(s, r) ⊂ X , s ∈ mk |Xk | k[RX +δk ] , r ∈ 0, . . . , 2 X defined by (7).

i[k+2]+2 , y(i) ≡ yi[k+2]+1 [i+1][k+2] y(i) ≡ yi[k+2]+3 .

Finally, we assume that both the encoder of X and i−1 the decoder have a buffer s(i) ≡ j=i−mk |Xk | [x( j)]1 , i ∈ k , . . . , q − M − 1}. We define r : Xmk |X | × Xk → {−M k  k k 0, . . . , 2k[RX +δk ] by (6). We also assume that both the encoder of Y and the decoder have a buffer u(i) ≡ i−1 m k |Jk | ×Jk → Zk by j=i−m k |Jk | [y( j)]2 , i ∈ I. We define z : Y (13). It should be noted here that we use [x(i)]1 to renew s(i) and [y(i)]2 to renew u(i) in order to generate the codebooks independently. Algorithm for Encoding X mk be as defined in Sect. 3. The encoder of X is Let TX,γ k almost the same as the encoder constructed in Sect. 3.





Step E1: Let i ← −Mk . Step E2: Transmit BX ([x(i)]1 ) ∗ BX ([x(i)]2 ) ∗ BX (x(i)) to the decoder. Step E3: Let i ← i + 1. Step E4: If i = 0, then go to Step E5. Otherwise, go to Step E2. Step E5: Transmit BX ([x(i)]1 ) ∗ BX ([x(i)]2 ) to the decoder. Step E6: Transmit r(s(i),   x(i)) in log2 2k[RX +δk ] + 1 bits to the decoder. Step E7: Let i ← i + 1. Step E8: If i = qk − Mk , then the encoding is completed. Otherwise, go to Step E5. Algorithm for Encoding Y m k be as defined in Sect. 4. Let TY,γkk and TZ,γ k k k k TYZ,2γk ⊂ Y × Z be defined by

Let

k TYZ,2γ ≡ (yk , zk ) : D( ν(yk ,zk ) pYZ ) ≤ 2γk . k

Step D1: Let i ← −Mk . Step D2: Receive BX ([x(i)]1 ) ∗ BX ([x(i)]2 ) ∗ BX (x(i)) from the encoder of X and reproduce x(i)∗ x(i). Similarly, receive BY ([y(i)]2 ) from the encoder of Y and reproduce [y(i)]2 . They are losslessly decoded. Step D3: Let i ← i + 1. Step D4: If i = 0, then go to Step D5. Otherwise, go to Step D2. Step D5: Receive the codeword encoded in Step E5 of both encoders and reproduce x(i) and [y(i)]2 , which are losslessly decoded. Step D6: Receive the codeword encoded in Step E6 of both encoders. Let r be the reproduced index that is transmitted by the encoder of X. Let j ∈ Jk be the reproduced index that is transmitted by the encoder of Y. If k has a unique element [D(s(i), r) × {z(u(i), j)}] ∩ TXZ,3γ k k k (x , z(u(i), j)), then let x be the reproduction of x(i). Otherwise, let an arbitrary xˆk ∈ D(s(i), r) be the reproduction of x(i). It should be noted here that x(i) may be incompletely decoded. Step D7: Let i ← i + 1. Step D8: If i = qk − Mk , then the decoding is completed. Otherwise, go to Step D5. Let LX (xqk [k+2] ) and LY (yqk [k+2] ) be the total encoding length of the encoders of X and Y, respectively. We denote the number of decoding error letters by d(xqk [k+2] , yqk [k+2] ). Then we have the following theorem. Theorem 4. Let X and Y be the correlated stationary memoryless sources, and pXYZ be the distribution that satisfies (16)–(19). Then 1 LX (xqk [k+2] ) = H(X|Z), qk [k + 2] 1 LY (yqk [k+2] ) = I(Y; Z), lim k→∞ qk [k + 2] 1 d(xqk [k+2] , yqk [k+2] ) = 0, lim k→∞ qk [k + 2] lim

k→∞

The encoder of Y is almost the same as the encoder constructed in Sect. 4. Step E1: Let i ← −Mk . Step E2: Transmit BY ([y(i)]2 ) to the decoder.

(20) (21) (22)

IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.4 APRIL 2005

1070

almost surely. We prove the theorem in Sect. 7.3. In general, there is a trade-off when choosing the pair of the rates, RX and RY . The optimal pair (RX , RY ) is on a curve defined by the following functions RX (RY ) ≡ RY (RX ) ≡

inf

H(X|Z),

inf

I(Y; Z),

pXYZ :I(Y;Z)≤RY

pXYZ :H(X|Z)≤RX

where pXYZ in the inf takes over the distribution satisfying (16)–(18). They are inverse of the other function. When pXYZ is fixed such that (H(X|Z), I(Y; Z)) is on the curve RX (RY ) (or RY (RX )) for RX > 0 and RY > 0, Theorem 4 implies that the encoding rate pair of the proposed algorithm is on the same curve. Combining this fact and Theorem 9 in Appendix B, we have the following theorem. Theorem 5. When pXYZ is fixed such that (H(X|Z), I(Y; Z)) is on the curve RX (RY ) and RY (RX ) for RX > 0 and RY > 0, the proposed algorithm is asymptotically optimal. Remark 4. Tishby, Pereira, and Bialek [48] proposed an approximate algorithm for finding pXYZ such that the encoding rate pair is on the curve RX (RY ) (or RY (RX )) from µXY . Remark 5. The constructed code can be extended to a code where the decoding error probability tends to zero as the length of string tends to infinity. This topic is covered in Appendix A. Remark 6. In Appendix B, we determine the rate region of the source coding with arbitrary small decoding error. 6.

Interpretation of Universal Lossless Sorce Coding Algorithms Based on String Matching

In this section, we provide an interpretation of universal lossless source coding algorithms based on the string matching as a simulated random coding algorithm. We apply the lossy source coding algorithm proposed in Sect. 4 to universal lossless source coding in the following. Let k Jk ≡ {1, . . . , 2 log2 |Y | }, Mk ≡ |Jk |, qk ≡ [k + 1]Mk , I ≡ {−Mk , . . . , qk − Mk − 1}.

We divide yqk k into non-overlapping qk sub-blocks of length k. Sub-blocks y(i) ∈ Yk , i ∈ I are defined by y(i) ≡ y[i+1]k ik+1 . We assume that both the encoder and decoder have a i−1 buffer u (i) ≡ j=i−|Jk | y( j) for each i ∈ I, where y( j) is arbitraly when j < −Mk . Encoding Algorithm



Step E1: Let i ← −Mk . Step E2: Transmit BY (y(i)) to the decoder. Step E3: Let i ← i + 1. Step E4: If i = 0, then go to Step E5. Otherwise, go to Step E2. k Step E5: If there is 1 ≤ t ≤ 2 log2 |Y | such that

k|J |−t−1 y(i) = [u (i)]k[|Jk |−1]−t−1 , encode the least of such t in k " # log2 t + 2 log2 log2 [t + 1] bits by using the Elias binary encoding of integers [11] and transmit the encoded data with the prefix 0. Otherwise, transmit BY (y(i)) with the prefix 1. Step E6: Let i ← i + 1. Step E7: If i = qk − Mk , then the encoding is completed. Otherwise, go to Step E5. In the above algorithm, the database u (i) corresponds to a past sequence. An identity map is used while σk is used in Sect. 4. An overlap is allowed to find the exactly matching sequence in this algorithm, while no overlap is allowed in Sect. 4. The renewal of the database in Steps E3 and E6 of the above algorithm corresponds to the addition of the last encoded block and the deletion of the oldest block in the database, which is a component of coding algorithms based on string matching. The Elias binary encoding of integers realizes a fixed-to-variable length code and a 1 bit prefix is used to synchronize the database. Decoding Algorithm Step D1: Let i ← −Mk . Step D2: Receive BY (y(i)) and reproduce y(i), which is losslessly decoded. Step D3: Let i ← i + 1. Step D4: If i = 0, then go to Step D5. Otherwise, go to Step D2. Step D5: Receive the encoded data in Step E5. If the first bit is 0, reproduce the index t from the following bits k|J |−t−1 and let [u (i)]k[|Jk |−1]−t−1 be the reproduction of y(i) If k the first bit is 1, reproduce y(i) from the following BY (y(i)). In any case, y(i) is losslessly decoded. Step D6: Let i ← i + 1. Step D7: If i = qk − Mk , then the decoding is completed. Otherwise, go to Step D5. It should be noted here that the distribution pYZ is not k k , TZ,γ , and given in advance because we let Z ≡ Y, and TY,γ k k H(Y) are not used in the above algorithms while they are used in Sect. 4. We have the following theorem. Theorem 6. Let LY (yqk k ) be the total number of bits needed to encode yqk k . Then lim

k→∞

1 LY (yqk k ) = H(Y), qk k

almost surely,

(23)

for any stationary memoryless source Y. This theorem can be proved via the recurrence theorem presented by Willems [53], Wyner and Ziv [59], Ornstein

MURAMATSU: SOURCE CODING ALGORITHMS USING THE RANDOMNESS OF A PAST SEQUENCE

1071

and Weiss [38]. We offer another proof of the theorem based on the proposed method in Sect. 7.4. Our proof provides us with an interpretation of these algorithms as a simulated random coding algorithm. 7.

Lemma 3. For any γ > 0, γ > 0, and yk ∈ Yk , k c ] ) ≤ 2−k[γ− µX k ([TX,γ

|X| log2 [k+1] ] k

= 2−k[γ−λ(|X|,k)] ,

k k c k −k[γ − µX k |Y k ([TX|Y,γ

(y )] |y ) ≤ 2

Proof of Theorems

|X||Y| log2 [k+1] ] k

Before proving the theorems, we prepare the following lemk ⊂ Xk and mas for a set of typical sequences. Let TX,γ k k k k k TX|Y,γ

(y ) ⊂ X , y ∈ Y be defined by k TX,γ ≡ xk ∈ Xk : D( ν xk µ X ) ≤ γ , k TX|Y,γ (yk ) ≡ xk ∈ Xk : D( ν xk |yk µX|Y | νyk ) ≤ γ , where D(pX|Y p X|Y |qY ) ≡



q(y)

y∈Y



p(x|y) log2

x∈X

p(x|y) . p (x|y)

For 0 < α < ∞, 0 < β < ∞, γ > 0, and γ > 0, let ζ (α, β, γ, γ ) be defined by   2γ  + 2γ log2 α. ζ (α, β, γ, γ ) ≡ γ − 2γ log2 αβ k Lemma 1. If xk ∈ TX,γ , then

    ν xk (a) − µX (a) ≤ 2γ,  ν xk (a) = 0,

for all a ∈ X,

if µX (a) = 0.

Proof. The proof is due to [49, Theorem 2.6]. The lemma can be proved directly from the fact that $  2D(ν µ) , |ν(a) − µ(a)| ≤ log2 e a∈X where e is the base of the natural logarithm (see [7, Lemma 12.6.1]).  Lemma 2. Let 0 < γ ≤ 1/8. Then    2γ − 1 log µ k (xk ) − H(X) ≤ γ − 2γ log 2 X 2  k  |X| = ζ(|X|, γ), k for xk ∈ TX,γ .

Proof. The proof is due to [49, Theorem 2.7]. From [9, Lemma 2.6], we have   − 1 log µ k (xk ) − H(X) ≤ D( ν xk )−H(X)|, ν xk µX )+|H( X 2  k  where H( ν xk ) is the entropy of the distribution  ν xk . Then we have the lemma from [9, Lemma 2.7]. 

= 2−k[γ −λ(|X||Y|,k)] . Proof. The lemma can be proved from [9, Lemma 2.2] and [9, Lemma 2.6] (see [49, Theorem 2.8] for detailed proof).  k Lemma 4. For any γ > 0, γ > 0, and yk ∈ TY,γ ,    1 log |T k | − H(X) 2 X,γ  k   2γ |X| log2 [k + 1] + , ≤ − 2γ log2 |X| k    1 log |T k (yk )| − H(X|Y) 2 X|Y,γ k    2γ  |X||Y| log2 [k + 1] + 2γ log2 |X|+ . ≤ − 2γ log2 |X||Y| k

Especially, for any γ > λ(|X|, k) and γ > λ(|X||Y|, k),    1 log |T k | − H(X) ≤ ζ(|X|, γ), 2 X,γ  k    1 log |T k (yk )| − H(X|Y) ≤ ζ (|X|, |Y|, γ, γ ). 2 X|Y,γ  k Proof. The lemma can be proved as the proof of [9, Lemma 2.13] (see [49, Theorem 2.9] for detailed proof).  k k k k k Lemma 5. If xk ∈ TX,γ and yk ∈ TY|X,γ

(x ), then (x , y ) ∈ k k k k k k TXY,γ+γ . If (x , y ) ∈ TXY,γ , then x ∈ TX,γ .

Proof. The proof is due to [49, Theorem 2.5]. The first statement can be proved from the fact that D( ν xk ,yk µXY ) = D( νyk µY ) + D( ν xk |yk µX|Y | νyk ). (24) The second statement can be proved from the fact that D( νyk µY ) ≤ D( ν xk ,yk µXY ), which is derived from (24) and non-negativity of the divergence.  Lemma 6. Let {G i (k)}∞ i=1 be a stationary process with constant k ∈ N ≡ {1, 2, . . .} such that 0 ≤ Gi (k) < ∞ for any i and k. Assume that there is K < ∞, a sequence of positive numbers {ηk }∞ k=1 , and functions g : N → N and f : N → (0, ∞) such that EµG(k) [G 1 (k)] ≤ 2−ηk , for all k > K, ∞  f (g(k))2−ηk < ∞, k=1

(25) (26)

IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.4 APRIL 2005

1072

lim

k→∞

1 = 0, f (g(k))

(27)

where EµG(k) is an expectation with respect to the probability measure µG(k) corresponding to {G i (k)}∞ i=1 . Then 1  Gi (k) = 0, k→∞ g(k) i=1 g(k)

lim

almost surely.

(28)

Proof. From the Markov inequality [45, Lemma I.1.13], stationarity of Gi (k), (25), and (26), we have   g(k) ∞       1 1    µG(k)  Gi (k) ≥     g(k) f (g(k))  i=1 k=1   g(k) ∞   1    f (g(k))E µG (k)  Gi (k) ≤ g(k) i=1 k=1 =

∞  k=1

= =

∞  k=1 ∞ 

1  Eµ [Gi (k)] g(k) i=1 G(k) g(k)

f (g(k))

1  f (g(k)) Eµ [G1 (k)] g(k) i=1 G(k) g(k)

f (g(k))E µG(k) [G1 (k)]

k=1



K 

f (g(k))E µG(k) [G1 (k)] +

k=1

∞ 

f (g(k))2−ηk

k=1

< ∞. Then, from the Borel-Cantelli Theorem [45, Lemma I.1.14], we have for all sufficiently large k 1  1 , Gi (k) < g(k) i=1 f (g(k)) g(k)

almost surely. From Gi (k) ≥ 0 and (27), we obtain (28).



7.1 Proof of Theorem 1 First, we prove (8). We can denote LX (xqk [k+2] ) by 

# " LX (xq[k+2] ) = 2Mk log2 |X| + Mk log2 |Xk | " # + 2[qk − Mk ] log2 |X|

   + [qk − Mk ] log2 2k[RX +δk ] + 1 . Since limk→∞ γk = 0, we have limk→∞ ζ(|X||Z|, γk ) = 0 and limk→∞ δk = 0, which implies (8). Similarly, we obtain (9). Next, we prove (10). Let ek be the number of i ∈ I ≡ {0, . . . , qk − Mk − 1} such that (x(i), z(i)) is decoded incorrectly. Then we have d(xqk [k+2] , zqk [k+2] ) ≤ ek k. Before the evaluation of ek , we confirm that (x(i), z(i)) is decoded correctly when (x(i), z(i)) satisfies the following conditions at Step D6 on i ∈ I . Let r(i) ≡ r(s(i), x(i)) and r (i) ≡ r (s (i), z(i)), which are transmitted numbers at i ∈ I .

k . Condition 1 (x(i), z(i)) ∈ TXZ,γ k Condition 2 There is no vk ∈ D(s(i), r(i)) such that vk  k x(i) and (vk , z(i)) ∈ TXZ,γ . k k Condition 3 There is no w ∈ D (s (i), r (i)) such that wk  k z(i) and (x(i), wk ) ∈ TXZ,γ . k Condition 4 There is no (vk , wk ) such that (vk , wk ) ∈ k , vk  x(i), D(s(i), r(i))×D (s (i), r (i)), (vk , wk ) ∈ TXZ,γ k and wk  z(i).

Since x(i) ∈ D(s(i), r(i)) and z(i) ∈ D (s (i), r (i)), we k have (x(i), z(i)) ∈ [D(s(i), r(i)) × D (s (i), r (i))] ∩ TXZ,γ k when (x(i), z(i)) satisfies Condition 1. Hence, (x(i), z(i)) is k a unique element in [D(s(i), r(i)) × D (s (i), r (i))] ∩ TXZ,γ k when (x(i), z(i)) satisfies Condition 1–4. This implies that (x(i), z(i)) are decoded correctly at Step D6. Let E1 , E2 , E3 , and E4 be the sets of i ∈ I that satisfy Condition 1–4, respectively. Then Ec1 , Ec2 , Ec3 and Ec4 can be denoted by Ec1 ≡ {i :        c ≡ E2  i:             c i: E3 ≡                  c E4 ≡  i:         

k (x(i), z(i))  TXZ,γ }, k

       k k (v , z(i)) ∈ TXZ,γk , ,      k v ∈ D(s(i), r(i))   ∃wk  z(i) s.t.      k k (x(i), w ) ∈ TXZ,γk ,  ,      k



w ∈ D (s (i), r (i)) ∃vk  x(i) s.t.

 ∃vk  x(i) ∃wk  z(i) s.t.       k k k  (v , w ) ∈ TXZ,γk ,   .   k   v ∈ D(s(i), r(i)),      wk ∈ D (s (i), r (i)) 

The upper bound of ek is given by ek ≤ |Ec1 | + |Ec2 | + |Ec3 | + |Ec4 |. We define  k   , if (xk , zk )  TXZ,γ 1, k ≡  0, otherwise,    1, if ∃vk ∈ Xk s.t.       vk  xk ,     (2) k k k χk (s, x , z ) ≡  , (vk , zk ) ∈ TXZ,γ  k    k k  v ∈ D(s, r(s, x )),      0, otherwise,    1, if ∃wk ∈ Zk s.t.       wk  z k ,    

k k k , (xk , wk ) ∈ TXZ,γ χ(3)  k (s , x , z ) ≡  k  

 k

k  w ∈ D (s , r (s , z )),      0, otherwise,

k k χ(1) k (x , z )

MURAMATSU: SOURCE CODING ALGORITHMS USING THE RANDOMNESS OF A PAST SEQUENCE

1073

   1,              (4)

k k χk (s, s , x , z ) ≡               0,

if ∃vk ∈ Xk , ∃wk ∈ Zk s.t. vk  xk , wk  zk , k , (vk , wk ) ∈ TXZ,γ k k k v ∈ D(s, r(s, x )), wk ∈ D (s , r (s , zk )), otherwise.

Let S (i) be the random variable corresponding to s(i)

and S ≡ S (0). Similarly, we define S (i), X(i), Z(i), S , X, and Z. To apply Lemma 6, we prove the following inequalities for all sufficiently large k:   −k[Cγk −λ(|X||Z|,Ck)]+2 , (29) E XZ χ(1) k (X, Z) ≤ 2   (2) −k[Cγk −λ(|X||Z|,Ck)]+2 ES XZ χk (S , X, Z) ≤ 2 , (30)   (3) (31) ES XZ χk (S , X, Z) ≤ 2−k[Cγk −λ(|X||Z|,Ck)]+2 ,   (4)

−k[Cγ −λ(|X||Z|,Ck)]+2 k , (32) ES S XZ χk (S , S , X, Z) ≤ 2

bounded as  µZ k (zk ) ≤ 2−k[γk −λ(|Z|,k)] . k zk TZ,γ

Lemma 7. For any xk ∈ Xk ,  µX mk |Xk | (s) ≤ 2−mk [γk −λ(|X|,mk )] . k

s∈Xmk |X | : m s(s,xk )TX,γk

k

s∈Xmk |X | : m s(s,xk )TX,γk

k  s∈Xmk [|X |−1]

k|



µX mk |Xk | (s)

+

≤2



k

µX k Z k (xk , zk )

∗j:

 s≡

s j.

1≤ j≤mk |X | j%|Xk |BX (x) k

k xk ∈Xk zk ∈TZ,γ



µX mk |Xk | (s)

µX k Z k (xk , zk ) k



 

k

s∈Xmk |X | : m s(s,xk )TX,γk

k

−mk [γk −λ(|X|,mk )]

µX k Z k (x , z )2 k

k

xk ∈Xk zk ∈Zk

k

s∈Xmk |X | : m s(s,xk )TX,γk

 

,

where  s is a sequence constructed by the deletion of s(s, xk ) from s, that is,

xk ∈Xk zk ∈Zk

xk ∈Xk zk ∈Zk

+

 s∈Xmk [|X |−1] −mk [γk −λ(|X|,mk )]

From the above lemma, the second term is bounded as    µX k Z k (xk , zk ) µX mk |Xk | (s)

µZ k (zk )  

µX mk [|Xk |−1] (  s )2−mk [γk −λ(|X|,mk )]



k s∈Xmk |X | : ∃vk ∈Xk s.t. k k

k

k zk TZ,γ

s(s,x )∈Xmk : m s(s,xk )TX,γk k

k

vk ∈D(s,r(s,xk ))



µX mk (s(s, xk ))

k

mk c µX mk [|Xk |−1] (  s )µX mk ([TX,γ ]) k





v x , k , (vk ,zk )∈TXZ,γ





=



µX mk [|Xk |−1] (  s)

k |−1]

s∈Xmk |X

µX k Z k (xk , zk )



 s∈Xmk [|X

for all sufficiently large k, which implies (29). Next, we prove (30). Since random variables S and XZ are mutually independent, we have   ES XZ χ(2) k (S , X, Z)    µX k Z k (xk , zk ) µX mk |Xk | (s)χ2 (s, xk , zk ) =

xk ∈Xk zk ∈Zk

k

=

≤ 2−k[Cγk −λ(|X||Z|,Ck)]+2 ,

=

k

Proof. Since X(i) and X( j) are mutually independent for i  j, we have  µX mk |Xk | (s)

≤ 2−k[γk −λ(|X||Z|,k)]

 

k

Next, we prepare the following lemma to evaluate the second term.

First, we prove (29). Since λ(α, k) > 0 is a decreasing function of k, 0 < C ≤ 1, and Lemma 3, we have   k c k k E XZ χ(1) k (X, Z) = µX Z ([TXZ,γk ] )

xk ∈Xk zk ∈Zk

(34)

= 2−mk [γk −λ(|X|,mk )] .

(35)

k

µX mk |Xk | (s).

k s∈Xmk |X | : m s(s,xk )∈TX,γk , k k k ∃v ∈X s.t. k k

v x , k , (vk ,zk )∈TXZ,γ k

vk ∈D(s,r(s,xk ))

(33) We evaluate the three terms of the right hand side of (33) separately. First, from Lemma 3, the first term is

Finally, we evaluate the third term. The third term is evaluated by

IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.4 APRIL 2005

1074

  k xk ∈Xk zk ∈TZ,γ

µX k Z k (xk , zk )



2   m   X +δk ] −1

2k[R 

  TX,γk k   [ν(r)]2 ≤ 2k[RX +δk ] " k[R +δ #] 2−mk [H(X)−ζ(|X|,γk )]  .  2 X k  r=0  

µX mk |Xk | (s) k

s∈Xmk |X | : m s(s,xk )∈TX,γk ,

k

k

∃vk ∈Xk s.t. vk xk , k , (vk ,zk )∈TXZ,γ



v ∈D(s,r(s,x ))

  k xk ∈Xk zk ∈TZ,γ

k k

k

µX k Z k (xk , zk )

vk ∈Xk : vk xk , k k k (v ,z )∈TXZ,γ

k





k

µX mk |Xk | (s).

k s∈Xmk |X | : m s(s,xk )∈TX,γk , k k k

v ∈D(s,r(s,x ))

(36) To proceed with the evaluation, we prove the following lemma. Lemma 8. If vk  xk and RX < H(X), then  µX mk |Xk | (s) ≤ 2−k[RX +δk ]+4mk ζ(|X|,γk )+3 k

s∈Xmk |X | : m s(s,xk )∈TX,γk , k

vk ∈D(s,r(s,xk ))

≤ 2k[RX +δk ]+1 for all sufficiently large k. From the fact that γk > λ(|X|, mk ), the definition of mk in Sect. 3, and Lemma 4, we have   m    T k    " X,γk #  ≤ 2mk [H(X)+ζ(|X|,γk )]−k[RX +δk ] + 1  2k[RX +δk ]    ≤ 2mk [H(X)+ζ(|X|,γk )]−k[RX +δk ]+1 for all sufficiently large k. Thus,   2 µX mk |Xk | (s) ≤ 2k[RX +δk ]+1 2−k[RX +δk ]+2mk ζ(|X|,γk )+1 k

s∈Xmk |X | : m s(s,xk )∈TX,γk , k

for all sufficiently large k.

vk ∈D(s,r(s,xk ))

Proof. Since vk ∈ D(s, r(s, xk )), r(s, vk ) = r(s, xk ), and πk (s(s, vk )) = πk (s(s, xk )) are equivalent, [X(i)]1 and [X( j)]1 are independent for i  j, and 0 ≤ πk (xmk ) ≤ 

mutually mk , we have 2k[RX +δk ] − 1 for xmk ∈ TX,γ k 

We have

 2k[RX +δk ] ≤ 2k[RX +δk ] + 1

µX mk |Xk | (s)

k s∈Xmk |X | : m s(s,xk )∈TX,γk , k k k

= 2−k[RX +δk ]+4mk ζ(|X|,γk )+3 .

We return to the evaluation of the third term of (33). We have    µX k Z k (xk , zk ) µX mk |Xk | (s) k xk ∈Xk zk ∈TZ,γ

v ∈D(s,r(s,x ))

=

k

s∈Xmk |X | : m s(s,xk )∈TX,γk ,

k

k

X +δk ] −1

2k[R

r=0





µX mk (s(s,xk ))



µX mk (s(s,vk ))

k



r=0

where ν(r) ≡ µX mk ({xmk ∈ Xmk ; πk (xmk ) = r}) and  s is the sequence constructed by deleting s(s, xk ) and s(s, vk ) from s, that is, s j.

1≤ j≤mk |Xk | j%|Xk |BX (xk ) j%|Xk |BX (vk )

It should be noted here that we use the condition vk  xk in the first equality.

 mk and If 0 ≤ πk (xmk ) ≤ 2k[RX +δk ] − 1, then xmk ∈ TX,γ k m −m [H(X)−ζ(|X|,γ )] k k k from Lemma 2. From the fact µX mk (x ) ≤ 2 that γk > λ(|X|, mk ) and Lemma 4, we have



vk ∈D(s,r(s,xk ))

  k xk ∈Xk zk ∈TZ,γ

s(s,vk )∈Xmk : πk (s(s,vk ))=r

X +δk ] −1

2k[R = [ν(r)]2 ,

∗j:

∃vk ∈Xk s.t. vk xk , k , (vk ,zk )∈TXZ,γ

µX mk [|Xk |−2] (  s)

k |−2]

 s∈Xmk [|X

s(s,xk )∈Xmk : πk (s(s,xk ))=r

 s≡



for all sufficiently large k.



µX k Z k (x , z ) k

k

−k[RX +δk ]+4mk ζ(|X|,γk )+3

=2

k

= 2−k[RX +δk ]+4mk ζ(|X|,γk )+3

k 



µZ k (zk )

vk ∈Xk : k (vk ,zk )∈TXZ,γ k



k

µX k Z k (xk , zk )

vk ∈Xk : xk ∈Xk k (vk ,zk )∈TXZ,γ k

2−k[H(Z)−ζ(|Z|,γk )]

vk ∈Xk : k (vk ,zk )∈TXZ,γ



k (vk ,zk )∈TXZ,γ

k

−k[H(Z)−ζ(|Z|,γk )]

2 k

−kδk +k[ζ(|Z|,γk )+ζ(|X||Z|,γk )]+4mk ζ(|X|,γk )+3

≤2





 k zk ∈TZ,γ

−k[RX +δk ]+4mk ζ(|X|,γk )+3

2−k[RX +δk ]+4mk ζ(|X|,γk )+3



k zk ∈TZ,γ k

≤ 2−k[RX +δk ]+4mk ζ(|X|,γk )+3

k

vk ∈Xk : k (vk ,zk )∈TXZ,γ

k zk ∈TZ,γ k

≤2

2−k[RX +δk ]+4mk ζ(|X|,γk )+3

vk ∈Xk : vk xk , k (vk ,zk )∈TXZ,γ

k

  k xk ∈Xk zk ∈TZ,γ



µX k Z k (xk , zk )

,

MURAMATSU: SOURCE CODING ALGORITHMS USING THE RANDOMNESS OF A PAST SEQUENCE

1075

where the first inequality comes from (36) and Lemma 8, the third inequality comes from Lemma 2, and the last inequality comes from Lemma 4 and RX ≥ H(X|Z). From the fact that 0 < H(X|Z) < H(X) and the definition of mk in Sect. 3 we have mk ≤ k for all sufficiently large k. Combining the definition of δk in Sect. 3 and the fact that ζ(α, γk ) is an increasing function of α > 0, we have 2−kδk +k[ζ(|Z|,γk )+ζ(|X||Z|,γk )]+4mk ζ(|X|,γk )+3 ≤ 2−kδk +6kζ(|X||Z|,γk )+3 = 2−kγk .

(37)

We substitute (34)–(37) into (33). Since λ(α, k) > 0 is an increasing function of α > 0 and a decreasing function of k, and C ≤ mk /k, we have   −kγk +2−mk [γk −λ(|X|,mk )] +2−k[γk −λ(|Z|,k)] ES XZ χ(2) k (S , X, Z) ≤ 2 ≤ 2−k[Cγk −λ(|X||Z|,Ck)]+2 , which implies (30). We can prove (31) and (32) in the same way as (30). Now we apply Lemma 6. Let

+ χ(2) k (s(i), x(i), z(i))

+ χ(3) k (s (i), x(i), z(i))

=

k=1

Paying attention to the fact that limk→∞ δ k = 0 and RY = I(Y; Z), we can prove (14) in the same way as in the proof of Theorem 1. k ⊂ First, we prove the following lemmas. Let TYZ,2γ k k k Y × Z be defined by k TYZ,2γ ≡ (yk , zk ) : D( ν(yk ,zk ) pYZ ) ≤ 2γk . k k Lemma 9. If (yk , zk ) ∈ TYZ,2γ , then k   √ ρk (yk , zk ) ≤ k EYZ [ρ(Y, Z)] + 2|Y||Z| γk ρmax .





pYZ (b, c) +



 4γk ρ(b, c)

     √  pYZ (b, c)ρ(b, c)+2|Y||Z| γk ρmax ≤ k  (b,c)∈Y×Z   √ = k EYZ [ρ(Y, Z)] + 2|Y||Z| γk ρmax . 

2

for all sufficiently large k. Proof. For µY k (yk )  0, let pZ|Y (z|y) be defined by pZ|Y (z|y) ≡

∞  4 max{log2 |X|, log2 |Z|} k2 k=K+1

< ∞, Applying

d(xqk [k+2] , zqk [k+2] ) ek k ≤ lim sup q [k + 2] q [k + 2] k k k→∞ k→∞ ek ≤ lim sup k→∞ qk − Mk |Ec | + |Ec2 | + |Ec3 | + |Ec4 | ≤ lim sup 1 qk − Mk k→∞

lim sup

7.2 Proof of Theorem 2

k |{z ∈ Zk : (yk , zk ) ∈ TYZ,2γ }| ≥ 2k[H(Z|Y)−ζ(|Y||Z| ,γk )] k

log2 kMk k3

for sufficiently large K, which implies (26). Lemma 6, we have

We obtain (10) from the above inequality and d(xqk [k+2] , zqk [k+2] ) ≥ 0. Thus the proof of the theorem is completed. 

k , then Lemma 10. If yk ∈ TY,γ k

K  log2 kMk ≤ k3 k=1

+

Gi (k)

i=0

(b,c)∈Y×Z

∞  f (g(k)) k3 k=1 ∞ 

qk −M k −1

almost surely.

≤k

From (29)–(32), we have (25). We have (27) from the definitions of f and g. Furthermore, from the definitions of γk and ηk , we have 2−ηk = 1/k3 and

k=1

= 0,

(b,c)∈Y×Z

+ χ(4) k (s(i), s (i), x(i), z(i)), ηk ≡ k[Cγk − λ(|X||Z|, Ck)] − 4, g(k) ≡ qk − Mk = kMk f (g) ≡ log2 g.

f (g(k))2−ηk =

k→∞

1 qk − Mk

k . From Lemma 1, we Proof. Assume that (yk , zk ) ∈ TYZ,2γ k have  N(b, c|yk , zk )ρ(b, c) ρk (yk , zk ) =

Gi (k) ≡ χ(1) k (x(i), z(i))

∞ 

= lim sup

pYZ (y, z) . µY (y)

k For yk ∈ Yk , let TZ|Y,γ (yk ) ∈ Zk be defined by k k TZ|Y,γ (yk ) ≡ zk ∈ Zk : D( νzk |yk pZ|Y | νyk ) ≤ γk . k k } Then Lemma 5 implies that zk ∈ {zk ∈ Zk ; (yk , zk ) ∈ TYZ,2γ k k k k k when y ∈ TY,γk and z ∈ TZ|Y,γk (y ). From this fact and Lemma 4, k k }| ≥ |TZ|Y,γ (yk )| |{zk ∈ Zk ; (yk , zk ) ∈ TYZ,2γ k k

≥ 2k[H(Z|Y)−ζ (|Z|,|Y|,γk ,γk )] = 2k[H(Z|Y)−ζ(|Y||Z| ,γk )] 2

k for any yk ∈ TY,γ . k



IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.4 APRIL 2005

1076

   4   k 1 − µY k (y ) =  k j∈Jk  yk ∈TY,γ  k k

We define    1,     (5) k χk (u, y ) ≡      0,

if ∃ j ∈ Jk s.t. k , (yk , z(u, j)) ∈ TYZ,2γ k otherwise.

(41)

(39)

From

ymk ∈ k. Then, from Lemmas 2, 4, and 10, we have  µY m k (u(u, j))

u(u, j)∈Ymk : k (y ,σk (u(u, j)))∈TYZ,2γ k

=

=



+

u∈Y

k



µY k (y )

k yk ∈TY,γ



+

u∈Y

k

=



k yk ∈TY,γ

  k µY m k |Yk | (u) 1 − χ(5) (u, y ) k

= 2−k[I(Y;Z)+δk −γk ]

u∈Y



µY k (yk )

u∈Y

k

m |Jk | k

u∈Y

k

≥2

µY m k |Yk | (u)

  k µY m k |Yk | (u) 1 − χ(5) k (u, y )

m |Jk | k

m |Jk | k

−k[γk −λ(|Y|,k)]

+2

(40)

,

where we use Lemma 3 to derive the last inequality. Since {u(U, j)} j∈Jk are mutually independent, we can derive the first term in the above inequality as 

µY k (yk )

k yk ∈TY,γ

=

u∈Y

k 

=

m |Jk | k

µY k (yk )

k yk ∈TY,γ





k

µY m k |Yk | (u)

∀ j∈Jk k (yk ,z(u, j))TYZ,2γ

k

k



u∈Ymk |Jk | :

µY k (y )

k yk ∈TY,γ

  k µY m k |Yk | (u) 1 − χ(5) k (u, y )

4

j∈Jk



2

−k[I(Y;Z)+ζ(|Y||Z|2 ,γk )+m k ζ(|Y|,γk )/k+H(Y)/k]

k + µY k ([TY,γ ]c ) k     k ≤ µY k (yk ) µY m k |Yk | (u) 1 − χ(5) (u, y ) k k yk ∈TY,γ

k

≥2

m |Jk | k

µY k (yk )

k yk TY,γ k

2−mk [H(Y)+ζ(|Y|,γk )]

  k µY m k |Yk | (u) 1 − χ(5) (u, y ) k



µY m k (u(u, j))

k[H(Z|Y)−ζ(|Y||Z|2 ,γk )] −m k [H(Y)+ζ(|Y|,γk )]

m |Jk | k



k

zk ∈Zk : k (yk ,zk )∈TYZ,2γ

m |Jk | k

u∈Y

k





  k µY m k |Yk | (u) 1 − χ(5) k (u, y )



µY k (yk )

k yk TY,γ





k yk ∈TY,γ

µY m k (u(u, j))

m zk ∈Zk : u(u, j)∈TY,γk : k k (yk ,zk )∈TYZ,2γ k σk (u(u, j))=zk

u∈Ymk |Jk |

µY k (yk )







 EUY 1 − χ(5) k (U, Y)     k µY k (yk ) µY m k |Yk | (u) 1 − χ(5) = k (u, y ) 





k

zk ∈Zk : u(u, j)∈Ymk : k k (yk ,zk )∈TYZ,2γ k σk (u(u, j))=z





k

the definition of m k in Sect. 4 and Lemma 4, there is

m k TY,γkk such that σk (ymk ) ∈ TZ,γ for all sufficiently large k

for all sufficiently large k. Since Y  and U are mutually independent, EUY  1 − χ(5) k (U, Y) can be evaluated by

yk ∈Yk

u(u, j)∈Ymk : k (y ,σk (u(u, j)))∈TYZ,2γ

(38)

Let Y(i) be the random variable corresponding to y(i) and Y ≡ Y(0). Similarly, we define U(i) and U. In the following, we prove   −k[γk −λ(|Y|,k)]+1 EUY 1 − χ(5) k (U, Y) ≤ 2



    µY m k (u(u, j)) .  

k for all sufficiently large k and yk ∈ TY,γ , where the last ink equality and the last equality come from the definition of m k and δ k in Sect. 4, respectively. Since the left hand side of the above inequality is upper bounded by 1, we have

2−k[I(Y;Z)+δk −γk ] ≤ 1 for all sufficiently large k. Then, we have       4    1 − µY m k (u(u, j))  

 j∈Jk   u(u, j)∈Ymk :  k k (y ,σk (u(u, j)))∈TYZ,2γ k 4  −k[I(Y;Z)+δ k −γk ] 1−2 ≤ j∈Jk



2k[RY +δ k ]

≤ 1 − 2−k[I(Y;Z)+δk −γk ] 5 6

≤ exp −2k[−I(Y;Z)−δk +γk +RY +δk ] 5 6 = exp −2kγk 5 6 ≤ exp −2k[γk −λ(|Y|,k)] ≤ exp(−k[γk − λ(|Y|, k)]) ≤ 2−k[γk −λ(|Y|,k)] log2 e

k

µY m k (u(u, j))

u(u, j)∈Ymk : k (yk ,σk (u(u, j)))TYZ,2γ

k

≤ 2−k[γk −λ(|Y|,k)] ,

(42)

k , where e is the base of the natural logarithm. for yk ∈ TY,γ k The third inequality comes from the fact that [1 − α]β ≤

MURAMATSU: SOURCE CODING ALGORITHMS USING THE RANDOMNESS OF A PAST SEQUENCE

1077

exp(−αβ) for 0 ≤ α ≤ 1, β > 0. The first equality comes from RY ≡ I(Y; Z). Substituting (41) and (42) into (40), we have   EUY 1 − χ(5) k (U, Y)  ≤ µY k (yk )2−k[γk −λ(|Y|,k)] + 2−k[γk −λ(|Y|,k)] k yk ∈TY,γ

k

−k[γk −λ(|Y|,k)]+1

≤2

,

which implies (39). We return to the proof of the theorem. Let Gi (k) ≡ 1 − χ(5) k (u(i), y(i)). From (39), we have EµG(k) [G0 (k)] ≤ 2−k[γk −λ(|Y|,k)]+1 . By letting ηk ≡ k[γk − λ(|Y|, k)] − 1, we have (25). By letting f (g) ≡ log2 g and g(k) ≡ qk − Mk = kMk , we have (26) and (27). Let E5 be a set of i ∈ I ≡ {0, . . . , qk − Mk − 1} such that there is j ∈ Jk satisfying (y(i), z(u(i), j)) ∈ TYZ,2γk . Applying Lemma 6, we have lim sup k→∞

|Ec5 |k qk [k + 1]

= lim sup k→∞

k qk [k + 1]

1 ≤ lim sup q − Mk k k→∞ = 0,

qk −M k −1



  1 − χ(5) k (u(i), y(i))

pX k |Y k Z k (xk |yk , zk ) ≡

pX k Y k Z k (xk , yk , zk ) . pY k Z k (yk , zk )

Then, (18) implies that µX k Y k (xk , yk ) = µY k (yk )pX k |Y k Z k (xk |yk , zk )

(45)

for all (xk , yk , zk ) ∈ Xk × Yk × Zk such that µY k (yk ) > 0. We prepare the following lemma.

k

for all sufficiently large k.

i=0

(43)

From Lemma 9 and the fact that there is no decoding error of y(i) for any i ∈ I − I , we have ρk (y(i), z(u(i), ψk (y(i)))) ≤ ρk (y(i), z(u(i), j))   √ ≤ k EYZ [ρ(Y, Z)] + 2|Y||Z| γk ρmax

For pY k Z k (yk , zk )  0, let pX k |Y k Z k (xk |yk , zk ) be defined by

xk ∈Xk : k (xk ,yk ,zk )TXYZ,3γ

 (u(i), y(i)) 1 − χ(5) k

almost surely.

We can prove (20) and (21) in the same way as Theorems 1 and 2, respectively. Let ek be the number of i ∈ I ≡ {0, . . . , qk − Mk − 1} such that x(i) is decoded incorrectly. Then k ⊂ Xk ×Yk ×Zk d(xqk [k+1] , yqk [k+1] ) ≤ ek k. We define TXYZ,3γ k by k k k k k ,yk ,zk ) p XYZ ) ≤ 3γk . ≡ (x , y , z ) : D( ν TXYZ,3γ (x k

k Lemma 11. If (yk , zk ) ∈ TYZ,γ , then k  pX k |Y k Z k (xk |yk , zk ) ≤ 2−k[γk −λ(|X||Y||Z|,k)]

i=0

qk −M k −1

7.3 Proof of Theorem 4

k Proof. Let TX|YZ,γ (yk , zk ) ⊂ Xk be defined for (yk , zk ) ∈ k Yk × Zk by k (yk , zk ) ≡ xk : D( ν xk |(yk ,zk ) pX|YZ |ν(yk ,zk ) ) ≤ γk . TX|YZ,γ k

From Lemma 3, we have (44)

for i ∈ E5 . Combining (43), (44), and limk→∞ γk = 0, we have d(yqk [k+1] ) qk [k + 1] k→∞    |E5 |k  √ ≤ lim sup EYZ ρ(Y, Z) +2|Y||Z| γk ρmax qk [k+1] k→∞ ! |Ec5 |k + ρmax qk [k + 1]   √ ≤ EYZ ρ(Y, Z) + lim sup 2|Y||Z| γk ρmax

lim sup

k→∞

|Ec5 |k ρmax + lim sup k→∞ qk [k + 1]   ≤ EYZ ρ(Y, Z) , almost surely, which implies (15). Thus the proof of the theorem is completed. 

k (yk , zk )|yk , zk ) ≥ 1 − 2−k[γk −λ(|X||Y||Z|,k)] pX k |Y k Z k (TX|YZ,γ k

for any (yk , zk ) ∈ Yk × Zk and all sufficiently large k. From k k and xk ∈ TX|YZ,γ (yk , zk ), then Lemma 5, if (yk , zk ) ∈ TYZ,2γ k k k (xk , yk , zk ) ∈ TXYZ,3γ . Then we have k  k pX k |Y k Z k (xk |yk , zk ) ≥ pX k |Y k Z k (TX|YZ,γ (yk , zk )|yk , zk ) k xk ∈Xk : k (xk ,yk ,zk )∈TXYZ,3γ

k

≥ 1 − 2−k[γk −λ(|X||Y||Z|,k)] . This implies the lemma.



Before the evaluation of ek , we confirm that x(i) is decoded correctly when (x(i), y(i)) satisfies the following conditions at Step D9 on i ∈ I . Let r(i) ≡ r(s(i), x(i)) and z(i) ≡ z(u(i), ψk (y(i))). k . Condition 1 (x(i), y(i), z(i)) ∈ TXYZ,3γ k Condition 2 There is no element in [D(s(i), r(i)) × {z(i)}] ∩ k except (x(i), z(i)). TXZ,3γ k

IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.4 APRIL 2005

1078 k From Lemma 5, we have (x(i), z(i)) ∈ TXZ,3γ when k (x(i), y(i), z(i)) satisfies Condition 1. This implies x(i) ∈ k D(s(i), r(i)) and (x(i), z(i)) ∈ [D(s(i), r(i)) × {z(i)}] ∩ TXZ,3γ . k Hence, (x(i), z(i)) is a unique element in [D(s(i), r(i)) × k when (x(i), y(i)) satisfies Conditions 1 and {z(i)}] ∩ TXZ,3γ k 2. This implies that x(i) is decoded correctly at Step D9. Let E6 and E7 be sets of i ∈ I that satisfy Conditions 1 and 2, respectively. Then Ec6 and Ec7 can be denoted by

Let E_6 and E_7 be the sets of i ∈ I that satisfy Conditions 1 and 2, respectively. Then E_6^c and E_7^c can be written as

$$E_6^c \equiv \left\{i : (x(i),y(i),z(i)) \notin T^k_{XYZ,3\gamma_k}\right\},$$
$$E_7^c \equiv \left\{i : \exists v^k \ne x(i) \text{ s.t. } (v^k,z(i)) \in T^k_{XZ,3\gamma_k},\ v^k \in D(s(i),r(i))\right\}.$$

Furthermore, we define E_5 by

$$E_5 \equiv \left\{i : (y(i),z(i)) \in T^k_{YZ,2\gamma_k}\right\} = \left\{i : \exists j \in J_k \text{ s.t. } (y(i),z(u(i),j)) \in T^k_{YZ,2\gamma_k}\right\},$$

where z(u(i), j) is defined by (13). We define χ^(5)_k(u, y^k) by (38) in the proof of Theorem 2. We define χ^(6)_k(u, x^k, y^k) and χ^(7)_k(s, u, x^k, y^k) by

$$\chi^{(6)}_k(u,x^k,y^k) \equiv \begin{cases} 1, & \text{if } (y^k, z(u,\psi_k(y^k))) \in T^k_{YZ,2\gamma_k} \text{ and } (x^k,y^k,z(u,\psi_k(y^k))) \notin T^k_{XYZ,3\gamma_k},\\ 0, & \text{otherwise,}\end{cases}$$

$$\chi^{(7)}_k(s,u,x^k,y^k) \equiv \begin{cases} 1, & \text{if } \exists v^k \in X^k \text{ s.t. } v^k \ne x^k,\ (v^k,z(u,\psi_k(y^k))) \in T^k_{XZ,3\gamma_k},\ v^k \in D(s,r(s,x^k)),\\ 0, & \text{otherwise.}\end{cases}$$

Then we have

$$\frac{e_k k}{q_k[k+2]} \le \frac{\left[|E_5^c| + |E_5 \cap E_6^c| + |E_7^c|\right]k}{q_k[k+2]} \le \frac{|E_5^c| + |E_5 \cap E_6^c| + |E_7^c|}{q_k - M_k} = \frac{1}{q_k-M_k}\sum_{i=0}^{q_k-M_k-1}\left[1-\chi^{(5)}_k(u(i),y(i)) + \chi^{(6)}_k(u(i),x(i),y(i)) + \chi^{(7)}_k(s(i),u(i),x(i),y(i))\right].$$

In the following, let S(i) be a random variable corresponding to s(i) and S ≡ S(0). Similarly, we define X(i), U(i), Y(i), X, U, and Y. To apply Lemma 6, we prove the following inequalities for all sufficiently large k:

$$E_{UY}\left[1-\chi^{(5)}_k(U,Y)\right] \le 2^{-k[C\gamma_k-\lambda(|X||Y||Z|,Ck)]+2}, \tag{46}$$
$$E_{UXY}\left[\chi^{(6)}_k(U,X,Y)\right] \le 2^{-k[C\gamma_k-\lambda(|X||Y||Z|,Ck)]+2}, \tag{47}$$
$$E_{SXUY}\left[\chi^{(7)}_k(S,U,X,Y)\right] \le 2^{-k[C\gamma_k-\lambda(|X||Y||Z|,Ck)]+2}, \tag{48}$$

where (46) can be proved immediately from (39) and (48) can be proved in the same way as (30). In the following, we prove (47). Since U and XY are mutually independent, we have

$$\begin{aligned}
E_{UXY}\left[\chi^{(6)}_k(U,X,Y)\right]
&= \sum_{x^k \in X^k}\sum_{y^k \in Y^k} \mu_{X^kY^k}(x^k,y^k) \sum_{\substack{u \in Y^{m_k|J_k|}:\\ (y^k,z(u,\psi_k(y^k))) \in T^k_{YZ,2\gamma_k},\\ (x^k,y^k,z(u,\psi_k(y^k))) \notin T^k_{XYZ,3\gamma_k}}} \mu_{Y^{m_k|J_k|}}(u)\\
&= \sum_{\substack{y^k \in Y^k:\\ \mu_{Y^k}(y^k)>0}} \sum_{\substack{z^k \in Z^k:\\ (y^k,z^k)\in T^k_{YZ,2\gamma_k}}} \sum_{\substack{u \in Y^{m_k|J_k|}:\\ z(u,\psi_k(y^k))=z^k}} \mu_{Y^{m_k|J_k|}}(u) \sum_{\substack{x^k \in X^k:\\ (x^k,y^k,z^k)\notin T^k_{XYZ,3\gamma_k}}} \mu_{X^kY^k}(x^k,y^k)\\
&= \sum_{\substack{y^k \in Y^k:\\ \mu_{Y^k}(y^k)>0}} \mu_{Y^k}(y^k) \sum_{\substack{z^k \in Z^k:\\ (y^k,z^k)\in T^k_{YZ,2\gamma_k}}} \sum_{\substack{u \in Y^{m_k|J_k|}:\\ z(u,\psi_k(y^k))=z^k}} \mu_{Y^{m_k|J_k|}}(u) \sum_{\substack{x^k \in X^k:\\ (x^k,y^k,z^k)\notin T^k_{XYZ,3\gamma_k}}} p_{X^k|Y^kZ^k}(x^k|y^k,z^k)\\
&\le 2^{-k[\gamma_k-\lambda(|X||Y||Z|,k)]} \sum_{\substack{y^k \in Y^k:\\ \mu_{Y^k}(y^k)>0}} \mu_{Y^k}(y^k) \sum_{\substack{z^k \in Z^k:\\ (y^k,z^k)\in T^k_{YZ,2\gamma_k}}} \sum_{\substack{u \in Y^{m_k|J_k|}:\\ z(u,\psi_k(y^k))=z^k}} \mu_{Y^{m_k|J_k|}}(u)\\
&= 2^{-k[\gamma_k-\lambda(|X||Y||Z|,k)]} \sum_{y^k \in Y^k} \mu_{Y^k}(y^k) \sum_{\substack{u \in Y^{m_k|J_k|}:\\ (y^k,z(u,\psi_k(y^k)))\in T^k_{YZ,2\gamma_k}}} \mu_{Y^{m_k|J_k|}}(u)\\
&\le 2^{-k[\gamma_k-\lambda(|X||Y||Z|,k)]} \sum_{y^k \in Y^k} \mu_{Y^k}(y^k) \sum_{u \in Y^{m_k|J_k|}} \mu_{Y^{m_k|J_k|}}(u)\\
&= 2^{-k[\gamma_k-\lambda(|X||Y||Z|,k)]}\\
&\le 2^{-k[C\gamma_k-\lambda(|X||Y||Z|,Ck)]+2},
\end{aligned}$$

which implies (47), where the third equality comes from (45), the first inequality comes from Lemma 11, and the fourth equality comes from the fact that Y^{m_k|J_k|} is divided into disjoint sets by the function z(·, ψ_k(y^k)). Applying Lemma 6 by letting

$$G_i(k) \equiv 1 - \chi^{(5)}_k(u(i),y(i)) + \chi^{(6)}_k(u(i),x(i),y(i)) + \chi^{(7)}_k(s(i),u(i),x(i),y(i)),$$
$$\eta_k \equiv k[C\gamma_k - \lambda(|X||Y||Z|,Ck)] - 4, \qquad f(g) \equiv \log_2 g, \qquad g(k) \equiv q_k - M_k = kM_k,$$

we have

$$\limsup_{k\to\infty} \frac{d(x^{q_k[k+2]}, y^{q_k[k+2]})}{q_k[k+2]} \le \limsup_{k\to\infty} \frac{1}{q_k-M_k}\sum_{i=0}^{q_k-M_k-1} G_i(k) = 0, \quad\text{almost surely.}$$

We obtain (22) from the above inequality and d(x^{q_k[k+2]}, y^{q_k[k+2]}) ≥ 0. Thus the proof of the theorem is completed. □

# " at most 1 + log2 [k[|Jk | − 1]] + 2 log2 log2 [k[|Jk | − 1] + 1] bits at Step E5 when i ∈ E5 , we have 1 LY (yqk k ) k→∞ qk k  Mk ≤ lim sup log2 |Yk | k→∞ qk k

  |E5 | 

1 + log2 k 2k[RY +δk ] + lim sup k→∞ qk k   

+2 log2 log2 k 2k[RY +δk ] + 1

lim sup 7.4 Proof of Theorem 6 Since the theorem can be proved easily when H(Y) = log2 |Y|, we assume that H(Y) < log2 |Y| in the following. As with the special case of the lossy source coding problem described in Sect. 4, we assume that Y = Z, and let pYZ and ρ be defined by    if y = z, 1, pYZ (y, z) ≡   0, if y  z,    if y  z, 1, ρ(y, z) ≡   0, if y = z. Then we have Y = Z, I(Y; Z) = H(Y), and EYZ ρ(Y, Z) = 0. Let γk m k δ k Jk

3 log2 k 1 + , ≡ λ(|Y|, k) + k k   kH(Z) ≡ , H(Y) m ζ(|Y|, γk ) H(Y) + + γk , ≡ ζ(|Y||Z|2 , γk ) + k k

 k

≡ 1, . . . , 2k[RY +δk ] .

k . Then Lemma 1 Now we assume that (yk , zk ) ∈ TYZ,2γ k implies that pYZ (b, c) = 0 and N(b, c|yk , zk ) = 0 for b  c. Then we have  ρk (yk , zk ) = N(b, c|yk , zk )ρ(b, c) (b,c)∈Y×Z

=



N(b, b|yk , zk )ρ(b, b)

b∈Y

+



N(b, c|yk , zk )ρ(b, c)

b∈Y c∈Z: cb

= 0. k . Thus we obtain the fact that yk = zk if (yk , zk ) ∈ TYZ,2γ k

Let RY ≡ H(Y) < log2 |Y|. Then mk |Jk | ≤ k|Jk | for sufficiently large k. Let u(i) be the suffix of u (i) with length m k |Jk |. Let z(u, j) be defined by (13), where σk is the identity map and j u(u, j) ≡ u[k+1] k j+1

for u ∈ Ymk |Jk | , j ∈ Jk . Let E5 be the set of i ∈ I ≡ {0, . . . , qk − Mk − 1} such that there is j ∈ Jk satisfying k . Since σk is the identity map, we (y(i), z(u(i), j)) ∈ TYZ,2γ k have the fact that y(i) = u(u(i), j) when i ∈ E5 . This implies that there is 1 ≤ t ≤ k[|Jk | − 1] such that y(i) = k|J |−t−1 [u (i)]k[|Jk |−1]−t−1 when i ∈ E5 . Since y(i) is encoded in k



|Ec5 |  1 + log2 |Yk | k→∞ qk k = H(Y), almost surely, + lim sup

(49)

where the last eqality comes from (43). On the other hand, the converse of the source coding theorem [45, Theorem II.1.2] implies that lim inf k→∞

1 LY (yqk k ) ≥ H(Y), almost surely. qk k

(50)

We obtain (23) from (49) and (50). Thus the proof of the theorem is completed.  8.
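In this lossless special case, the past sequence itself serves as the codebook: z(u, j) is simply the j-th length-k block of u, and encoding y(i) reduces to finding a matching block. A minimal sketch (the 0-indexing and the names block and encode_block are our choices; the paper indexes j ∈ J_k from 1 and takes u(i) as a suffix of u′(i)):

```python
def block(u, k, j):
    """The j-th length-k block of the past sequence u, i.e. the codeword
    u_{kj+1}^{k[j+1]} of (13), written 0-indexed."""
    return u[k * j : k * (j + 1)]

def encode_block(u, y):
    """Return ('match', j) for the smallest j whose block equals y,
    or ('raw', y) when no block of the past sequence matches."""
    k = len(y)
    for j in range(len(u) // k):
        if block(u, k, j) == y:
            return ('match', j)
    return ('raw', y)
```

A matched block costs only the Elias-style description length of the match position, about log_2 of the index plus lower-order terms, while a raw block costs about 1 + k log_2|Y| bits; since matches occur for all but a vanishing fraction of blocks (the set E_5), the per-letter rate tends to H(Y), which is the content of (49).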

8. Conclusion

This paper has introduced a simulated random coding algorithm that uses the randomness of a past sequence. The algorithm does not need any auxiliary random source. The algorithm has been applied to multi-terminal source coding, lossy source coding, and source coding with partial side information at the decoder, and the encoding rate and decoding error rate of the resulting codes were analyzed. These algorithms with i ≥ 0 can be regarded as sliding-block source codes [18]. Our constructions of sliding-block codes are explicit, while the constructions in [10], [18], [23], [43], [44] rely on non-explicit block codes.

Certain questions remain regarding the proposed algorithms, including the following:

• An extension of the algorithms to a general class of sources is not presented.
• The construction of universal source coding algorithms is not presented.
• The proposed algorithms renew the codebook at the encoding of every sub-block. We can also consider the situation where renewal of the codebook is stopped at i = 0. From the proofs of the theorems, we can easily check that the average decoding error rate still goes to zero in this situation; however, another analysis may be required to prove the almost-sure convergence of the decoding error rate established in this paper.
• Another analysis may also be required for the lossless codes in Sects. 3 and 5 if all of the source output and the reproduction is used for generating codebooks.
• A typical set decoder is used in the proposed algorithm. When the encoders are constructed with an ensemble of linear codes [8] or low-density parity-check codes [16], [28], it is possible to use the sum-product algorithm [13] to construct a maximum-likelihood decoder. The sum-product algorithm is an iterative approximation algorithm and provides an effective decoding method. In addition, this construction of encoders requires a smaller memory size than the proposed algorithm. This topic will be discussed in [30] for the case of the multi-terminal source coding.
• The simulated random coding algorithm can be regarded as a method of constructing codes from random coding arguments. In the future, it may be interesting to apply the simulated random coding algorithm to other coding problems, such as lossy source coding with side information at the decoder [60]. The application of the simulated random coding algorithm to channel coding [41] is discussed in [21].

Acknowledgements

The author thanks Prof. Shunsuke Ihara, Prof. Fumio Kanaya, Prof. Tomohiko Uyematsu, Prof. Joe Suzuki, Prof. Yasutada Oohama, Prof. Tadashi Wadayama, Prof. Ken'ichi Iwata, Mr. Jun Imai, and Mr. Takafumi Mukouchi for their valuable comments on this paper. Constructive comments and suggestions by anonymous reviewers have significantly improved the presentation of our results.

References

[1] R. Ahlswede and J. Körner, "Source coding with side information and a converse for the degraded broadcast channel," IEEE Trans. Inf. Theory, vol.21, no.6, pp.396–412, Nov. 1975.
[2] S. Arimoto, "An algorithm for calculating the capacity of an arbitrary discrete memoryless channel," IEEE Trans. Inf. Theory, vol.18, no.1, pp.14–20, Jan. 1972.
[3] T. Berger, Rate-Distortion Theory: A Mathematical Basis for Data Compression, Prentice-Hall, Englewood Cliffs, NJ, 1971.
[4] T. Berger and J.D. Gibson, "Lossy source coding," IEEE Trans. Inf. Theory, vol.44, no.6, pp.2693–2723, Oct. 1998.
[5] R.E. Blahut, "Computation of channel capacity and rate-distortion functions," IEEE Trans. Inf. Theory, vol.18, no.4, pp.460–473, July 1972.
[6] T.M. Cover, "A proof of the data compression theorem of Slepian and Wolf for ergodic sources," IEEE Trans. Inf. Theory, vol.21, no.2, pp.226–228, March 1975.
[7] T.M. Cover and J.A. Thomas, Elements of Information Theory, John Wiley & Sons, 1991.
[8] I. Csiszár, "Linear codes for sources and source networks: Error exponents, universal coding," IEEE Trans. Inf. Theory, vol.28, no.4, pp.585–592, July 1982.
[9] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Academic Press, 1981.
[10] L.D. Davisson and R.M. Gray, "A simplified proof of the sliding-block source coding theorem and its universal extension," Proc. Int. Conf. Communication, vol.2, pp.34.4.1–34.4.5, 1978.
[11] P. Elias, "Universal codeword sets and representations of the integers," IEEE Trans. Inf. Theory, vol.21, no.2, pp.783–795, March 1975.
[12] G.D. Forney, Jr., Concatenated Codes, MIT Press, 1966.
[13] B.J. Frey, Graphical Models for Machine Learning and Digital Communication, MIT Press, 1998.

[14] B.J. Frey and G.E. Hinton, "Free energy coding," Proc. Data Compression Conference 1996, pp.73–81, Snowbird, UT, 1996.
[15] B.J. Frey and G.E. Hinton, "Efficient stochastic source coding and an application to a Bayesian network source model," Comput. J., vol.40, pp.157–165, 1997.
[16] R.G. Gallager, "Low density parity check codes," IRE Trans. Inf. Theory, vol.8, no.1, pp.21–28, Jan. 1962.
[17] E.N. Gilbert, "A comparison of signalling alphabets," Bell Syst. Tech. J., vol.31, pp.504–522, 1952.
[18] R.M. Gray, "Sliding-block source coding," IEEE Trans. Inf. Theory, vol.21, no.4, pp.357–368, July 1975.
[19] R.M. Gray and A. Wyner, "Source coding for a simple network," Bell Syst. Tech. J., vol.58, pp.1681–1721, Nov. 1974.
[20] G.E. Hinton and R.S. Zemel, "Autoencoders, minimum description length and Helmholtz free energy," in Advances in Neural Information Processing Systems 6, pp.3–10, Morgan Kaufmann, 1994.
[21] K. Iwata and J. Muramatsu, "Channel coding algorithm simulating the random coding," IEICE Trans. Fundamentals, vol.E87-A, no.6, pp.1576–1582, June 2004.
[22] F. Kanaya and J. Muramatsu, "An almost sure recurrence theorem with distortion for stationary ergodic sources," IEICE Trans. Fundamentals, vol.E80-A, no.11, pp.2264–2267, Nov. 1997.
[23] J.C. Kieffer, "Extension of source coding theorems for block codes to sliding-block codes," IEEE Trans. Inf. Theory, vol.26, no.6, pp.679–692, Nov. 1980.
[24] J.C. Kieffer, "A survey of the theory of source coding with a fidelity criterion," IEEE Trans. Inf. Theory, vol.39, no.5, pp.1473–1490, Sept. 1993.
[25] H. Koga and S. Arimoto, "Asymptotic properties of algorithms of data compression with fidelity criterion based on string matching," Proc. 1994 IEEE Int. Symp. Inf. Theory, p.264, Trondheim, Norway, June–July 1994.
[26] I. Kontoyiannis, "An implementable lossy version of the Lempel-Ziv algorithm — Part I: Optimality for memoryless sources," IEEE Trans. Inf. Theory, vol.45, no.7, pp.2293–2305, Nov. 1999.
[27] T. Łuczak and W. Szpankowski, "A suboptimal lossy data compression based on approximate pattern matching," IEEE Trans. Inf. Theory, vol.43, no.5, pp.1439–1451, Sept. 1997.
[28] D.J.C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Trans. Inf. Theory, vol.45, no.2, pp.399–431, March 1999.
[29] S. Miyake and F. Kanaya, "Coding theorems on correlated general sources," IEICE Trans. Fundamentals, vol.E78-A, no.9, pp.1063–1070, Sept. 1995.
[30] J. Muramatsu and T. Mukouchi, "Simulated random coding algorithm for correlated sources with ensemble of linear matrices," submitted to IEICE Trans. Fundamentals.
[31] J. Muramatsu and F. Kanaya, "Distortion-complexity and rate-distortion function," IEICE Trans. Fundamentals, vol.E77-A, no.8, pp.1224–1229, Aug. 1994.
[32] J. Muramatsu and F. Kanaya, "A universal data-base for data compression," IEICE Trans. Fundamentals, vol.E78-A, no.9, pp.1057–1062, Sept. 1995.
[33] J. Muramatsu and F. Kanaya, "The dual quantity of the distortion-complexity and a universal data-base for fixed-rate data compression with distortion," IEICE Trans. Fundamentals, vol.E79-A, no.9, pp.1456–1459, Sept. 1996.
[34] D.L. Neuhoff and P.C. Shields, "Simplistic universal coding," IEEE Trans. Inf. Theory, vol.44, no.2, pp.778–781, March 1998.
[35] Y. Oohama, "Arithmetic and sorting algorithms for fixed-to-fixed random number generation and their performance analysis," submitted to IEEE Trans. Inf. Theory.
[36] Y. Oohama, "Fixed-to-fixed random number generation for discrete memoryless sources," submitted to IEEE Trans. Inf. Theory.
[37] D.S. Ornstein and P.C. Shields, "Universal almost sure data compression," Ann. Prob., vol.18, no.2, pp.441–452, 1990.
[38] D.S. Ornstein and B. Weiss, "Entropy and data compression schemes," IEEE Trans. Inf. Theory, vol.39, no.1, pp.78–83, Jan. 1993.
[39] J. Rissanen and G.G. Langdon, "Arithmetic coding," IBM J. Res. Dev., vol.23, pp.149–162, 1976.
[40] I. Sadeh, "Universal compression algorithms based on approximate string matching," Proc. 1995 IEEE Int. Symp. Inf. Theory, p.84, Whistler, Canada, Sept. 1995.
[41] C.E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol.27, pp.379–423, 623–656, 1948.
[42] C.E. Shannon, "Coding theorems for a discrete source with a fidelity criterion," IRE National Convention Record, Part 4, pp.142–163, 1959.
[43] P.C. Shields and D.L. Neuhoff, "Block and sliding-block source coding," IEEE Trans. Inf. Theory, vol.23, no.2, pp.211–215, March 1977.
[44] P.C. Shields, "Stationary coding of processes," IEEE Trans. Inf. Theory, vol.25, no.3, pp.283–291, May 1979.
[45] P.C. Shields, The Ergodic Theory of Discrete Sample Paths, American Mathematical Society, 1996.
[46] D. Slepian and J.K. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inf. Theory, vol.19, no.4, pp.471–480, July 1973.
[47] Y. Steinberg and M. Gutman, "An algorithm for source coding subject to a fidelity criterion, based on string matching," IEEE Trans. Inf. Theory, vol.39, no.3, pp.877–886, May 1993.
[48] N. Tishby, F.C. Pereira, and W. Bialek, "The information bottleneck method," Proc. 37th Annual Allerton Conference on Communication, Control and Computing, pp.368–377, 1999.
[49] T. Uyematsu, Modern Shannon Theory, Baifukan, 1998.
[50] T. Uyematsu, "An algebraic construction of codes for Slepian-Wolf source networks," IEEE Trans. Inf. Theory, vol.47, no.7, pp.3082–3088, Nov. 2001.
[51] R.R. Varsharmov, "Estimate of the number of signals in error correcting codes," Dokl. Akad. Nauk SSSR, vol.117, pp.739–741, 1957.
[52] C.S. Wallace, "Classification by minimum-message-length inference," Lecture Notes in Computer Science, no.468, pp.74–81, Springer, Berlin, Germany, 1990.
[53] F.M.J. Willems, "Universal data compression and repetition times," IEEE Trans. Inf. Theory, vol.35, no.1, pp.54–58, Jan. 1989.
[54] H.S. Witsenhausen, "The zero-error side information problem and chromatic numbers," IEEE Trans. Inf. Theory, vol.22, no.5, pp.592–593, Sept. 1976.
[55] A.D. Wyner, "A theorem on the entropy of certain binary sequences and applications II," IEEE Trans. Inf. Theory, vol.19, no.6, pp.772–777, Nov. 1973.
[56] A.D. Wyner, "The common information of two dependent random variables," IEEE Trans. Inf. Theory, vol.21, no.2, pp.163–179, March 1975.
[57] A.D. Wyner, "On source coding with side information at the decoder," IEEE Trans. Inf. Theory, vol.21, no.3, pp.294–300, March 1975.
[58] A.D. Wyner and J. Ziv, "A theorem on the entropy of certain binary sequences and applications I," IEEE Trans. Inf. Theory, vol.19, no.6, pp.769–771, Nov. 1973.
[59] A.D. Wyner and J. Ziv, "Some asymptotic properties of the entropy of a stationary ergodic data source with applications to data compression," IEEE Trans. Inf. Theory, vol.35, no.6, pp.1250–1258, Nov. 1989.
[60] A.D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. Inf. Theory, vol.22, no.1, pp.1–10, Jan. 1976.
[61] E.H. Yang and J.C. Kieffer, "Simple universal lossy data compression schemes derived from Lempel-Ziv algorithm," IEEE Trans. Inf. Theory, vol.42, no.1, pp.239–245, Jan. 1996.
[62] E.H. Yang and J.C. Kieffer, "On the performances of data compression algorithms based upon string matching," IEEE Trans. Inf. Theory, vol.44, no.1, pp.47–65, Jan. 1998.
[63] E.H. Yang and S.Y. Shen, "Distortion program-size complexity with respect to a fidelity criterion and rate-distortion function," IEEE Trans. Inf. Theory, vol.39, no.1, pp.288–292, Jan. 1993.
[64] E.H. Yang, Z. Zhang, and T. Berger, "Fixed-slope universal algorithms for lossy source coding via lossless codeword length functions," IEEE Trans. Inf. Theory, vol.43, no.5, pp.1465–1476, Sept. 1997.
[65] R. Zamir and K. Rose, "Natural type selection in adaptive lossy compression," IEEE Trans. Inf. Theory, vol.47, no.1, pp.99–111, Jan. 2001.
[66] Z. Zhang and V.K. Wei, "An on-line universal lossy data compression algorithm via continuous codebook refinement—Part I: Basic results," IEEE Trans. Inf. Theory, vol.42, no.3, pp.803–821, May 1996.
[67] Q. Zhao and M. Effros, "Lossless and near-lossless source coding for multiple access networks," IEEE Trans. Inf. Theory, vol.49, no.1, pp.112–128, Jan. 2003.
[68] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Trans. Inf. Theory, vol.23, no.3, pp.337–343, May 1977.

Appendix A: Construction of Codes with Arbitrary Small Block Error Probability

Sections 3 and 5 considered lossless source coding with an arbitrary small decoding error rate. This appendix proves that it is possible to use these codes to construct codes with an arbitrary small block error probability. The construction of the proof is analogous to the concatenated codes proposed by Forney [12] and applied by Uyematsu [50] to the multi-terminal source coding. Forney used the Reed-Solomon codes as the outer code and Uyematsu used the generalized Hermitian codes as the outer code. We use linear codes that achieve the Gilbert-Varsharmov lower bound as the outer code. Because there is no extension of the alphabets in the outer code, and the inner and outer codes are completely separated, the outer code is simply a concatenation of the parity checks with the original sequence. However, it should be noted here that it takes exponential time to construct the outer code.

We prepare the following lemma on the existence of linear codes that achieve the Gilbert-Varsharmov lower bound. Let h(ε) ≡ −ε log_2 ε − [1 − ε] log_2[1 − ε].

Lemma 12 ([17][51]). Let |X| be a prime number. For any n and N such that 0 < n < N, there is a systematic (N, n, d_min) linear code on a finite field with order |X| such that

$$\frac{d_{\min}}{N} \ge h^{-1}\!\left(\left[1-\frac{n}{N}\right]\log_2|X|\right),$$

where n is the length of a message, N is the length of a codeword, and d_min is the minimum distance of this code.

We prove the following theorem. To make the problem simple, it is assumed that we are encoding a source X. Let ℓ(b*) be the length of a binary sequence b* ∈ B*.

Theorem 7. Let φ_n : X^n → B* be the encoder of X and φ_n^{−1} : φ_n(X^n) → X^n be the decoder. Assume that

$$\limsup_{n\to\infty} \frac{1}{n}\,\ell(\varphi_n(x^n)) \le R, \quad\text{almost surely,}$$

and there is a sequence {ε_n}_{n=1}^∞ such that

$$\lim_{n\to\infty} \varepsilon_n = 0,$$
$$\mu_X\!\left(\left\{x \in X^\infty : \exists N(x) \text{ s.t. } \forall n \ge N(x),\ d(x^n) < n\varepsilon_n\right\}\right) = 1,$$

where d(x^n) is the number of decoding error letters. Then, there is an encoder Φ_n : X^n → B* and a decoder Φ_n^{−1} : Φ_n(X^n) → X^n such that

$$\limsup_{n\to\infty} \frac{1}{n}\,\ell(\Phi_n(x^n)) \le R, \quad\text{almost surely,} \tag{A·1}$$
$$\mu_X\!\left(\left\{x \in X^\infty : \exists N(x) \text{ s.t. } \forall n \ge N(x),\ \Phi_n^{-1}(\Phi_n(x^n)) = x^n\right\}\right) = 1, \tag{A·2}$$
$$\lim_{n\to\infty} \mu_{X^n}\!\left(\left\{x^n \in X^n : \Phi_n^{-1}(\Phi_n(x^n)) \ne x^n\right\}\right) = 0. \tag{A·3}$$
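To make Lemma 12 and the block length N_n ≡ ⌈n log_2|X|/[log_2|X| − h(ε_n)]⌉ used in the proof below concrete, here is a small numeric sketch for the binary case |X| = 2 (the function names are ours; h^{-1} is computed by bisection on [0, 1/2], where h is increasing):

```python
import math

def h(eps):
    """Binary entropy h(eps) = -eps*log2(eps) - (1-eps)*log2(1-eps)."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

def h_inv(y, tol=1e-12):
    """Inverse of h restricted to [0, 1/2], computed by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def gv_params(n, eps):
    """Outer-code length N_n and the Gilbert-Varsharmov guarantee on the
    relative minimum distance d_min/N for the binary case of Lemma 12."""
    N = math.ceil(n / (1.0 - h(eps)))          # log2|X| = 1 for |X| = 2
    rel_dmin = h_inv(1.0 - n / N)              # d_min/N >= h^{-1}(1 - n/N)
    return N, rel_dmin

# Example: n = 1000 and eps_n = 0.01 give N_n = 1088, i.e. roughly 9%
# parity overhead, while the guaranteed relative minimum distance is
# approximately eps_n itself.
N, rel_d = gv_params(1000, 0.01)
```

Since h(ε_n) → 0 as ε_n → 0, the overhead N_n − n becomes negligible per source letter, which is the mechanism behind (A·1) in the proof that follows.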

Proof. We can assume without loss of generality that |X| is a prime number. Let N_n ≡ ⌈n log_2|X|/[log_2|X| − h(ε_n)]⌉. Then we have lim_{n→∞} N_n = ∞. From the fact that lim_{n→∞} ε_n = 0, we have 0 ≤ ε_n < 1/2 for all sufficiently large n. From Lemma 12, we can find a systematic (N_n, n, d_min) linear code φ̂_n : X^n → X^{N_n} such that h^{−1}([1 − n/N_n] log_2|X|) ≤ d_min/N_n and nε_n ≤ nh^{−1}([1 − n/N_n] log_2|X|)/2 < d_min/2. Let φ̂_n^{−1} : X^{N_n} → X^n be the decoder of φ̂_n.

Since φ̂_n is systematic, there is a linear code φ̂′_n : X^n → X^{N_n−n} such that φ̂_n(x^n) = x^n ∗ φ̂′_n(x^n). Let B_X : X^{N_n−n} → B^{⌈[N_n−n] log_2|X|⌉} be the function that encodes x^{N_n−n} in ⌈[N_n − n] log_2|X|⌉ bits. We construct the encoder Φ_n : X^n → B* by

$$\Phi_n(x^n) \equiv \varphi_n(x^n) * B_X(\hat\varphi'_n(x^n)).$$

Then we have (A·1) from lim_{n→∞} h(ε_n) = 0. For b′ ≡ φ_n(x^n) and b″ ≡ B_X(φ̂′_n(x^n)), we define the decoder by

$$\Phi_n^{-1}(b' * b'') \equiv \hat\varphi_n^{-1}\!\left(\varphi_n^{-1}(b') * B_X^{-1}(b'')\right).$$

From the property of φ_n and φ_n^{−1}, there is n(x) ∈ N such that d(x^n) < nε_n < d_min/2 for any n > n(x), almost surely. This implies that Φ_n^{−1}(Φ_n(x^n)) = x^n, and the proof of (A·2) is completed. (A·3) can be proved immediately from (A·2). □

When Lemma 6 is used in the proofs of Theorems 1 and 4, we have

$$\frac{1}{g(k)}\sum_{i=1}^{g(k)} G_i(k) < \frac{1}{f(g(k))}$$

for all sufficiently large k, almost surely, where G_i, f, and g are determined depending on the proof. By letting k be determined by the total length n of the sequence, we obtain the existence of {ε_n}_{n=1}^∞ satisfying the condition of Theorem 7. Then Theorem 7 implies that we can construct codes with an arbitrary small block error probability from a combination of the proposed algorithm and linear error-correcting codes.
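Schematically, the construction in the proof of Theorem 7 wraps the given code φ_n with a systematic outer code. A minimal sketch of this wrapping, with the inner code and the outer code left abstract (inner_encode, inner_decode, outer_parity, outer_correct, pack_bits, and unpack_bits are hypothetical stand-ins, not the paper's notation; an actual Gilbert-Varsharmov code would, as noted above, take exponential time to construct):

```python
def concat_encode(x, inner_encode, outer_parity, pack_bits):
    """Phi_n(x) = phi_n(x) * B_X(parity of the systematic outer code).
    `outer_parity` returns the N_n - n parity symbols appended by the
    systematic outer code; `pack_bits` plays the role of B_X."""
    return inner_encode(x) + pack_bits(outer_parity(x))

def concat_decode(b1, b2, inner_decode, unpack_bits, outer_correct):
    """Run the inner decoder, then let the outer code correct the at most
    n*eps_n letter errors (fewer than d_min/2) left by the inner code."""
    x_hat = inner_decode(b1)             # may contain a vanishing rate of errors
    parity = unpack_bits(b2)             # recover the outer parity symbols
    return outer_correct(x_hat, parity)  # bounded-distance outer decoding
```

The parity overhead per source letter is ⌈[N_n − n] log_2|X|⌉/n, which tends to h(ε_n)/[log_2|X| − h(ε_n)] → 0; this is why (A·1) is preserved by the wrapping.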

Appendix B: Source Coding Theorems for Lossless Codes with Arbitrary Small Decoding Error Rate

In this appendix, we determine the achievable rate region for source coding with an arbitrary small decoding error rate.

First, we determine the achievable rate region for the multi-terminal source coding. The achievable rate region for the multi-terminal source coding with an arbitrary small block error probability is described by the following lemma.

Lemma 13 ([46]). Let R_SW,prob be the achievable rate region for the multi-terminal source coding with an arbitrary small block error probability. Then

$$\mathcal{R}_{\mathrm{SW,prob}} = \left\{(R_X,R_Z) : \begin{array}{l} R_X \ge H(X|Z),\\ R_Z \ge H(Z|X),\\ R_X+R_Z \ge H(XZ)\end{array}\right\}.$$

We prove the following theorem.

Theorem 8. Let R_SW,d.e.r. be the achievable rate region of the codes with an arbitrary small decoding error rate. Then R_SW,d.e.r. = R_SW,prob.

Proof. The fact that R_SW,d.e.r. ⊃ R_SW,prob can be proved from Theorem 1 and Lemma 13. Hence, we focus on the proof of R_SW,d.e.r. ⊂ R_SW,prob. Let R_X and R_Z be the encoding rates of the encoders for X and Z, respectively. We prove R_SW,d.e.r. ⊂ R_SW,prob by contradiction. Assume the assertion were false. Then there is (R_X, R_Z) ∈ R_SW,d.e.r. ∩ R_SW,prob^c.

We make the same arguments as in the proof of Theorem 7. Let |X| be a prime number, 0 < ε < 1/2, and N_n ≡ ⌈n log_2|X|/[log_2|X| − h(ε)]⌉. Then we have lim_{n→∞} N_n = ∞. From Lemma 12, there is a systematic (N_n, n, d_min) linear code such that h^{−1}([1 − n/N_n] log_2|X|) < d_min/N_n. We construct the concatenated code for X in the same way as in the proof of Theorem 7; its rate is at most R_X + h(ε)/[log_2|X| − h(ε)]. From (R_X, R_Z) ∈ R_SW,d.e.r., we have lim_{n→∞} d(x^n, z^n)/n = 0, almost surely. Then there is n(x) ∈ N such that d(x^n, z^n) < nε < d_min/2 for all n > n(x), almost surely. This implies that the block error probability of the concatenated code converges to 0. Similarly, we can construct the encoder of Z, whose block error probability of the concatenated code converges to 0, at a rate of R_Z + h(ε)/[log_2|X| − h(ε)]. Since ε > 0 is arbitrary and lim_{ε→0} h(ε) = 0, we have (R_X, R_Z) ∈ R_SW,prob. This contradicts the fact that (R_X, R_Z) ∈ R_SW,prob^c. Therefore, R_SW,d.e.r. ∩ R_SW,prob^c is an empty set. This implies that R_SW,d.e.r. ⊂ R_SW,prob. □

From the above theorem and Theorem 1, the algorithm proposed in Sect. 3 can achieve asymptotically optimal coding rates.

Next, we determine the achievable rate region for the source coding with partial side information at the decoder. The achievable rate region for the source coding with partial side information at the decoder with an arbitrary small block error probability is described by the following lemma.

Lemma 14 ([1, Theorem 2]). Let R_PSI,prob be the achievable rate region for the source coding with partial side information at the decoder with an arbitrary small block error probability. Then

$$\mathcal{R}_{\mathrm{PSI,prob}} = \left\{(R_X,R_Y) : \begin{array}{l} R_X \ge H(X|Z),\\ R_Y \ge I(Y;Z),\\ p_{XYZ} \text{ satisfies (16)–(18)}\end{array}\right\}.$$

We have the following theorem on the rate region of source coding with an arbitrary small decoding error rate. From this theorem and Theorem 4, the algorithm proposed in Sect. 5 can achieve asymptotically optimal coding rates.

Theorem 9. Let R_PSI,d.e.r. be the achievable rate region of the codes with an arbitrary small decoding error rate. Then R_PSI,d.e.r. = R_PSI,prob.

The theorem can be proved in the same way as Theorem 8, and the proof is omitted.
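The three inequalities of Lemma 13 are easy to test numerically. A small sketch (not from the paper; the function names are ours), assuming a joint pmf given as a dict mapping (x, z) pairs to probabilities:

```python
import math

def entropies(p_xz):
    """H(XZ), H(X|Z), H(Z|X) in bits for a joint pmf {(x, z): prob}."""
    h_xz = -sum(p * math.log2(p) for p in p_xz.values() if p > 0)
    p_x, p_z = {}, {}
    for (x, z), p in p_xz.items():          # marginalize the joint pmf
        p_x[x] = p_x.get(x, 0.0) + p
        p_z[z] = p_z.get(z, 0.0) + p
    h_x = -sum(p * math.log2(p) for p in p_x.values() if p > 0)
    h_z = -sum(p * math.log2(p) for p in p_z.values() if p > 0)
    return h_xz, h_xz - h_z, h_xz - h_x     # H(XZ), H(X|Z), H(Z|X)

def in_slepian_wolf_region(rx, rz, p_xz):
    """Check (R_X, R_Z) against the three inequalities of Lemma 13."""
    h_xz, h_x_given_z, h_z_given_x = entropies(p_xz)
    return rx >= h_x_given_z and rz >= h_z_given_x and rx + rz >= h_xz
```

By Theorem 8, the same test characterizes achievability under the weaker decoding-error-rate criterion as well, since the two regions coincide.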

Jun Muramatsu received the B.S. and M.S. degrees, both in mathematics, from the University of Nagoya, Nagoya, Japan, in 1990 and 1992, respectively, and the Dr. degree from the University of Nagoya in 1998. He joined NTT, Japan, in 1992, where he has been engaged in research on information theory. He is currently a Research Scientist at NTT Communication Science Laboratories, NTT Corporation. He is a member of the SITA of Japan and the IEEE.
