
IEICE TRANS. FUNDAMENTALS, VOL.E98–A, NO.12 DECEMBER 2015


PAPER

Special Section on Information Theory and Its Applications

Almost Sure Convergence Coding Theorems of One-Shot and Multi-Shot Tunstall Codes for Stationary Memoryless Sources∗

Mitsuharu ARIMURA†a), Senior Member

SUMMARY Almost sure convergence coding theorems for one-shot and multi-shot Tunstall codes are proved for stationary memoryless sources. The coding theorem for the one-shot Tunstall code is proved for the case where the leaf count of the Tunstall tree increases. The coding theorem for the multi-shot Tunstall code is proved for an increasing parsing count, under the assumption that the Tunstall tree grows as the parsing proceeds. It is also clarified that the theorem for the one-shot Tunstall code is not a corollary of the theorem for the multi-shot Tunstall code. In the multi-shot case, the coding theorem can be regarded as a theorem for a sequential algorithm in which parsing and coding are repeated. Cartesian concatenation of trees and the geometric mean of the leaf counts of trees are newly introduced; they play crucial roles in the analysis of the multi-shot Tunstall code.

key words: lossless data compression, VF code, Tunstall code, almost sure convergence, Cartesian concatenation, geometric mean

Manuscript received February 11, 2015.
Manuscript revised July 3, 2015.
† The author is with the Department of Applied Computer Sciences, Shonan Institute of Technology, Fujisawa-shi, 251-8511 Japan.
∗ This paper was partly presented at the 2014 International Symposium on Information Theory and its Applications (ISITA2014).
a) E-mail: [email protected]
DOI: 10.1587/transfun.E98.A.2393

1. Introduction

Asymptotic coding rates of one-shot and multi-shot Tunstall codes are investigated in the non-universal setting, in which the probability distribution of the source is known to both the encoder and the decoder. The Tunstall code [1] is the original variable-to-fixed length (VF) source code for stationary memoryless sources, and it is optimal in the sense that it attains the maximum average block length among all VF parsing trees with the same leaf count. Assume that the Tunstall encoder and decoder share an identical Tunstall tree. The encoder maps a variable-length block, corresponding to the path from the root node to a leaf of the Tunstall tree, to a fixed-length codeword, and the decoder maps the codeword back to the original variable-length block. In this paper, this procedure is called a one-shot Tunstall code. A multi-shot Tunstall code, on the other hand, parses a given source sequence into two or more variable-length blocks and encodes each of them into a fixed-length codeword. Concatenating these codewords yields the codeword for the concatenation of all the parsed blocks. If the parsing count of the multi-shot Tunstall code is fixed, the overall code can also be regarded as a VF code.

Despite the optimality of the one-shot Tunstall code, the analysis of the multi-shot Tunstall code is meaningful for the following reason. Assume that the leaf count of the one-shot Tunstall tree equals the product of the leaf counts of the Tunstall trees used in all the parsing steps of the multi-shot Tunstall code. Under this assumption, the codeword lengths of the two codes are roughly the same. In this situation, the coding performance of the multi-shot Tunstall code may be worse than that of the one-shot Tunstall code, because the multi-shot Tunstall code, regarded as a single one-shot VF code, need not coincide with an optimal one-shot Tunstall code. On the other hand, a larger Tunstall tree causes a memory problem, because memory usage scales linearly with the leaf count of the Tunstall tree. Therefore, when memory is restricted, it is useful to build small Tunstall trees and parse repeatedly, and the analysis of the multi-shot Tunstall code is meaningful for such practical situations.

Many results [1]–[14] have been obtained on the coding rates of one-shot and multi-shot VF codes. However, an almost sure convergence coding theorem has not yet been proved for either the one-shot or the multi-shot Tunstall code. This paper gives almost sure convergence coding theorems for both. The paper is characterized by the following two points. First, Cartesian concatenation of trees and the geometric mean of the leaf counts of trees are newly introduced. These two notions on multiple trees play crucial roles in the analysis of multi-shot Tunstall codes. Second, this paper clarifies that the coding theorem for the one-shot Tunstall code is not a corollary of that for the multi-shot Tunstall code. This is because, for the former code, the limit is taken with respect to the leaf count of a single Tunstall tree, whereas for the latter code, the limit is taken with respect to the total parsing count while the leaf count of the Tunstall tree grows as the parsing proceeds.
This difference affects the strategy of the proof, and for this reason both one-shot and multi-shot Tunstall codes must be investigated.

This paper is organized as follows. In Sect. 2, definitions concerning the information source and the algorithms of the one-shot and multi-shot Tunstall codes are given. Sections 3 and 4 give key results of this paper: Section 3 introduces Cartesian concatenation of trees and the geometric mean of the leaf counts of trees, and Section 4 gives a result on the probability of the blocks parsed in the second or later parsing steps of the multi-shot Tunstall code. Using these results, upper and lower bounds of the pointwise redundancy are evaluated in Sect. 5, and Sect. 6 gives probabilistic bounds of the coding rate. Sections 7 and 8 show the almost sure convergence coding theorems of the one-shot and multi-shot Tunstall codes, respectively.

Copyright © 2015 The Institute of Electronics, Information and Communication Engineers

2. One-Shot and Multi-Shot Tunstall Codes

2.1 Notations

For any random variable $X$, the probability distribution of $X$ is represented by $P_X$. The sequences $x_1\cdots x_n$ and $X_1\cdots X_n$ are denoted by $x_1^n$ and $X_1^n$, and $x_i^j$ represents the substring $x_i x_{i+1}\cdots x_j$ of $x_1^n$ for $i, j$ satisfying $1 \le i \le j \le n$. An infinite sequence starting with $x_i$ is denoted by $x_i^\infty$. The length of a finite string $w$ is written as $\|w\|$, and the cardinality of a finite set $\mathcal{X}$ is represented by $|\mathcal{X}|$. The union of all finite Cartesian products of $\mathcal{X}$ is written as $\mathcal{X}^* = \bigcup_{n\ge1}\mathcal{X}^n$, and the set of all infinite sequences $x_1^\infty$ is denoted by $\mathcal{X}^\infty$. The bases of $\log$ and $\exp$ are assumed to be 2 throughout this paper.

Let $\mathbf{X} = X_1^\infty$ be a stationary memoryless information source such that $X_1 = X$. Each $X_i$ takes values in a finite alphabet $\mathcal{X}$ ($|\mathcal{X}| = A < \infty$) with probability distribution $P_{X_i} = P_X$. The entropy rate $H(\mathbf{X})$ of the source $\mathbf{X}$ equals the entropy of the probability distribution $P_X$, which is defined by
$$H(\mathbf{X}) = H(P_X) = \sum_{x\in\mathcal{X}} P_X(x)\log\frac{1}{P_X(x)}.$$
The minimum and maximum probabilities of $P_X$ are written as
$$\underline{P} = \min_{x\in\mathcal{X}} P_X(x), \qquad \overline{P} = \max_{x\in\mathcal{X}} P_X(x).$$
The probability of a variable-length sequence $w = x_1x_2\cdots x_{\|w\|} \in \mathcal{X}^*$ is denoted by $P^*(w)$. Since the source $\mathbf{X}$ is stationary and memoryless, it holds that
$$P^*(w) = \prod_{i=1}^{\|w\|} P_X(x_i). \tag{1}$$
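As a small numerical illustration of the definitions above, the following Python sketch computes $H(\mathbf{X})$, $\underline{P}$, $\overline{P}$ and the block probability $P^*(w)$ of (1). The binary distribution `P_X` and the helper names are hypothetical choices made only for this example.

```python
import math

def entropy(p):
    """Entropy H(P_X) in bits; p maps each symbol to its probability."""
    return sum(q * math.log2(1.0 / q) for q in p.values() if q > 0)

def block_prob(w, p):
    """P*(w) of a memoryless source, eq. (1): product of the symbol probabilities."""
    prob = 1.0
    for x in w:
        prob *= p[x]
    return prob

# Hypothetical binary source, used only for illustration.
P_X = {"0": 0.7, "1": 0.3}
p_min, p_max = min(P_X.values()), max(P_X.values())  # underline-P and overline-P

print(round(entropy(P_X), 4))             # 0.8813
print(round(block_prob("001", P_X), 3))   # 0.147
print(p_min, p_max)                       # 0.3 0.7
```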

Next, some notations on parsing trees for VF coding are described. Let $T$ be a rooted $A$-ary tree in which each inner node has $A$ child nodes, and each edge has a label $x \in \mathcal{X}$. The leaf set of $T$, defined as the set of nodes having no child, is written as $\mathcal{T}$. Each leaf node $\nu \in \mathcal{T}$ can be represented by a block $w(\nu)$, the concatenation of the labels of all the edges on the path from the root node to $\nu$, in this order. The notation $\mathcal{T}$ is also used for the set $\{w(\nu) : \nu \in \mathcal{T}\}$. The probability of a leaf $\nu \in \mathcal{T}$ is defined as $P^*(w(\nu))$, and the depth of $\nu$ is $\|w(\nu)\|$. The minimum and maximum depths of $T$ are defined by
$$\underline{w}(T) = \min_{\nu\in\mathcal{T}}\|w(\nu)\|, \qquad \overline{w}(T) = \max_{\nu\in\mathcal{T}}\|w(\nu)\|.$$
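To make the leaf-set notation concrete, the sketch below stores a tree only through its leaf set $\mathcal{T}$, i.e., the set of blocks $w(\nu)$, and computes $\underline{w}(T)$, $\overline{w}(T)$ and the leaf probabilities. Representing a tree by its leaf strings is an assumption of this illustration (it suffices for complete, prefix-free parsing trees), and the function names are hypothetical.

```python
from fractions import Fraction

def depth_range(leaves):
    """Return (underline-w(T), overline-w(T)): the minimum and maximum leaf depth."""
    depths = [len(w) for w in leaves]
    return min(depths), max(depths)

def is_complete_tree(leaves, alphabet):
    """Check that the leaves form a complete |alphabet|-ary parsing tree:
    no leaf is a prefix of another, and the Kraft sum equals 1 exactly."""
    A = len(alphabet)
    prefix_free = not any(u != v and v.startswith(u) for u in leaves for v in leaves)
    kraft = sum(Fraction(1, A ** len(w)) for w in leaves)
    return prefix_free and kraft == 1

def leaf_probs(leaves, p):
    """Map each leaf block w(nu) to its probability P*(w(nu)) via eq. (1)."""
    out = {}
    for w in leaves:
        prob = 1.0
        for x in w:
            prob *= p[x]
        out[w] = prob
    return out

T = {"0", "10", "11"}                 # leaf set of a small binary tree
print(depth_range(T))                 # (1, 2)
print(is_complete_tree(T, "01"))      # True
print(sum(leaf_probs(T, {"0": 0.7, "1": 0.3}).values()))
```

For a complete tree the leaf probabilities sum to one (up to floating-point rounding), which the last line displays.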

2.2 Tunstall Algorithm

When a stationary memoryless source $\mathbf{X}$ is given, a Tunstall parsing tree (or Tunstall tree) is created as follows.

1. At step 1, let $T(1)$ be the $A$-ary tree of depth one.
2. At step $m$ ($\ge 2$), $T(m)$ is created from $T(m-1)$ as follows (this step is referred to as an extension of the Tunstall tree).
   a. Take any leaf $\nu \in \mathcal{T}(m-1)$ that has the largest probability $P^*(w(\nu))$ among the leaf nodes.
   b. Let $T(m)$ be the tree created by attaching the root node of a copy of $T(1)$ to $\nu$.

This algorithm is referred to as the Tunstall algorithm, and the sequence $\{T(m)\}_{m\ge1}$ is referred to as the Tunstall tree extension sequence. The tree $T(m)$ has $m$ inner nodes and $|\mathcal{T}(m)| = (A-1)m + 1$ leaf nodes. In this paper, it is assumed that the encoder and the decoder know the probability distribution $P_X$ of the information source $\mathbf{X}$. In practical situations, by contrast, $P_X$ is estimated by the empirical distribution, which is then used to create the Tunstall tree.

2.3 One-Shot Tunstall Code

If a Tunstall tree $T_1(m)$ is given, the block $w(\nu_m)$ corresponding to a leaf $\nu_m \in \mathcal{T}_1(m)$ can be encoded in $\lceil\log|\mathcal{T}_1(m)|\rceil$ bits by a fixed-to-fixed length lossless code and decoded without error. This code is called a one-shot Tunstall code. In this paper, an arbitrary sequence of Tunstall trees $\{T_1(m_i)\}_{i\ge1}$ is considered, where each tree $T_1(m_i)$ has $m_i$ inner nodes and $m_i$ becomes larger as $i$ increases. Given an infinite sequence $x_1^\infty$, a prefix $w(\nu_{m_i})$, $\nu_{m_i} \in \mathcal{T}_1(m_i)$, is parsed using the tree $T_1(m_i)$. Since this block of $\|w(\nu_{m_i})\|$ symbols is encoded in $\lceil\log|\mathcal{T}_1(m_i)|\rceil$ bits, the coding rate is defined by
$$\frac{\lceil\log|\mathcal{T}_1(m_i)|\rceil}{\|w(\nu_{m_i})\|}.$$
The asymptotic behavior of this coding rate for $\{T_1(m_i)\}_{i\ge1}$ as $i\to\infty$ is investigated, and the result is applied to the Tunstall tree extension sequence $\{T_1(m)\}_{m\ge1}$ of the one-shot Tunstall code.

2.4 Multi-Shot Tunstall Code

A multi-shot Tunstall code, which parses $K$ blocks from a source sequence $x_1^\infty$ and encodes each of them, is defined as follows.

1. Construct $K$ Tunstall trees $T_1(m_1), \ldots, T_k(m_k), \ldots, T_K(m_K)$ using the Tunstall algorithm.
2. At step $k = 1, \ldots, K$, the block $w_k = x_{n_{k-1}+1}^{n_k}$ ($n_0 = 0$) is parsed from $x_1^\infty$ as follows.
   a. Let $w_k = x_{n_{k-1}+1}^{n_k}$ be the prefix of the sequence $x_{n_{k-1}+1}^\infty$ that equals the block $w(\nu)$ corresponding to some leaf node $\nu \in \mathcal{T}_k(m_k)$ of the tree $T_k(m_k)$.
   b. Parse the prefix $w_k = x_{n_{k-1}+1}^{n_k}$ from the sequence $x_{n_{k-1}+1}^\infty$.

In the following analyses, since the relevant quantity of a tree $T_k(m_k)$ is not the inner node count $m_k$ but the leaf count $|\mathcal{T}_k(m_k)|$, the tree of step $k$ may be written as $T_k$ by omitting $m_k$, especially in the leaf count $|\mathcal{T}_k|$. When $K$ blocks $w_1, \ldots, w_K$ are parsed from a given source sequence $x_1^\infty$ using the Tunstall trees $T_1(m_1), \ldots, T_K(m_K)$, respectively, the random variable corresponding to the block $w_k$ is written as $W_k$. The sequence of blocks $w_1, \ldots, w_K$ and the corresponding random variable are written as
$$w_1^K = \{w_1, \ldots, w_K\}, \qquad W_1^K = \{W_1, \ldots, W_K\}.$$
The overall parsed block, which is the concatenation of the $K$ blocks $w_1, \ldots, w_K$, and the corresponding random variable are written as
$$w_{1\cdots K} = w_1\cdots w_K, \qquad W_{1\cdots K} = W_1\cdots W_K.$$
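The Tunstall algorithm of Sect. 2.2 and the parsing steps of Sects. 2.3 and 2.4 can be sketched as follows. This is a minimal illustration, not the author's implementation: a heap of negated probabilities selects a most probable leaf, ties are broken by insertion order (step 2a allows any leaf of maximal probability), and trees are represented by their leaf sets. The names `tunstall_leaves`, `parse_block` and `multi_shot_parse` are hypothetical.

```python
import heapq
import itertools
import math

def tunstall_leaves(p, m):
    """Leaf set of the Tunstall tree T(m) with m inner nodes (m >= 1)."""
    order = itertools.count()  # tie-breaker for leaves with equal probability
    heap = [(-q, next(order), x) for x, q in p.items()]  # step 1: depth-one tree
    heapq.heapify(heap)
    for _ in range(m - 1):  # steps 2..m: extend a most probable leaf
        neg_q, _, w = heapq.heappop(heap)
        for x, q in p.items():
            heapq.heappush(heap, (neg_q * q, next(order), w + x))
    return {w for _, _, w in heap}

def parse_block(seq, start, leaves):
    """Return the unique leaf block that is a prefix of seq[start:]."""
    for d in range(1, max(len(w) for w in leaves) + 1):
        if seq[start:start + d] in leaves:
            return seq[start:start + d]
    raise ValueError("sequence ended before a leaf was reached")

def multi_shot_parse(seq, trees):
    """Parse K blocks w_1, ..., w_K with the leaf sets T_1, ..., T_K in order."""
    blocks, pos = [], 0
    for leaves in trees:
        w = parse_block(seq, pos, leaves)
        blocks.append(w)
        pos += len(w)
    return blocks

P_X = {"0": 0.7, "1": 0.3}      # hypothetical source
T4 = tunstall_leaves(P_X, 4)
print(sorted(T4))                # ['0000', '0001', '001', '01', '1']
w = parse_block("0001011", 0, T4)
print(w, math.ceil(math.log2(len(T4))) / len(w))  # 0001 0.75
print(multi_shot_parse("00010110000", [T4, T4]))  # ['0001', '01']
```

Because each leaf set is prefix-free, scanning prefixes by increasing depth returns the unique matching leaf.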

The coding rate of the multi-shot Tunstall code is defined as follows. The sum of the lengths of the blocks parsed from $x_1^\infty$ and the sum of the codeword lengths are
$$\sum_{k=1}^K \|w_k(\nu_{m_k})\| \quad\text{and}\quad \sum_{k=1}^K \lceil\log|\mathcal{T}_k(m_k)|\rceil,$$
respectively, from which the coding rate of the multi-shot Tunstall code is defined by
$$\left(\sum_{k=1}^K \lceil\log|\mathcal{T}_k(m_k)|\rceil\right)\Bigg/\left(\sum_{k=1}^K \|w_k(\nu_{m_k})\|\right). \tag{2}$$
For the multi-shot Tunstall code, the asymptotic behavior of this coding rate as $K\to\infty$ is investigated.

3. Cartesian Concatenation of Trees and Geometric Mean of Leaf Counts

As key tools for studying the multi-shot Tunstall code, Cartesian concatenation of trees and the geometric mean of the leaf counts of trees are newly introduced in this section.

3.1 Cartesian Concatenation of Trees

In this subsection, Cartesian concatenation of two or more trees is defined.

Definition 1 (Cartesian concatenation of trees): The Cartesian concatenation of trees $T_1$ and $T_2$, denoted by $T_1 \times T_2$, is the tree created by attaching a copy of $T_2$, by its root, to every leaf of $T_1$. Since this binary operation satisfies the associative law $(T_1\times T_2)\times T_3 = T_1\times(T_2\times T_3)$ for any three trees $T_1$, $T_2$ and $T_3$, the Cartesian concatenation of $K$ trees $T_1^K = \{T_1, \ldots, T_K\}$ can be defined by
$$T_{1\cdots K} = \mathop{\times}_{k=1}^{K} T_k = T_1\times T_2\times\cdots\times T_K.$$

If the leaf counts of the trees $T_1$ and $T_2$ are $|\mathcal{T}_1|$ and $|\mathcal{T}_2|$, respectively, the leaf count of the tree $T_1\times T_2$ equals $|\mathcal{T}_1|\cdot|\mathcal{T}_2|$. Moreover, the leaf count $|\mathcal{T}_{1\cdots K}|$ of the tree $T_{1\cdots K}$ satisfies
$$|\mathcal{T}_{1\cdots K}| = \prod_{k=1}^K |\mathcal{T}_k|.$$
As far as the shape of the tree is concerned, the commutative law $T_1\times T_2 = T_2\times T_1$ does not hold in general. Therefore, when we pay attention to the shapes of the trees, the order of the trees matters in the definition of the Cartesian concatenation. On the other hand, when we concentrate on the leaf counts, the order of the trees does not matter. The Cartesian concatenation $T_{1\cdots K}$ has the following property.

Lemma 1: The block created by concatenating the blocks parsed by the multi-shot Tunstall algorithm using the Tunstall trees $T_1, \ldots, T_K$ is equivalent to the block parsed by one-shot parsing using the Cartesian-concatenated tree $T_{1\cdots K}$.

This lemma can be easily verified by noting that the leaf set of the Cartesian-concatenated tree $T_{1\cdots K}$ corresponds to the set of blocks created by concatenating the variable-length blocks $w_1, \ldots, w_K$ obtained by parsing $x_1^\infty$ with the $K$ Tunstall trees $T_1, \ldots, T_K$ in this order.

3.2 Geometric Mean of the Leaf Counts of Trees

Next, we introduce another key quantity related to the multi-shot Tunstall code: the geometric mean of the leaf counts of trees.

Definition 2 (geometric mean of the leaf counts of trees): For a sequence of $K$ trees $T_1^K = \{T_1, \ldots, T_K\}$, the geometric mean of the leaf counts of the trees is defined by
$$g(T_1^K) = |\mathcal{T}_{1\cdots K}|^{\frac{1}{K}} = \left(\prod_{k=1}^K |\mathcal{T}_k|\right)^{\frac{1}{K}}.$$
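Definitions 1 and 2 and Lemma 1 can be checked on small leaf sets with the Python sketch below. It assumes that a tree is represented by its prefix-free leaf set; the function names are hypothetical, and `geometric_mean` works in the log domain to avoid forming the potentially large product $\prod_k|\mathcal{T}_k|$.

```python
import math
from itertools import product

def cartesian_concat(*leaf_sets):
    """Leaf set of T_1 x ... x T_K: every concatenation w_1 ... w_K (Definition 1)."""
    return {"".join(ws) for ws in product(*leaf_sets)}

def geometric_mean(leaf_sets):
    """g(T_1^K) = (prod_k |T_k|)^(1/K) (Definition 2), computed via logarithms."""
    K = len(leaf_sets)
    return 2 ** (sum(math.log2(len(T)) for T in leaf_sets) / K)

def parse_block(seq, start, leaves):
    """Unique leaf block that is a prefix of seq[start:] (leaf sets are prefix-free)."""
    for d in range(1, max(len(w) for w in leaves) + 1):
        if seq[start:start + d] in leaves:
            return seq[start:start + d]
    raise ValueError("sequence ended before a leaf was reached")

def multi_shot_concat(seq, leaf_sets):
    """Concatenation w_{1...K} of the blocks parsed with T_1, ..., T_K in order."""
    pos = 0
    for leaves in leaf_sets:
        pos += len(parse_block(seq, pos, leaves))
    return seq[:pos]

T1, T2 = {"0", "10", "11"}, {"0", "1"}     # small example leaf sets
T12 = cartesian_concat(T1, T2)
print(sorted(T12))                        # ['00', '01', '100', '101', '110', '111']
print(geometric_mean([T1, T2]) ** 2)      # approximately 6

# Lemma 1: multi-shot parsing with T1, T2 equals one-shot parsing with T1 x T2.
seq = "1101001110"
print(multi_shot_concat(seq, [T1, T2]) == parse_block(seq, 0, T12))  # True
```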

3.3 Examples

In this subsection, some examples of the Cartesian concatenation of trees and the geometric mean of the leaf counts of trees are shown. In particular, the examples verify that the Cartesian concatenation satisfies the associative law but not the commutative law.

Example 1 (Commutative Law): Let $T_1$ and $T_2$ be trees with leaf sets $\mathcal{T}_1 = \{0, 10, 11\}$ and $\mathcal{T}_2 = \{0, 1\}$, respectively. Then $|\mathcal{T}_1| = 3$ and $|\mathcal{T}_2| = 2$. As shown in Fig. 1, the leaf set of the tree $T_{12} = T_1\times T_2$ is $\mathcal{T}_{12} = \{00, 01, 100, 101, 110, 111\}$, and the leaf set of the tree $T_{21} = T_2\times T_1$ is $\mathcal{T}_{21} = \{00, 010, 011, 10, 110, 111\}$. Therefore $\mathcal{T}_{12} \ne \mathcal{T}_{21}$, which means that the Cartesian concatenation does not satisfy the commutative law. On the other hand, the leaf counts of the Cartesian-concatenated trees and the geometric means satisfy
$$|\mathcal{T}_{12}| = |\mathcal{T}_{21}| = |\mathcal{T}_1|\cdot|\mathcal{T}_2| = 6, \qquad g(\{T_1, T_2\}) = g(\{T_2, T_1\}) = \sqrt{6}.$$

Fig. 1 Cartesian concatenation of trees and commutative law.

Example 2 (Associative Law): In addition to the previous example, define $\mathcal{T}_3 = \{0, 1\}$. Then, as shown in Fig. 2, we can verify the associative law $(T_1\times T_2)\times T_3 = T_1\times(T_2\times T_3)$. In this case, $|\mathcal{T}_{1\cdots3}| = 12$ and $g(T_1^3) = \sqrt[3]{12}$.

Fig. 2 Associative law of Cartesian concatenation.

Next, some examples of the geometric mean of the leaf counts are shown for several different growth patterns of the Tunstall trees. As the examples show, if a sequence of trees is given, the geometric mean of the leaf counts of the trees grows in roughly the same order as the leaf count of each tree in the sequence.

Example 3 (In the case of $|\mathcal{T}_k(m_k)| = O(1)$): Consider the multi-shot Tunstall code in which the Tunstall tree used in the $k$-th parsing satisfies $|\mathcal{T}_k(m_k)| = |\mathcal{T}_1(m_1)|$. This setting corresponds to the procedure in which, after one Tunstall tree $T_1(m_1)$ with leaf count $|\mathcal{T}_1(m_1)|$ is created, the source sequence is parsed $K$ times repeatedly using $T_1(m_1)$. In this case, the geometric mean equals the leaf count of the first tree, $g(T_1^K) = |\mathcal{T}_1(m_1)| = O(1)$.

Example 4 (In the case of $|\mathcal{T}_k(m_k)| = O(k)$): Consider the multi-shot $A$-ary Tunstall code with $T_k(m_k) = T_k(k)$, in which the Tunstall tree is extended once after each parsing. In this case, $|\mathcal{T}_k(m_k)| = k(A-1)+1$, from which we have the upper and lower bounds
$$(A-1)(K!)^{\frac{1}{K}} \le g(T_1^K) \le A(K!)^{\frac{1}{K}}.$$
Using Stirling's approximation (see [22])
$$n\log n - n\log e \le \log n! \le n\log n, \tag{3}$$
it holds that
$$\log K(A-1) - \log e \le \log g(T_1^K) \le \log KA.$$
Therefore, the geometric mean grows as $g(T_1^K) = O(K)$.
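The bounds of Example 4 can be checked numerically. The sketch below evaluates $\log g(T_1^K)$ for $|\mathcal{T}_k| = k(A-1)+1$, together with the factorial bounds and the Stirling-type bounds derived from (3); $\log K!$ is obtained from `math.lgamma`, and the function name is a hypothetical choice for this illustration.

```python
import math

LOG2E = math.log2(math.e)

def example4_logs(K, A):
    """Return log2 of: the Stirling lower bound, (A-1)(K!)^(1/K), g(T_1^K),
    A(K!)^(1/K), and the Stirling upper bound, for |T_k| = k(A-1)+1, A >= 2."""
    log2_g = sum(math.log2(k * (A - 1) + 1) for k in range(1, K + 1)) / K
    log2_fact_over_K = math.lgamma(K + 1) / math.log(2) / K   # (1/K) log2 K!
    return (
        math.log2(K * (A - 1)) - LOG2E,          # log K(A-1) - log e
        math.log2(A - 1) + log2_fact_over_K,     # log (A-1)(K!)^(1/K)
        log2_g,                                  # log g(T_1^K)
        math.log2(A) + log2_fact_over_K,         # log A(K!)^(1/K)
        math.log2(K * A),                        # log KA
    )

for K in (1, 10, 100, 1000):
    print(K, [round(v, 3) for v in example4_logs(K, A=2)])
```

Each printed list should be non-decreasing from left to right, reflecting the chain of inequalities in Example 4.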

Example 5 (In the case of $|\mathcal{T}_k(m_k)| = O(A^k)$): Consider the multi-shot $A$-ary Tunstall code in which the codeword length for the $k$-th parsed block satisfies $\lceil\log|\mathcal{T}_k(m_k)|\rceil = k$ for $k \ge \lceil\log A\rceil$, which means that the codeword length increases linearly as the parsing proceeds. This case corresponds to the $A$-ary multi-shot Tunstall code in which, after each parsing, the Tunstall tree is extended as many times as its leaf count. In this situation, $|\mathcal{T}_k(m_k)| = A^k$ implies $g(T_1^K) = A^{\frac{K+1}{2}} = O(A^{\frac{K}{2}})$, so the geometric mean also grows exponentially.

4. Probability Assigned to the Parsed Blocks

In this section, the probability assigned to the blocks parsed in the second or later steps is studied. As we have seen in Lemma 1, the block created by concatenating the blocks parsed using the Tunstall trees $T_1, \ldots, T_K$ is equivalent to the block parsed by the Cartesian concatenation $T_{1\cdots K}$. This means that the starting point of the block parsed in step 2 varies depending on the block parsed in step 1, and corresponds to the leaf node $\nu \in \mathcal{T}_1$ of the first parsing tree to which the second parsing tree $T_2$ is attached. In the following, it is shown that for $k\ge2$, if all the Tunstall trees $T_1, \ldots, T_{k-1}$ are prepared before parsing starts, the probability assigned to any block corresponding to a leaf of the tree $T_k$ does not depend on which leaf of $T_{1\cdots k-1}$ the root node of $T_k$ is attached to.

Let $T^{k-1}$ be a random variable defined by
$$T^{k-1} = \sum_{i=1}^{k-1}\|W_i\|,$$
which represents the sum of the block lengths of steps $1, \ldots, k-1$. Moreover, if $\|W_{1\cdots k-1}\| = t$, then $W_{1\cdots k-1} = X_1^t$. Under the above assumption, the following property holds for the probability assigned to the block $w_k$.

Lemma 2: For the probability assigned to the block $X_{T^{k-1}+1}^{T^{k-1}+\ell}$ of any length $\ell \ge 1$ starting at position $T^{k-1}+1$, it holds that
$$\Pr\left\{X_{T^{k-1}+1}^{T^{k-1}+\ell} = x_1^\ell \,\middle|\, W_1^{k-1} = w_1^{k-1}\right\} = \Pr\left\{X_1^\ell = x_1^\ell\right\}$$
for any $x_1^\ell$, $k\ge2$ and $w_1^{k-1}$. This means that the probability related to the tree $T_k$ used for parsing in step $k$ is independent of the starting point $T^{k-1}+1$ and of the blocks $w_1, \ldots, w_{k-1}$ created in the previous parsing steps.

Proof of Lemma 2 is given in Appendix A.

Let us consider the probability related to $T_k$, $k\ge2$, in the left tree of Fig. 3. From Lemma 2, the probability of $w \in \mathcal{T}_k$ for $T_k$ starting at any point equals the probability of $w \in \mathcal{T}_k$ for $T_k$ beginning with $x_1$, shown in the right tree of Fig. 3.

Fig. 3 Attachment position of the root of $T_k$ to $T_{1\cdots k-1}$.

Now we can define the probability of the blocks parsed at the second or later steps. For any $w \in \mathcal{T}_k$, let $P_k^*(w|w_1^{k-1})$ denote the probability of $w$ conditioned on $w_1^{k-1}$, $w_i \in \mathcal{T}_i$, $i = 1, \ldots, k-1$. Then it follows from Lemma 2 that
$$P_k^*(w|w_1^{k-1}) = P^*(w) \tag{4}$$
for any $w$ and $w_1^{k-1}$, where the right-hand side is defined in (1). Therefore, the probability of a block $w \in \mathcal{T}_k$ is written as $P_k^*(w)$, which equals $P^*(w)$. For a tree $T_k$, the minimum and maximum probabilities of the blocks $w = w(\nu)$ corresponding to leaves $\nu \in \mathcal{T}_k$ are represented by
$$\underline{P}^*(T_k) = \min_{w=w(\nu),\,\nu\in\mathcal{T}_k} P_k^*(w), \qquad \overline{P}^*(T_k) = \max_{w=w(\nu),\,\nu\in\mathcal{T}_k} P_k^*(w).$$

Note 1: The property shown in this section is equivalent to the strong Markov property related to stopping times of Markov chains, discussed in renewal theory [19], where a random variable $T^{k-1}$ is called a stopping time (or Markov time) if the event $\{T^{k-1} = t\}$ is a function of $X_1^t$. This property depends on the parsing algorithm. For example, in the LZW algorithm [16], the parsing position $T^{k-1}$ is not a stopping time, because whether $T^{k-1} = t$ is determined not only by $x_1^t$ but also by the next symbol $x_{t+1}$. The result of this section therefore cannot be applied directly to such algorithms.

5. Upper and Lower Bounds of Pointwise Redundancy
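Equation (4) can be illustrated by exact enumeration. The sketch below builds the joint law of $(W_1, W_2)$ on the leaves of $T_1 \times T_2$ using (1), divides by the marginal of $W_1$ obtained by summing over $\mathcal{T}_2$, and compares the resulting conditional probabilities with $P^*(w)$; it also reports $\underline{P}^*(T_2)$ and $\overline{P}^*(T_2)$. The trees, the source probabilities and the helper names are hypothetical examples.

```python
from itertools import product

def block_prob(w, p):
    """P*(w) from (1) for a memoryless source."""
    prob = 1.0
    for x in w:
        prob *= p[x]
    return prob

def conditional_second_block(T1, T2, p):
    """Pr{W_2 = w2 | W_1 = w1} for every (w1, w2), from the joint law on T1 x T2."""
    joint = {(w1, w2): block_prob(w1 + w2, p) for w1, w2 in product(T1, T2)}
    cond = {}
    for w1 in T1:
        marginal = sum(joint[(w1, w2)] for w2 in T2)   # Pr{W_1 = w1}
        for w2 in T2:
            cond[(w1, w2)] = joint[(w1, w2)] / marginal
    return cond

P_X = {"0": 0.7, "1": 0.3}
T1 = {"00", "01", "1"}                    # hypothetical first parsing tree
T2 = {"0000", "0001", "001", "01", "1"}   # hypothetical second parsing tree
cond = conditional_second_block(T1, T2, P_X)
worst = max(abs(c - block_prob(w2, P_X)) for (w1, w2), c in cond.items())
print(worst < 1e-12)                       # True: (4) holds for every w1
probs = [block_prob(w, P_X) for w in T2]
print(min(probs), max(probs))              # underline-P*(T_2), overline-P*(T_2)
```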

In this section, upper and lower bounds of the pointwise redundancy are investigated using the results obtained in Sects. 3 and 4. The length and the probability of $w_{1\cdots K}$, which is the concatenation of the blocks $w_1^K$ parsed by the $K$ Tunstall trees $T_1, \ldots, T_K$, are represented by $\|w_{1\cdots K}\|$ and $P^*(w_{1\cdots K})$, respectively. These values satisfy
$$\|w_{1\cdots K}\| = \sum_{k=1}^K\|w_k\|, \qquad P^*(w_{1\cdots K}) = \prod_{k=1}^K P_k^*(w_k).$$
Note that the latter equality follows from (1) and (4).

In the following, the pointwise redundancy of the multi-shot Tunstall code for a sequence $w_{1\cdots K}$ with respect to a source $\mathbf{X}$ is defined as the difference between the total coding rate for the parsed blocks $w_1^K$ and the symbol-wise self-information of $w_{1\cdots K}$. When the blocks $w_1, \ldots, w_K$ are parsed sequentially using the Tunstall trees $T_1, \ldots, T_K$, the codeword length for each block is $\lceil\log|\mathcal{T}_k|\rceil$ bits and the total coding rate is given by (2), which equals
$$\frac{1}{\|w_{1\cdots K}\|}\sum_{k=1}^K\lceil\log|\mathcal{T}_k|\rceil.$$
On the other hand, the symbol-wise self-information of a sequence $w_{1\cdots K}$ with respect to the information source $\mathbf{X}$ is defined by
$$\frac{1}{\|w_{1\cdots K}\|}\log\frac{1}{P^*(w_{1\cdots K})}.$$
Using these values, the pointwise redundancy is defined as follows.

Definition 3 (Pointwise Redundancy): If $K$ blocks $w_1, \ldots, w_K$ are parsed from $x_1^\infty$ using $K$ Tunstall trees $T_1, \ldots, T_K$, the pointwise redundancy with respect to the source $\mathbf{X}$ is defined by
$$r(w_1^K, T_1^K, \mathbf{X}) = \frac{1}{\|w_{1\cdots K}\|}\left\{\sum_{k=1}^K\lceil\log|\mathcal{T}_k|\rceil - \log\frac{1}{P^*(w_{1\cdots K})}\right\}.$$
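Definition 3 translates directly into code. The following sketch (hypothetical helper name; trees represented by their leaf sets; illustrative binary source) computes $r(w_1^K, T_1^K, \mathbf{X})$ from the parsed blocks and the leaf sets, using (1) for $P^*(w_{1\cdots K})$.

```python
import math

def pointwise_redundancy(blocks, leaf_sets, p):
    """r(w_1^K, T_1^K, X) of Definition 3 (all logarithms base 2)."""
    codeword_bits = sum(math.ceil(math.log2(len(T))) for T in leaf_sets)
    w = "".join(blocks)                                  # w_{1...K}
    self_info = sum(-math.log2(p[x]) for x in w)         # log 1/P*(w_{1...K})
    return (codeword_bits - self_info) / len(w)

P_X = {"0": 0.7, "1": 0.3}
T4 = {"0000", "0001", "001", "01", "1"}   # Tunstall tree with 4 inner nodes for P_X
print(pointwise_redundancy(["1"], [T4], P_X))          # 3 - log2(1/0.3), about 1.263
print(pointwise_redundancy(["0001", "01"], [T4, T4], P_X))
```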



The next theorem shows upper and lower bounds of the pointwise redundancy.

Theorem 1: The pointwise redundancy of the multi-shot Tunstall code is bounded by
$$\max_{w_1^K} r(w_1^K, T_1^K, \mathbf{X}) \le \frac{-(1-\log\underline{P})\log\underline{P}}{\log g(T_1^K) + \log\underline{P}},$$
$$\min_{w_1^K} r(w_1^K, T_1^K, \mathbf{X}) \ge \frac{-(\log\underline{P})^2}{\log g(T_1^K) - \log\underline{P}},$$
where $g(T_1^K)$ is the geometric mean of the leaf counts $\{|\mathcal{T}_k|\}_{k=1}^K$ defined in Definition 2. Note that the maximum and the minimum with respect to $w_1^K$ are taken over all combinations of the blocks $w_k = w_k(\nu)$, $\nu \in \mathcal{T}_k$, $k = 1, 2, \ldots, K$, for the given trees $T_1, \ldots, T_K$.

Proof of Theorem 1 is given in Appendix B. In the situation of Example 3, the following corollary holds.

Corollary 1: The pointwise redundancy of the multi-shot Tunstall code using $\{T_1, \ldots, T_1\}$, i.e., the single Tunstall tree $T_1$ repeated $K$ times, is bounded as follows:
$$\max_{w_1^K} r(w_1^K, \{T_1, \ldots, T_1\}, \mathbf{X}) \le \frac{-(1-\log\underline{P})\log\underline{P}}{\log|\mathcal{T}_1| + \log\underline{P}},$$
$$\min_{w_1^K} r(w_1^K, \{T_1, \ldots, T_1\}, \mathbf{X}) \ge \frac{-(\log\underline{P})^2}{\log|\mathcal{T}_1| - \log\underline{P}}.$$

Setting $K = 1$ in the above corollary yields the same redundancy bounds for the one-shot Tunstall code as in [14]. Therefore, the result in this section can be regarded as an extension of the result obtained in [14].

6. Probabilistic Bound of Coding Rate

In this section, upper bounds of the probabilities of the non-typical sets for the self-information and the coding rate are given. Assume that $K$ blocks $W_1, \ldots, W_K$ are parsed from $X_1^\infty$ using $K$ Tunstall trees $T_1, \ldots, T_K$. First, a bound for the probability concerning the self-information is given.

Lemma 3: For the self-information of the concatenation $W_{1\cdots K}$ of the blocks $W_1, \ldots, W_K$, which are parsed from $X_1^\infty$ using $K$ Tunstall trees $T_1, \ldots, T_K$, it holds that
$$\Pr\left\{\left|\frac{1}{\|W_{1\cdots K}\|}\log\frac{1}{P^*(W_{1\cdots K})} - H(\mathbf{X})\right| > \delta\right\} \le 2\left(\frac{K\left(\log g(T_1^K) - \log\underline{P}\right)}{-\log\overline{P}}\right)^{A+1}\left(g(T_1^K)\,\underline{P}\right)^{-\frac{f^{-1}(\delta)K}{-\log\underline{P}}} = O\!\left(\frac{\{\log g(T_1^K)^K\}^{A+1}}{g(T_1^K)^{\frac{f^{-1}(\delta)K}{-\log\underline{P}}}}\right), \tag{5}$$
where $\delta > 0$ is arbitrarily small and $f^{-1}(\delta)$ is the inverse function of $f(\delta)$ defined by
$$f(\delta) = \delta + \sqrt{\frac{\delta}{2\log e}}\,\log(A-1) + h\!\left(\sqrt{\frac{\delta}{2\log e}}\right), \tag{6}$$
and $h(x)$ is the binary entropy function defined by
$$h(x) = -x\log x - (1-x)\log(1-x). \tag{7}$$

Proof of Lemma 3 is given in Appendix C. Since a bound of the pointwise redundancy is given in Theorem 1 and a bound of the difference between the self-information and the entropy is given in Lemma 3, a probabilistic bound of the coding rate follows from them.

Theorem 2: If $K$ blocks $W_1, \ldots, W_K$ are parsed from $X_1^\infty$ using the Tunstall trees $T_1, \ldots, T_K$, the probability of a too large or too small coding rate is bounded as
$$\Pr\left\{\left|\frac{1}{\|W_{1\cdots K}\|}\sum_{k=1}^K\lceil\log|\mathcal{T}_k|\rceil - H(\mathbf{X})\right| > \frac{-(1-\log\underline{P})\log\underline{P}}{\log g(T_1^K) + \log\underline{P}} + \delta\right\} \le 2\left(\frac{K\left(\log g(T_1^K) - \log\underline{P}\right)}{-\log\overline{P}}\right)^{A+1}\left(g(T_1^K)\,\underline{P}\right)^{-\frac{f^{-1}(\delta)K}{-\log\underline{P}}} = O\!\left(\frac{\{\log g(T_1^K)^K\}^{A+1}}{g(T_1^K)^{\frac{f^{-1}(\delta)K}{-\log\underline{P}}}}\right), \tag{8}$$
where $\delta > 0$ is arbitrarily small and $f^{-1}(\delta)$ is the inverse function of $f(\delta)$ defined by (6).

Proof of Theorem 2 is given in Appendix D. Setting $K = 1$, we have the result for the one-shot Tunstall code.

Corollary 2:
$$\Pr\left\{\left|\frac{\lceil\log|\mathcal{T}_1|\rceil}{\|W_1\|} - H(\mathbf{X})\right| > \frac{-(1-\log\underline{P})\log\underline{P}}{\log|\mathcal{T}_1| + \log\underline{P}} + \delta\right\} \le 2\left(\frac{\log|\mathcal{T}_1| - \log\underline{P}}{-\log\overline{P}}\right)^{A+1}\left(|\mathcal{T}_1|\,\underline{P}\right)^{-\frac{f^{-1}(\delta)}{-\log\underline{P}}} = O\!\left(\frac{(\log|\mathcal{T}_1|)^{A+1}}{|\mathcal{T}_1|^{\frac{f^{-1}(\delta)}{-\log\underline{P}}}}\right), \tag{9}$$
where $\delta > 0$ is arbitrarily small and $f^{-1}(\delta)$ is the inverse function of $f(\delta)$ defined by (6).

From Theorem 2 and Corollary 2, convergence in probability of the coding rates of the multi-shot and one-shot Tunstall codes, respectively, to the entropy rate of the stationary memoryless source follows directly.

7. Almost Sure Convergence Coding Theorem of One-Shot Tunstall Code

In this section, almost sure convergence of the coding rate of the one-shot Tunstall code is proved for the situation in which the leaf count of the Tunstall tree increases. In Theorem 4, an almost sure convergence coding theorem for the Tunstall tree extension sequence is proved. To prove that theorem, almost sure convergence is first proved in Theorem 3 for any sequence of Tunstall trees whose leaf count increases exponentially in the index $i$. The following Borel-Cantelli lemma is used in the proof.

Lemma 4 (see Shields [20]): If $\{A_n\}$ is a sequence of measurable sets in a probability space $(\mathcal{X}^\infty, \mathcal{F}, P)$ such that $\sum_{n=1}^\infty P(A_n) < \infty$, then for almost every $x_1^\infty$ there is an $N = N(x_1^\infty)$ such that $x_1^\infty \notin A_n$ for all $n \ge N$.
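For small trees, the bounds of Theorem 1 can be compared with the exact maximum and minimum of the pointwise redundancy over all leaf combinations. The sketch below does this by brute force. It re-implements hypothetical helpers (Tunstall trees as leaf sets, ties broken by insertion order) so that it runs on its own, and it assumes $\log g(T_1^K) + \log\underline{P} > 0$, so that the upper bound is finite; the chosen sources and tree sizes are illustrative only.

```python
import heapq
import itertools
import math

def tunstall_leaves(p, m):
    """Leaf set of the Tunstall tree with m inner nodes (ties broken by insertion order)."""
    order = itertools.count()
    heap = [(-q, next(order), x) for x, q in p.items()]
    heapq.heapify(heap)
    for _ in range(m - 1):
        neg_q, _, w = heapq.heappop(heap)
        for x, q in p.items():
            heapq.heappush(heap, (neg_q * q, next(order), w + x))
    return {w for _, _, w in heap}

def redundancy(blocks, leaf_sets, p):
    """Pointwise redundancy of Definition 3."""
    bits = sum(math.ceil(math.log2(len(T))) for T in leaf_sets)
    w = "".join(blocks)
    return (bits - sum(-math.log2(p[x]) for x in w)) / len(w)

def theorem1_check(p, ms):
    """Return (exact min r, lower bound, exact max r, upper bound) for trees T(m), m in ms."""
    trees = [tunstall_leaves(p, m) for m in ms]
    log_g = sum(math.log2(len(T)) for T in trees) / len(trees)   # log g(T_1^K)
    log_pmin = math.log2(min(p.values()))                         # log underline-P
    rs = [redundancy(bs, trees, p) for bs in itertools.product(*trees)]
    upper = -(1 - log_pmin) * log_pmin / (log_g + log_pmin)
    lower = -(log_pmin ** 2) / (log_g - log_pmin)
    return min(rs), lower, max(rs), upper

print(theorem1_check({"0": 0.7, "1": 0.3}, [4, 7, 7]))
```

Brute force is exponential in $K$, so this is only a sanity check for a handful of small trees.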

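Theorem 3: In the one-shot Tunstall code, a block is parsed using the Tunstall tree $T_1(m_i)$ with $m_i$ internal nodes. If the leaf count $|\mathcal{T}_1(m_i)|$ increases so that
$$\liminf_{i\to\infty}\frac{1}{i}\log|\mathcal{T}_1(m_i)| \ge \varepsilon_0 \tag{10}$$
for some $\varepsilon_0 > 0$, then it holds that
$$\lim_{i\to\infty}\frac{\lceil\log|\mathcal{T}_1(m_i)|\rceil}{\|w(\nu_{m_i})\|} = H(\mathbf{X}) \quad\text{almost surely.}$$

Note 2: Condition (10) means that $|\mathcal{T}_1(m_i)|$ grows exponentially in $i$, because (10) can be restated as follows: for any $\delta > 0$, there exists $I = I(\delta)$ such that $\frac{1}{i}\log|\mathcal{T}_1(m_i)| > \varepsilon_0 - \delta$ for any $i > I$. Since $\delta$ is arbitrary, setting $\delta = \varepsilon_0/2$ gives $|\mathcal{T}_1(m_i)| > \exp(\varepsilon_0 i/2)$.

Proof: Fix an arbitrarily small $\delta > 0$. For this $\delta$, define the probabilities $P_1(i,\delta)$ and $P_2(i,\delta)$ as follows:
$$P_1(i,\delta) = \Pr\left\{\left|\frac{\lceil\log|\mathcal{T}_1(m_i)|\rceil}{\|W_1\|} - H(\mathbf{X})\right| > 2\delta\right\},$$
$$P_2(i,\delta) = \Pr\left\{\left|\frac{\lceil\log|\mathcal{T}_1(m_i)|\rceil}{\|W_1\|} - H(\mathbf{X})\right| > \frac{-(1-\log\underline{P})\log\underline{P}}{\log|\mathcal{T}_1(m_i)| + \log\underline{P}} + \delta\right\}.$$
Then, since it holds from assumption (10) that
$$\limsup_{i\to\infty}\frac{-(1-\log\underline{P})\log\underline{P}}{\log|\mathcal{T}_1(m_i)| + \log\underline{P}} = 0,$$
we have $P_1(i,\delta) \le P_2(i,\delta)$ for sufficiently large $i > i_0$. We have the following bound for $i > i_0$ from (9):
$$\frac{1}{\log|\mathcal{T}_1(m_i)|}\log\frac{1}{P_1(i,\delta)} \ge \frac{1}{\log|\mathcal{T}_1(m_i)|}\log\frac{1}{P_2(i,\delta)} \ge \frac{f^{-1}(\delta)}{-\log\underline{P}} - \frac{1}{\log|\mathcal{T}_1(m_i)|}\left\{f^{-1}(\delta) + (A+1)\log\frac{\log(|\mathcal{T}_1(m_i)|/\underline{P})}{-\log\overline{P}} + 1\right\}.$$
Taking the limit inferior, we have from (10) that
$$\liminf_{i\to\infty}\frac{1}{\log|\mathcal{T}_1(m_i)|}\log\frac{1}{P_1(i,\delta)} \ge \frac{f^{-1}(\delta)}{-\log\underline{P}}.$$
This means that for any $\varepsilon > 0$ there exists $i_1 = i_1(\varepsilon)$ such that for any $i > \max\{i_0, i_1\}$,
$$\frac{1}{\log|\mathcal{T}_1(m_i)|}\log\frac{1}{P_1(i,\delta)} \ge \frac{f^{-1}(\delta)}{-\log\underline{P}} - \varepsilon,$$
or equivalently,
$$P_1(i,\delta) \le \exp\left\{-\left(\frac{f^{-1}(\delta)}{-\log\underline{P}} - \varepsilon\right)\log|\mathcal{T}_1(m_i)|\right\}. \tag{11}$$
On the other hand, by Note 2, (10) means that for some $\varepsilon_1 > 0$ there exists $i_2 = i_2(\varepsilon_1)$ such that for $i > i_2$,
$$\log|\mathcal{T}_1(m_i)| > \varepsilon_1 i \tag{12}$$
is satisfied. Applying (12) to (11), we have for any $\varepsilon > 0$, any $i > \max\{i_0, i_1, i_2\}$ and some $\varepsilon_1 > 0$ that
$$P_1(i,\delta) \le \exp\left\{-\left(\frac{f^{-1}(\delta)}{-\log\underline{P}} - \varepsilon\right)\varepsilon_1 i\right\}.$$
Taking $\varepsilon$ sufficiently small, we have $\varepsilon_2 = \left(\frac{f^{-1}(\delta)}{-\log\underline{P}} - \varepsilon\right)\varepsilon_1 > 0$, which implies that $P_1(i,\delta) \le \exp(-\varepsilon_2 i)$ for $i > I = \max\{i_0, i_1, i_2\}$. If we define $A_i$ as
$$A_i = \left\{x_1^\infty \in \mathcal{X}^\infty : \left|\frac{\lceil\log|\mathcal{T}_1(m_i)|\rceil}{\|w_1(\nu_{m_i})\|} - H(\mathbf{X})\right| > 2\delta\right\},$$
then
$$\sum_{i=1}^\infty P_{X_1^\infty}(A_i) = \sum_{i=1}^I P_1(i,\delta) + \sum_{i=I+1}^\infty P_1(i,\delta) \le \sum_{i=1}^I 1 + \sum_{i=I+1}^\infty \exp(-\varepsilon_2 i) = I + \frac{\exp\{-\varepsilon_2(I+1)\}}{1 - \exp(-\varepsilon_2)} < \infty$$
is satisfied. Applying Lemma 4, it holds that
$$\Pr\left\{x_1^\infty \in \mathcal{X}^\infty : \exists I' = I'(x_1^\infty),\ \forall i \ge I',\ \left|\frac{\lceil\log|\mathcal{T}_1(m_i)|\rceil}{\|w_1(\nu_{m_i})\|} - H(\mathbf{X})\right| \le 2\delta\right\} = 1$$
for any $\delta > 0$. Since $\delta$ can be arbitrarily small, we have
$$\limsup_{i\to\infty}\left|\frac{\lceil\log|\mathcal{T}_1(m_i)|\rceil}{\|w_1(\nu_{m_i})\|} - H(\mathbf{X})\right| = 0 \quad\text{almost surely.}$$
Since $H(\mathbf{X})$ is constant, the theorem is proved.

Q.E.D.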
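The exponent $f^{-1}(\delta)/(-\log\underline{P})$ in (11) requires the inverse of $f$ in (6), which has no closed form. The following is a minimal numerical sketch, assuming bisection is adequate: on $[0, (\log e)/2]$ the argument $\sqrt{\delta/(2\log e)}$ stays in $[0, 1/2]$, where $h$ is increasing, so $f$ is increasing there and can be inverted by bisection. The function names and the example source are hypothetical.

```python
import math

LOG2E = math.log2(math.e)

def h(x):
    """Binary entropy function (7), in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def f(delta, A):
    """f(delta) of (6) for an alphabet of size A >= 2."""
    s = math.sqrt(delta / (2 * LOG2E))
    return delta + s * math.log2(A - 1) + h(s)

def f_inv(y, A, tol=1e-12):
    """Inverse of f on [0, (log e)/2], where f is increasing, by bisection."""
    lo, hi = 0.0, LOG2E / 2
    if not 0.0 <= y <= f(hi, A):
        raise ValueError("y is outside the range of f on [0, (log e)/2]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid, A) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Exponent f^{-1}(delta) / (-log underline-P) for a hypothetical binary source.
delta, p_min = 0.1, 0.3
print(f_inv(delta, A=2) / -math.log2(p_min))
```

Bisection is chosen over Newton's method because $f$ has an unbounded derivative at $\delta = 0$.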

Theorem 4: In the one-shot Tunstall code, a block $w(\nu_m)$ is parsed using the Tunstall tree $T_1(m)$ with $m$ internal nodes. Then, for the Tunstall tree extension sequence $\{T_1(m)\}_{m\ge1}$, it holds that
$$\lim_{m\to\infty}\frac{\lceil\log|\mathcal{T}_1(m)|\rceil}{\|w(\nu_m)\|} = H(\mathbf{X}) \quad\text{almost surely.}$$

Proof: A specific sequence $\{m_i\}$ for $i \ge i_1 = \lceil\log A\rceil$ is defined as follows. Let $m_i = 1$ for $i = i_1$. For $i > i_1$, define $m_i$ such that
$$i - 1 = \lceil\log|\mathcal{T}_1(m_i - 1)|\rceil < \lceil\log|\mathcal{T}_1(m_i)|\rceil = i. \tag{13}$$
Since the sequence $\{m_i\}_{i\ge i_1}$ is strictly increasing, the sequence $\{T_1(m_i)\}_{i\ge i_1}$ is a subsequence of the Tunstall tree extension sequence $\{T_1(m)\}_{m\ge1}$. The subsequence $\{T_1(m_i)\}_{i\ge i_1}$ satisfies
$$\liminf_{i\to\infty}\frac{1}{i}\log|\mathcal{T}_1(m_i)| \ge \liminf_{i\to\infty}\frac{1}{i}\left(\lceil\log|\mathcal{T}_1(m_i)|\rceil - 1\right) \ge \liminf_{i\to\infty}\frac{i-2}{i} = 1 > 0, \tag{14}$$
which meets (10). From Theorem 3, there exists a set of infinite sequences $G \subset \mathcal{X}^\infty$ with $\Pr\{G\} = 1$ such that for any $x_1^\infty \in G$, if a block $w(\nu_{m_i})$ is parsed from $x_1^\infty$ using the Tunstall tree $T_1(m_i)$ with $m_i$ internal nodes, then for any $\varepsilon > 0$ there exists $i_0 = i_0(\varepsilon)$ such that for $i > i_0(\varepsilon)$,
$$\frac{\lceil\log|\mathcal{T}_1(m_i)|\rceil}{\|w(\nu_{m_i})\|} < H(\mathbf{X}) + \varepsilon. \tag{15}$$

Next, the coding rate is evaluated for $m \in \{m_i + 1, \ldots, m_{i+1} - 1\}$. For these $m$, the coding rates for the blocks parsed by the Tunstall trees $T_1(m-1)$ and $T_1(m)$ are compared. Concerning the codeword length, it holds from (13) for such $m$ that
$$\lceil\log|\mathcal{T}_1(m)|\rceil = \lceil\log|\mathcal{T}_1(m-1)|\rceil = \lceil\log|\mathcal{T}_1(m_i)|\rceil. \tag{16}$$
The lengths of the blocks $w(\nu_{m-1})$ and $w(\nu_m)$ parsed by $T_1(m-1)$ and $T_1(m)$, respectively, from a single individual infinite sequence $x_1^\infty$ are evaluated as follows. Assume that the tree $T_1(m)$ is constructed by extending the leaf $\nu$ of the tree $T_1(m-1)$. The leaf set $\mathcal{T}_1(m-1)\setminus\{\nu\}$ satisfies $\mathcal{T}_1(m-1)\setminus\{\nu\} \subset \mathcal{T}_1(m)$, because these leaves are not extended in this tree extension. For these leaves, $\nu_{m-1} = \nu_m$ and the block lengths satisfy $\|w(\nu_m)\| = \|w(\nu_{m-1})\|$. On the other hand, for the leaves $\nu_m \in \mathcal{T}_1(m)$ created by the extension, the parent node is $\nu_{m-1} = \nu$, so that $\|w(\nu_m)\| = \|w(\nu_{m-1})\| + 1$. Combining both cases, at the tree extension from $T_1(m-1)$ to $T_1(m)$ the block lengths for any single individual infinite sequence $x_1^\infty$ satisfy
$$\|w(\nu_m)\| \ge \|w(\nu_{m-1})\|. \tag{17}$$
From (16) and (17), it holds for $m \in \{m_i + 1, \ldots, m_{i+1} - 1\}$ that
$$\frac{\lceil\log|\mathcal{T}_1(m)|\rceil}{\|w(\nu_m)\|} \le \frac{\lceil\log|\mathcal{T}_1(m-1)|\rceil}{\|w(\nu_{m-1})\|} \tag{18}$$
for any $x_1^\infty$. Note that $\nu_m$ and $\nu_{m-1}$ are functions of $x_1^\infty$. Apply (18) to (15). Then, for any $x_1^\infty \in G$, if a block $w(\nu_m)$ is parsed from $x_1^\infty$ using the Tunstall tree $T_1(m)$ with $m$ internal nodes, then for any $\varepsilon > 0$ there exists $i_0 = i_0(\varepsilon)$ such that for any $i > \max\{i_0(\varepsilon), i_1\}$ and $m \in \{m_i + 1, \ldots, m_{i+1} - 1\}$,
$$\frac{\lceil\log|\mathcal{T}_1(m)|\rceil}{\|w(\nu_m)\|} \le \frac{\lceil\log|\mathcal{T}_1(m_i)|\rceil}{\|w(\nu_{m_i})\|} < H(\mathbf{X}) + \varepsilon.$$
Combining this with (15), the convergence is extended from the subsequence $\{m_i\}_{i\ge i_1}$ to the full sequence $m = 1, 2, 3, \ldots$, which completes the proof. Q.E.D.

Note 3: The ceiling operation in (16) is the key point of the proof. In the usual analyses of source coding rates, the ceiling operation can be omitted because it changes the coding rate by at most $1/(\text{block length})$, which is asymptotically negligible [21]. In the proof of Theorem 4, however, the piecewise monotone decreasing property (18) of the coding rate for an individual sequence comes from this ceiling operation. If the ceiling is omitted, (16) is replaced by $\log|\mathcal{T}_1(m)| > \log|\mathcal{T}_1(m-1)|$ for $m \in \{m_i + 1, \ldots, m_{i+1} - 1\}$, from which (18) cannot be derived.

Note 4: After the proof of Theorem 4, the meaning of assumption (10) of Theorem 3 is now clear. From (13), the variable $i$ in (10) can be regarded as the codeword length. Therefore, Theorem 3 shows the asymptotics of the coding rate as the codeword length increases, and Theorem 4 shows the asymptotics of the coding rate as the number of internal nodes, i.e., the number of extension steps of the Tunstall tree, increases.

8. Almost Sure Convergence Coding Theorem of Multi-Shot Tunstall Code

In this section, an almost sure convergence coding theorem is proved for the multi-shot Tunstall code. Unlike the one-shot case, the Borel-Cantelli lemma directly implies the almost sure convergence of the coding rate. First, a theorem is proved for any sequence of Tunstall trees satisfying a certain condition on the asymptotics of the geometric mean of their leaf counts.
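Returning briefly to Theorem 4, the individual-sequence argument of (16)-(18) can be observed directly on a simulated sequence. The sketch below builds the Tunstall trees incrementally (insertion-order tie-breaking, so each $T(m)$ extends $T(m-1)$), parses one fixed pseudo-random sequence with $T(1), T(2), \ldots$, and records the codeword length $\lceil\log|\mathcal{T}(m)|\rceil$ and the block length $\|w(\nu_m)\|$. It then checks that the block length never decreases, as in (17), and that the coding rate does not increase while the codeword length stays constant, as in (18). The source, seed, sequence length and function names are hypothetical choices for this illustration.

```python
import heapq
import itertools
import random

def tunstall_extension_sequence(p, M):
    """Yield the leaf sets of T(1), ..., T(M), each obtained by one extension."""
    order = itertools.count()
    heap = [(-q, next(order), x) for x, q in p.items()]
    heapq.heapify(heap)
    yield {w for _, _, w in heap}
    for _ in range(M - 1):
        neg_q, _, w = heapq.heappop(heap)
        for x, q in p.items():
            heapq.heappush(heap, (neg_q * q, next(order), w + x))
        yield {w for _, _, w in heap}

def parse_block(seq, leaves):
    """Leaf block that is a prefix of seq (leaf sets are prefix-free)."""
    for d in range(1, max(len(w) for w in leaves) + 1):
        if seq[:d] in leaves:
            return seq[:d]
    raise ValueError("sequence ended before a leaf was reached")

P_X = {"0": 0.7, "1": 0.3}
rng = random.Random(2015)                 # fixed seed for reproducibility
seq = "".join(rng.choices(list(P_X), weights=list(P_X.values()), k=5000))

records = []                              # (m, ceil(log2 |T(m)|), ||w(nu_m)||)
for m, leaves in enumerate(tunstall_extension_sequence(P_X, 300), start=1):
    codeword_len = (len(leaves) - 1).bit_length()   # exact ceil(log2 |T(m)|)
    records.append((m, codeword_len, len(parse_block(seq, leaves))))

for (m0, c0, n0), (m1, c1, n1) in zip(records, records[1:]):
    assert n1 >= n0                       # (17)
    if c1 == c0:
        assert c1 / n1 <= c0 / n0         # (18)
print(records[-1])
```

Integer `bit_length` is used for the ceiling so that the equal-codeword-length intervals of (16) are detected exactly.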
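Theorem 5: In the multi-shot Tunstall code, let the $k$-th block $w_k$ be parsed sequentially using a Tunstall tree $T_k(m_k)$ with $m_k$ internal nodes. If the geometric mean $g(T_1^K)$ of the leaf counts of the Tunstall trees satisfies
$$\liminf_{K\to\infty} g(T_1^K) = \infty, \tag{19}$$
then the coding rate satisfies
$$\lim_{K\to\infty} \frac{\sum_{k=1}^{K} \lceil \log |\mathcal{T}_k(m_k)| \rceil}{\sum_{k=1}^{K} \|w_k(\nu_{m_k})\|} = H(X) \quad \text{almost surely.}$$

Proof: Fix an arbitrarily small $\delta > 0$. For this $\delta$, define the probabilities $P_3(K,\delta)$ and $P_4(K,\delta)$ as follows.
$$P_3(K,\delta) = \Pr\left\{ \left| \frac{1}{\|W_{1\cdots K}\|} \sum_{k=1}^{K} \lceil \log |\mathcal{T}_k| \rceil - H(X) \right| > 2\delta \right\},$$
$$P_4(K,\delta) = \Pr\left\{ \left| \frac{1}{\|W_{1\cdots K}\|} \sum_{k=1}^{K} \lceil \log |\mathcal{T}_k| \rceil - H(X) \right| > \frac{-(1-\log \underline{P}) \log \underline{P}}{\log g(T_1^K) + \log \underline{P}} + \delta \right\}.$$
Then, since it holds from the assumption (19) that
$$\lim_{K\to\infty} \frac{-(1-\log \underline{P}) \log \underline{P}}{\log g(T_1^K) + \log \underline{P}} = 0,$$
we have $P_3(K,\delta) \le P_4(K,\delta)$ for sufficiently large $K > K_0$. From (8), the following lower bound is obtained for $K > K_0$.
$$\frac{1}{K} \log \frac{1}{P_3(K,\delta)} \ge \frac{1}{K} \log \frac{1}{P_4(K,\delta)} \ge \{\log g(T_1^K)\} \left\{ \frac{f^{-1}(\delta)}{-\log \underline{P}} - \frac{A+1}{K} \frac{\log\log(g(T_1^K)/\underline{P})}{\log g(T_1^K)} \right\} - f^{-1}(\delta) - \frac{A+1}{K} \log \frac{1}{-\log \overline{P}} - (A+1) \frac{\log K}{K}.$$
Taking the limit inferior, we have the following inequality.
$$\liminf_{K\to\infty} \frac{1}{K} \log \frac{1}{P_3(K,\delta)} \ge -f^{-1}(\delta) + \liminf_{K\to\infty} \{\log g(T_1^K)\} \left\{ \frac{f^{-1}(\delta)}{-\log \underline{P}} - \frac{A+1}{K} \frac{\log\log(g(T_1^K)/\underline{P})}{\log g(T_1^K)} \right\}.$$
Since, from the assumption (19), $\log\log(g(T_1^K)/\underline{P}) / \log g(T_1^K)$ becomes arbitrarily small as $K$ increases, it holds that
$$\frac{f^{-1}(\delta)}{-\log \underline{P}} - \frac{A+1}{K} \frac{\log\log(g(T_1^K)/\underline{P})}{\log g(T_1^K)} > 0$$
for sufficiently large $K$, and therefore
$$\liminf_{K\to\infty} \{\log g(T_1^K)\} \left\{ \frac{f^{-1}(\delta)}{-\log \underline{P}} - \frac{A+1}{K} \frac{\log\log(g(T_1^K)/\underline{P})}{\log g(T_1^K)} \right\} = \infty.$$
This implies
$$\liminf_{K\to\infty} \frac{1}{K} \log \frac{1}{P_3(K,\delta)} = \infty,$$
which means that for any finite positive number $C > 0$ there exists $K_1 = K_1(C)$ such that for any $K > \max\{K_0, K_1\}$ it holds that $\frac{1}{K} \log \frac{1}{P_3(K,\delta)} \ge C$ or, equivalently, $P_3(K,\delta) \le \exp\{-KC\}$. Define $A_K$ as
$$A_K = \left\{ x_1^\infty \in \mathcal{X}^\infty : \left| \frac{1}{\|w_{1\cdots K}\|} \sum_{k=1}^{K} \lceil \log |\mathcal{T}_k| \rceil - H(X) \right| > 2\delta \right\}.$$
Since $P_3(K,\delta) \le \exp\{-KC\}$ for $K > K_2 = \max\{K_0, K_1\}$, it holds that
$$\sum_{K=1}^{\infty} P_{X_1^\infty}(A_K) = \sum_{K=1}^{K_2} P_3(K,\delta) + \sum_{K=K_2+1}^{\infty} P_3(K,\delta) \le \sum_{K=1}^{K_2} 1 + \sum_{K=K_2+1}^{\infty} \exp\{-KC\} = K_2 + \frac{\exp\{-(K_2+1)C\}}{1 - \exp\{-C\}} < \infty.$$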
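The summability step above only combines the trivial bound $P_3(K,\delta) \le 1$ for $K \le K_2$ with a geometric tail for $K > K_2$. The minimal Python sketch below checks the closed form $K_2 + e^{-(K_2+1)C}/(1-e^{-C})$ against a long partial sum. The values of $K_2$ and $C$ are arbitrary illustrative assumptions, and the natural exponential is used (the closed form of the geometric tail is the same for any base of $\exp$).

```python
import math


def borel_cantelli_bound(K2, C):
    """Closed form of sum_{K<=K2} 1 + sum_{K>K2} exp(-K*C) (geometric tail)."""
    r = math.exp(-C)
    return K2 + r ** (K2 + 1) / (1.0 - r)


def truncated_sum(K2, C, K_max):
    """Direct partial sum of the same upper bound over K = 1, ..., K_max."""
    return math.fsum(1.0 if K <= K2 else math.exp(-K * C) for K in range(1, K_max + 1))


K2, C = 10, 0.5   # illustrative values only, not derived from the proof
closed_form = borel_cantelli_bound(K2, C)
partial = truncated_sum(K2, C, 2000)
```

Finiteness of this sum is all that the Borel-Cantelli lemma (Lemma 4) needs; the particular values of $K_2$ and $C$ do not matter.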

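Applying Lemma 4, we have
$$\Pr\left\{ x_1^\infty \in \mathcal{X}^\infty : \exists K' = K'(x_1^\infty),\ \forall K \ge K',\ \left| \frac{1}{\|w_{1\cdots K}\|} \sum_{k=1}^{K} \lceil \log |\mathcal{T}_k| \rceil - H(X) \right| \le 2\delta \right\} = 1$$
for any $\delta > 0$. Since $\delta$ can be arbitrarily small (for instance, $\delta = 1/n$, $n = 1, 2, \ldots$, and the intersection of countably many events of probability one again has probability one), it holds that
$$\limsup_{K\to\infty} \left| \frac{1}{\|w_{1\cdots K}\|} \sum_{k=1}^{K} \lceil \log |\mathcal{T}_k| \rceil - H(X) \right| = 0$$
almost surely. Since $H(X)$ is a constant, the almost sure convergence is proved. Q.E.D.

Using this theorem, an almost sure convergence coding theorem is proved for a multi-shot Tunstall code in which the $k$-th block $w_k$ is parsed using a Tunstall tree $T_k(k)$ with $k$ internal nodes.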
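The multi-shot scheme just described can be sketched sequentially: block $k$ is parsed with the Tunstall tree having $k$ internal nodes, encoded as its leaf index in $\lceil \log_2 |\mathcal{T}_k(k)| \rceil$ bits, and the tree is then extended by one greedy Tunstall step (splitting the most probable leaf). The Python below is a minimal sketch for a known memoryless source given as a list of symbol probabilities. The class and function names, the lexicographic leaf indexing, the heap-based tie-breaking, and the rule of padding an unfinished final block with symbol 0 (with the decoder truncating to the known length) are illustrative assumptions, not details taken from the paper.

```python
import heapq
import random
from itertools import count


class TunstallTree:
    """Greedy Tunstall tree for a known memoryless source, grown one node at a time."""

    def __init__(self, probs):
        self.probs = list(probs)   # probs[a] = P(symbol a), symbols 0..A-1
        self._tie = count()        # insertion counter: deterministic tie-breaking
        self.leaves = {}           # leaf word (tuple of symbols) -> probability
        self._heap = []            # entries (-probability, counter, word)
        self.internal = 0          # number m of internal nodes
        self._add_leaf((), 1.0)    # start from the bare root ...
        self.extend()              # ... and split it, so T(1) has A leaves

    def _add_leaf(self, word, p):
        self.leaves[word] = p
        heapq.heappush(self._heap, (-p, next(self._tie), word))

    def extend(self):
        """One Tunstall step: replace the most probable leaf by its A children."""
        _, _, word = heapq.heappop(self._heap)
        p = self.leaves.pop(word)
        for a, q in enumerate(self.probs):
            self._add_leaf(word + (a,), p * q)
        self.internal += 1

    def codeword_width(self):
        """ceil(log2 |leaves|) bits per block (|leaves| >= 2 always)."""
        return (len(self.leaves) - 1).bit_length()


def encode(seq, probs):
    """Parse block k with T(k), emit its fixed-length index, then grow the tree."""
    tree, bits, i = TunstallTree(probs), [], 0
    while i < len(seq):
        word = ()
        while word not in tree.leaves:          # walk down until a leaf is reached
            j = i + len(word)
            word += (seq[j] if j < len(seq) else 0,)   # assumed padding rule
        index = sorted(tree.leaves).index(word)
        bits.append(f"{index:0{tree.codeword_width()}b}")
        i += len(word)
        tree.extend()
    return "".join(bits)


def decode(bitstring, n, probs):
    """Mirror of encode: the decoder grows an identical tree in lockstep."""
    tree, out, pos = TunstallTree(probs), [], 0
    while len(out) < n:
        width = tree.codeword_width()
        index = int(bitstring[pos:pos + width], 2)
        out.extend(sorted(tree.leaves)[index])
        pos += width
        tree.extend()
    return out[:n]


probs = [0.7, 0.2, 0.1]                      # illustrative source with A = 3
rng = random.Random(1)
seq = rng.choices(range(len(probs)), weights=probs, k=3000)
code = encode(seq, probs)
rate = len(code) / len(seq)                  # bits per source symbol
```

Because the decoder repeats exactly the same deterministic tree extensions, no tree has to be transmitted. For this source $H(X) \approx 1.157$ bits; Theorem 6 concerns the limit of `rate` as the number of blocks grows, so the value for a finite sample need not be close to $H(X)$.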



Theorem 6: In the multi-shot Tunstall code, if the $k$-th block $w_k$ is parsed using the Tunstall tree $T_k(k)$ with $k$ internal nodes, then the coding rate satisfies
$$\lim_{K\to\infty} \frac{\sum_{k=1}^{K} \lceil \log |\mathcal{T}_k(k)| \rceil}{\sum_{k=1}^{K} \|w_k(\nu_{m_k})\|} = H(X) \quad \text{almost surely.}$$
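The proof of Theorem 6 rests on the leaf-count identity $|\mathcal{T}_k(k)| = k(A-1)+1$ and on the bound $(K!)^{1/K} \ge K/e$. The minimal Python sketch below evaluates the chain $g(T_1^K) > (A-1)(K!)^{1/K} \ge (A-1)K/e$ with base-2 logarithms, using `math.lgamma(K + 1)` $= \ln K!$ to avoid forming $K!$ explicitly. The function names are hypothetical helpers for this check only.

```python
import math


def log2_geometric_mean_theorem6(A, K):
    """log2 g(T_1^K) when |T_k(k)| = k(A-1)+1 for k = 1, ..., K."""
    return math.fsum(math.log2(k * (A - 1) + 1) for k in range(1, K + 1)) / K


def log2_lower_bounds(A, K):
    """log2 of (A-1)(K!)^(1/K) and of (A-1)K/e."""
    middle = math.log2(A - 1) + math.lgamma(K + 1) / (K * math.log(2))
    last = math.log2((A - 1) * K / math.e)
    return middle, last


checks = []
for A in range(2, 6):
    for K in range(1, 501):
        lg = log2_geometric_mean_theorem6(A, K)
        middle, last = log2_lower_bounds(A, K)
        checks.append(lg > middle and middle >= last - 1e-12)
```

Since the last bound grows linearly in $K$, it shows directly that condition (19) of Theorem 5 holds for these trees.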

Proof: For a source with alphabet of size $A$, the Tunstall tree $T_k(k)$ satisfies $|\mathcal{T}_k(k)| = k(A-1)+1$, which implies that
$$g(T_1^K) = \left( \prod_{k=1}^{K} \{k(A-1)+1\} \right)^{1/K} > \left( \prod_{k=1}^{K} k(A-1) \right)^{1/K} = (A-1)(K!)^{1/K} \ge (A-1)e^{-1}K,$$
where the last inequality comes from (3). This inequality implies (19). Applying Theorem 5, the almost sure convergence is proved for the multi-shot Tunstall code. Q.E.D.

Note 5: In the proof of Theorem 4, we need to verify that the coding rate is piecewise monotone decreasing as in (18), whereas the proof of Theorem 6 needs no such step. This difference arises because, in the case of the one-shot Tunstall code, Lemma 4 can be applied directly only to a subsequence of the Tunstall tree extension sequence, since $|\mathcal{T}_1(m_i)|$ in (10) and $g(T_1^K)$ in (19) grow at different rates. This difference goes back to the difference between (8) and (9). In (8), the probability is of exponential order in $K$. However, when the multi-shot theorem is specialized to the one-shot corollary, $K$ is fixed to 1, so in the one-shot case the probability is of exponential order neither in $K$ nor in $|\mathcal{T}_1(m)|$. This substantial difference between the multi-shot and one-shot Tunstall codes determines the strategy of the proof.

9. Concluding Remarks

Almost sure convergence coding theorems of one-shot and multi-shot Tunstall codes for stationary memoryless sources have been proved. In the analyses, the Cartesian concatenation of Tunstall trees and the geometric mean of their leaf counts are newly introduced, and they play crucial roles in the analysis of the multi-shot Tunstall code.

In the case of the one-shot Tunstall code, the limit is taken as the number $m$ of internal nodes of the Tunstall tree increases. In the case of the multi-shot Tunstall code, on the other hand, the limit is taken as the total parsing count $K$ increases, and the Tunstall tree grows as the parsing position $k \in \{1, 2, \ldots, K\}$ increases. This difference affects the implementation of the algorithm. When the one-shot Tunstall code is used, the source sequence must be parsed and encoded again from its head every time the Tunstall tree is extended. For the multi-shot Tunstall code, by contrast, parsing and encoding can proceed sequentially, with the Tunstall tree extended by one step after each parse. This is an additional merit of the multi-shot Tunstall code, besides the memory usage described in the introduction.

The analyses of this paper are restricted to the non-universal situation, in which the probability distribution of the source is known to the encoder and the decoder. The sequential implementation of the multi-shot Tunstall code mentioned above could potentially be extended to a universal code in which the Tunstall tree is adaptively extended using the empirical distribution of the symbols in the already parsed blocks. Such a universal Tunstall code may be related to the class of LZ78 codes [15]–[17], because the LZ78 code can be interpreted as a Tunstall code with a specific probability model [23].

Acknowledgments

The author expresses his thanks to the associate editor and the two referees who carefully read the manuscript and gave valuable comments. This research is supported in part by MEXT Grant-in-Aid for Scientific Research (C) 24560482 and 15K06088.

References

[1] B.P. Tunstall, Synthesis of Noiseless Compression Codes, Ph.D. dissertation, Georgia Inst. Tech., Atlanta, GA, 1967.
[2] F. Jelinek and K.S. Schneider, “On variable-length-to-block coding,” IEEE Trans. Inform. Theory, vol.18, no.6, pp.765–774, Nov. 1972.
[3] T.J. Tjalkens and F.M.J. Willems, “Variable to fixed-length codes for Markov sources,” IEEE Trans. Inform. Theory, vol.33, no.2, pp.246–257, March 1987.
[4] T.J. Tjalkens and F.M.J. Willems, “A universal variable-to-fixed length source code based on Lawrence’s algorithm,” IEEE Trans. Inform. Theory, vol.38, no.2, pp.247–253, March 1992.
[5] F. Fabris, A. Sgarro, and R. Pauletti, “Tunstall adaptive coding and miscoding,” IEEE Trans. Inform. Theory, vol.42, no.6, pp.2167–2180, Nov. 1996.
[6] S.A. Savari and R.G. Gallager, “Generalized Tunstall codes for sources with memory,” IEEE Trans. Inform. Theory, vol.43, no.2, pp.658–668, March 1997.
[7] F. Fabris and A. Sgarro, “On the composition of Tunstall messages,” IEEE Trans. Inform. Theory, vol.45, no.5, pp.1608–1612, July 1999.
[8] S.A. Savari, “Variable-to-fixed length codes and the conservation of entropy,” IEEE Trans. Inform. Theory, vol.45, no.5, pp.1612–1620, July 1999.
[9] K. Visweswariah, S.R. Kulkarni, and S. Verdu, “Universal variable-to-fixed length source codes,” IEEE Trans. Inform. Theory, vol.47, no.4, pp.1461–1472, May 2001.
[10] H. Yamamoto and H. Yokoo, “Average-sense optimality and competitive optimality for almost instantaneous VF codes,” IEEE Trans. Inform. Theory, vol.47, no.6, pp.2174–2184, Sept. 2001.
[11] I. Tabus and J. Rissanen, “Asymptotics of greedy algorithms for variable-to-fixed length coding of Markov sources,” IEEE Trans. Inform. Theory, vol.48, no.7, pp.2022–2035, July 2002.
[12] M. Drmota, Y. Reznik, S. Savari, and W. Szpankowski, “Precise asymptotic analysis of the Tunstall code,” Proc. 2006 IEEE International Symposium on Information Theory, pp.2334–2337, July 2006.
[13] M.B. Baer, “Efficient implementation of the generalized Tunstall code generation algorithm,” Proc. 2009 IEEE International Symposium on Information Theory, pp.199–203, 2009.
[14] M. Arimura, “On the average coding rate of the Tunstall code for stationary and memoryless sources,” IEICE Trans. Fundamentals, vol.E93-A, no.11, pp.1904–1911, Nov. 2010.
[15] J. Ziv and A. Lempel, “Compression of individual sequences via variable-rate coding,” IEEE Trans. Inform. Theory, vol.24, no.5, pp.530–536, Sept. 1978.
[16] T.A. Welch, “A technique for high-performance data compression,” IEEE Computer, vol.17, no.6, pp.8–19, June 1984.
[17] H. Yokoo, “An improved Ziv-Lempel coding scheme for universal source coding,” IEICE Trans. Fundamentals (Japanese Edition), vol.J68-A, no.7, pp.664–671, July 1985.
[18] Z. Zhang, “Estimating mutual information via Kolmogorov distance,” IEEE Trans. Inform. Theory, vol.53, no.9, pp.3280–3282, Sept. 2007.
[19] P. Billingsley, Probability and Measure, 3rd Ed., John Wiley & Sons, 1995.
[20] P.C. Shields, The Ergodic Theory of Discrete Sample Paths, Graduate Studies in Mathematics, American Mathematical Society, 1996.
[21] T.S. Han, Information-Spectrum Methods in Information Theory, Springer, 2003.
[22] T.M. Cover and J.A. Thomas, Elements of Information Theory, 2nd Ed., John Wiley & Sons, 2006.
[23] J. Rissanen, Information and Complexity in Statistical Modeling, Springer, 2007.
[24] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Cambridge University Press, UK, 2011.

Appendix A: Proof of Lemma 2

Dividing the event $\{X_{T^{k-1}+1}^{T^{k-1}+\ell} = x_1^\ell\}$ according to the value of $T^{k-1}$, we have
$$\Pr\left\{ X_{T^{k-1}+1}^{T^{k-1}+\ell} = x_1^\ell \,\middle|\, W_1^{k-1} = w_1^{k-1} \right\} = \sum_{t=1}^{\infty} \frac{\Pr\{X_{t+1}^{t+\ell} = x_1^\ell,\ T^{k-1} = t,\ W_1^{k-1} = w_1^{k-1}\}}{\Pr\{W_1^{k-1} = w_1^{k-1}\}}. \tag{A.1}$$
Each term of the summation in the above formula can be rewritten as follows.
$$\Pr\{X_{t+1}^{t+\ell} = x_1^\ell,\ T^{k-1} = t,\ W_1^{k-1} = w_1^{k-1}\} = \Pr\{X_{t+1}^{t+\ell} = x_1^\ell \mid T^{k-1} = t,\ W_1^{k-1} = w_1^{k-1}\} \cdot \Pr\{T^{k-1} = t,\ W_1^{k-1} = w_1^{k-1}\}. \tag{A.2}$$
If $\|w_1\| + \cdots + \|w_{k-1}\| = t$, there exists a unique $x_1^t$ satisfying $w_{1\cdots k-1} = x_1^t$ for each $w_1^{k-1}$, which means that $x_1^t$ is a function of $w_1^{k-1}$ and $t$. Then it holds that
$$\{X_1^t = x_1^t\} \supseteq \{T^{k-1} = t,\ W_1^{k-1} = w_1^{k-1}\}.$$
On the other hand, since the probability distribution of the information source is assumed to be given, if each block $w_i$, $i < k$, is parsed sequentially using the tree $T_i$, then whether $x_{t_{i-1}+1} \cdots x_{t_i}$ corresponds to a leaf of the tree $T_i$ does not depend on the following symbols $x_n$, $n > t_i$. This implies that the length $\|w_i\|$ of the block $w_i$ is determined by $x_{t_{i-1}+1} \cdots x_{t_i}$, which means that the event $\{T^{k-1} = t\}$ is a function of the value of $X_1^t$. Moreover, since the Cartesian-concatenated tree $T_{1\cdots k-1}$ is proper and complete, for the above $x_1^t$ there exists a unique corresponding leaf $w_{1\cdots k-1} \in \mathcal{T}_{1\cdots k-1}$. As a result, the pair of the event $\{T^{k-1} = t\}$ and the value of $W_1^{k-1}$ is a function of $x_1^t$. Then it holds that
$$\{X_1^t = x_1^t\} \subseteq \{T^{k-1} = t,\ W_1^{k-1} = w_1^{k-1}\}.$$
From the above two inclusion relations, the events $\{T^{k-1} = t,\ W_1^{k-1} = w_1^{k-1}\}$ and $\{X_1^t = x_1^t\}$ are equivalent for any $t$ and $w_1^{k-1}$ satisfying $\|w_1\| + \cdots + \|w_{k-1}\| = t$ and the corresponding $x_1^t$. Since $X = X_1 X_2 \cdots$ is stationary and memoryless, the equivalence of the two events implies that
$$\Pr\{X_{t+1}^{t+\ell} = x_1^\ell \mid T^{k-1} = t,\ W_1^{k-1} = w_1^{k-1}\} = \Pr\{X_{t+1}^{t+\ell} = x_1^\ell \mid X_1^t = x_1^t\} = \Pr\{X_{t+1}^{t+\ell} = x_1^\ell\} = \Pr\{X_1^\ell = x_1^\ell\}. \tag{A.3}$$
From (A.1), (A.2), and (A.3), we have
$$\Pr\left\{ X_{T^{k-1}+1}^{T^{k-1}+\ell} = x_1^\ell \,\middle|\, W_1^{k-1} = w_1^{k-1} \right\} = \Pr\{X_1^\ell = x_1^\ell\} \sum_{t=1}^{\infty} \Pr\{T^{k-1} = t \mid W_1^{k-1} = w_1^{k-1}\} = \Pr\{X_1^\ell = x_1^\ell\}.$$

Appendix B: Proof of Theorem 1

To prove Theorem 1, the following lemmas are used.

Lemma 5 (Jelinek, Schneider [2]): Any Tunstall tree $T_k$ satisfies $\underline{P}^*(T_k) \ge \overline{P}^*(T_k)\,\underline{P}$.

Lemma 6 (Arimura [14]): The minimum and the maximum lengths of the blocks parsed with the Tunstall tree $T_k$ are bounded as follows.
$$\frac{\log |\mathcal{T}_k|}{-\log \underline{P}} - 1 \le \underline{w}(T_k) \le \overline{w}(T_k) \le \frac{\log |\mathcal{T}_k| - \log \underline{P}}{-\log \overline{P}}.$$
In addition to these two lemmas, the maximum and the minimum leaf probabilities are related to the leaf count $|\mathcal{T}_k|$ of the Tunstall tree $T_k$ as
$$\underline{P}^*(T_k) \le \frac{1}{|\mathcal{T}_k|} \le \overline{P}^*(T_k). \tag{A.4}$$
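Lemma 5, Lemma 6, and (A.4) can be checked numerically on small Tunstall trees. The minimal Python sketch below assumes that $\underline{P}$ and $\overline{P}$ are the minimum and maximum symbol probabilities, that $\underline{P}^*(T_k)$ and $\overline{P}^*(T_k)$ are the minimum and maximum leaf probabilities, and that logarithms are base 2. The tree is built by $m$ greedy splits of the most probable leaf starting from the root, and the function names are hypothetical.

```python
import math


def tunstall_leaves(probs, m):
    """(probability, depth) of every leaf of a Tunstall tree with m internal nodes."""
    leaves = [(1.0, 0)]                        # start from the bare root
    for _ in range(m):
        i = max(range(len(leaves)), key=lambda j: leaves[j][0])
        p, d = leaves.pop(i)                   # split the most probable leaf
        leaves.extend((p * q, d + 1) for q in probs)
    return leaves


def check_bounds(probs, m, tol=1e-12):
    """True if Lemma 5, (A.4), and both sides of Lemma 6 hold for this tree."""
    leaves = tunstall_leaves(probs, m)
    L = len(leaves)
    p_lo, p_hi = min(probs), max(probs)        # assumed meaning of P-underline, P-overline
    leaf_min = min(p for p, _ in leaves)       # minimum leaf probability
    leaf_max = max(p for p, _ in leaves)       # maximum leaf probability
    d_min = min(d for _, d in leaves)
    d_max = max(d for _, d in leaves)
    lemma5 = leaf_min >= leaf_max * p_lo - tol
    a4 = leaf_min <= 1 / L + tol and 1 / L <= leaf_max + tol
    lemma6_lo = d_min >= math.log2(L) / -math.log2(p_lo) - 1 - tol
    lemma6_hi = d_max <= (math.log2(L) - math.log2(p_lo)) / -math.log2(p_hi) + tol
    return lemma5 and a4 and lemma6_lo and lemma6_hi
```

Any tie-breaking rule among equally probable leaves yields a valid Tunstall tree, so the bounds should hold regardless of which maximizer `max` happens to return.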

For the multi-shot Tunstall code, the following property holds. Define the maximum and the minimum depths of $T_{1\cdots K}$ as
$$\overline{w}(T_{1\cdots K}) = \max_{w_1^K} \|w_{1\cdots K}\|, \qquad \underline{w}(T_{1\cdots K}) = \min_{w_1^K} \|w_{1\cdots K}\|.$$
Note that the maximum and the minimum with respect to $w_1^K$ are taken over all combinations of the blocks $w_k = w_k(\nu)$, $\nu \in \mathcal{T}_k$, $k = 1, 2, \ldots, K$, parsed by the sequence of fixed Tunstall trees $T_1, \ldots, T_K$. Then the following lemma holds.

Lemma 7: The minimum and the maximum depths of the tree $T_{1\cdots K}$ are related to the minimum and the maximum depths of the trees $T_1, \ldots, T_K$ as follows.
$$\overline{w}(T_{1\cdots K}) = \sum_{k=1}^{K} \overline{w}(T_k), \qquad \underline{w}(T_{1\cdots K}) = \sum_{k=1}^{K} \underline{w}(T_k).$$

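The proof of this lemma is omitted since it is a direct corollary of Lemma 1.

Proof of Theorem 1: From Lemmas 6 and 7, the following inequalities hold.
$$\|w_{1\cdots K}\| \ge \sum_{k=1}^{K} \underline{w}(T_k) \ge \sum_{k=1}^{K} \frac{\log |\mathcal{T}_k| + \log \underline{P}}{-\log \underline{P}},$$
$$\|w_{1\cdots K}\| \le \sum_{k=1}^{K} \overline{w}(T_k) \le \sum_{k=1}^{K} \frac{\log |\mathcal{T}_k| - \log \underline{P}}{-\log \overline{P}}.$$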

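From Lemma 5 and (A.4), an upper bound on the pointwise redundancy is given as follows.
$$\begin{aligned} r(w_1^K, T_1^K, X) &\le \frac{1}{\|w_{1\cdots K}\|} \left\{ K + \sum_{k=1}^{K} \log |\mathcal{T}_k| - \sum_{k=1}^{K} \log \frac{1}{P^*(w_k)} \right\} \\ &\le \frac{1}{\|w_{1\cdots K}\|} \left\{ K + \sum_{k=1}^{K} \left( \log |\mathcal{T}_k| + \log \overline{P}^*(T_k) \right) \right\} \\ &\le \frac{K + \sum_{k=1}^{K} \left( \log |\mathcal{T}_k| + \log \frac{1}{|\mathcal{T}_k| \underline{P}} \right)}{\sum_{k=1}^{K} \frac{\log |\mathcal{T}_k| + \log \underline{P}}{-\log \underline{P}}} \\ &\le \frac{\left( K + \sum_{k=1}^{K} \log \frac{1}{\underline{P}} \right) \log \frac{1}{\underline{P}}}{\sum_{k=1}^{K} \log |\mathcal{T}_k| + K \log \underline{P}} = \frac{-(1 - \log \underline{P}) \log \underline{P}}{\log g(T_1^K) + \log \underline{P}}. \end{aligned}$$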

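Similarly, a lower bound is given as follows.
$$\begin{aligned} r(w_1^K, T_1^K, X) &\ge \frac{1}{\|w_{1\cdots K}\|} \left\{ \sum_{k=1}^{K} \log |\mathcal{T}_k| - \sum_{k=1}^{K} \log \frac{1}{P^*(w_k)} \right\} \\ &\ge \frac{1}{\|w_{1\cdots K}\|} \sum_{k=1}^{K} \left( \log |\mathcal{T}_k| + \log \underline{P}^*(T_k) \right) \\ &\ge \frac{\sum_{k=1}^{K} \left( \log |\mathcal{T}_k| + \log \frac{\underline{P}}{|\mathcal{T}_k|} \right)}{\sum_{k=1}^{K} \frac{\log |\mathcal{T}_k| - \log \underline{P}}{-\log \overline{P}}} \\ &\ge \frac{-\left( \sum_{k=1}^{K} \log \frac{1}{\underline{P}} \right) \log \frac{1}{\overline{P}}}{\sum_{k=1}^{K} \log |\mathcal{T}_k| - K \log \underline{P}} = \frac{-(\log \overline{P}) \log \underline{P}}{\log g(T_1^K) - \log \underline{P}}. \end{aligned}$$
Q.E.D.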

Appendix C: Proof of Lemma 3

In the proof of Lemma 3, the following lemma is used.

Lemma 8 (Bernstein): For any stationary memoryless source $X$ with alphabet of size $A < \infty$ and sufficiently small $\delta > 0$, it holds that
$$\Pr\left\{ \left| \frac{1}{n} \log \frac{1}{P_{X^n}(X_1^n)} - H(X) \right| > \delta \right\} \le \exp\left\{ -n \left( f^{-1}(\delta) - \frac{A \log(n+1)}{n} \right) \right\},$$
where $f^{-1}(\delta)$ is the inverse function of $f(\delta)$ defined by (6). Note that $f^{-1}(\delta) \to 0$ as $\delta \to 0$.

Lemma 8 follows from the following lemma.

Lemma 9: For any stationary memoryless source $X$ with alphabet of size $A < \infty$ and any $\delta' > 0$, it holds that
$$\Pr\left\{ \left| \frac{1}{n} \log \frac{1}{P_{X^n}(X_1^n)} - H(X) \right| > f(\delta') \right\} \le \exp\left\{ -n \left( \delta' - \frac{A \log(n+1)}{n} \right) \right\},$$

where $f(\delta')$ is defined by (6) and $f(\delta') \to 0$ as $\delta' \to 0$.

Since $f(\delta')$ in Lemma 9 is continuous and strictly monotone increasing for $0 < \delta' < (\log e)/2$, and $f(\delta') \to 0$ as $\delta' \to 0$, we can replace $\delta'$ and $f(\delta')$ with $f^{-1}(\delta)$ and $\delta$, respectively, for small $\delta$. This yields Lemma 8. The proof of Lemma 9 is almost identical to that of Lemma 4 of [14], with Lemma 9 of [14] replaced by the following lemma.

Lemma 10 (Zhang [18]; see Problem 3.10 of [24]): For any probability distributions $P$ and $Q$ on $\mathcal{X}$, it holds that
$$|H(P) - H(Q)| \le \frac{1}{2} d(P,Q) \log(A-1) + h\!\left( \frac{1}{2} d(P,Q) \right),$$
where $h(x)$ is the binary entropy function defined by (7) and $d(P,Q)$ is the variational distance defined by $d(P,Q) = \sum_{x \in \mathcal{X}} |P(x) - Q(x)|$.
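Lemma 10 can be sanity-checked numerically. The minimal Python sketch below evaluates its right-hand side with base-2 logarithms, assuming that $h$ in (7) is the usual binary entropy function, and compares it with $|H(P)-H(Q)|$ for random pairs of distributions. It also checks the equality case of a deterministic $P$ against a uniform $Q$, where the variational distance is $d = 2(1 - 1/A)$. The helper names are hypothetical.

```python
import math
import random


def entropy(p):
    """Shannon entropy in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def binary_entropy(x):
    """h(x) in bits, with h(0) = h(1) = 0."""
    if x <= 0 or x >= 1:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def zhang_bound(p, q):
    """(1/2) d log(A-1) + h(d/2), with d the variational (L1) distance."""
    A = len(p)
    d = sum(abs(a - b) for a, b in zip(p, q))
    return 0.5 * d * math.log2(A - 1) + binary_entropy(0.5 * d)


def random_distribution(A, rng):
    w = [rng.random() for _ in range(A)]
    s = sum(w)
    return [x / s for x in w]


rng = random.Random(0)
pairs = []
for _ in range(2000):
    A = rng.randint(2, 6)
    pairs.append((random_distribution(A, rng), random_distribution(A, rng)))
```

The bound is tight in the equality case: for $A = 4$ it equals $\frac{3}{4}\log 3 + h(3/4) = 2 = H(Q)$.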

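Proof of Lemma 3: Lemma 7 implies the following inequalities.
$$\underline{w}(T_{1\cdots K}) = \sum_{k=1}^{K} \underline{w}(T_k) \ge \sum_{k=1}^{K} \frac{\log |\mathcal{T}_k| + \log \underline{P}}{-\log \underline{P}} = \frac{K}{-\log \underline{P}} \left\{ \log g(T_1^K) - \log \frac{1}{\underline{P}} \right\},$$
$$\overline{w}(T_{1\cdots K}) = \sum_{k=1}^{K} \overline{w}(T_k) \le \sum_{k=1}^{K} \frac{\log |\mathcal{T}_k| - \log \underline{P}}{-\log \overline{P}} = \frac{K}{-\log \overline{P}} \left\{ \log g(T_1^K) + \log \frac{1}{\underline{P}} \right\}.$$
If we define $\delta_2(\delta, \ell)$ as
$$\delta_2(\delta, \ell) = \exp\left\{ -\ell \left( f^{-1}(\delta) - \frac{A \log(\ell+1)}{\ell} \right) \right\},$$
then, for fixed $\delta$, it is monotone decreasing in $\ell$ for sufficiently large $\ell$ (namely, for $\ell + 1 > A/f^{-1}(\delta)$). Using this property, the probability of non-typical sequences is bounded from above as follows.
$$\begin{aligned} \Pr\left\{ \left| \frac{1}{\|W_{1\cdots K}\|} \log \frac{1}{P^*(W_{1\cdots K})} - H(X) \right| > \delta \right\} &= \sum_{\ell=\underline{w}(T_{1\cdots K})}^{\overline{w}(T_{1\cdots K})} \Pr\left\{ \|W_{1\cdots K}\| = \ell,\ \left| \frac{1}{\ell} \log \frac{1}{P_{X^\ell}(X_1^\ell)} - H(X) \right| > \delta \right\} \\ &\le \sum_{\ell=\underline{w}(T_{1\cdots K})}^{\overline{w}(T_{1\cdots K})} \Pr\left\{ \left| \frac{1}{\ell} \log \frac{1}{P_{X^\ell}(X_1^\ell)} - H(X) \right| > \delta \right\} \\ &\le \sum_{\ell=\underline{w}(T_{1\cdots K})}^{\overline{w}(T_{1\cdots K})} \delta_2(\delta, \ell) \\ &\le \overline{w}(T_{1\cdots K})\, \delta_2(\delta, \underline{w}(T_{1\cdots K})) \\ &= \overline{w}(T_{1\cdots K}) \left( \underline{w}(T_{1\cdots K}) + 1 \right)^A \exp\{-f^{-1}(\delta)\, \underline{w}(T_{1\cdots K})\} \\ &\le \left( \overline{w}(T_{1\cdots K}) + 1 \right)^{A+1} \exp\{-f^{-1}(\delta)\, \underline{w}(T_{1\cdots K})\}. \end{aligned} \tag{A.5}$$
It holds that
$$\left( \overline{w}(T_{1\cdots K}) + 1 \right)^{A+1} \le \left\{ \frac{K}{-\log \overline{P}} \left( \log g(T_1^K) + \log \frac{1}{\underline{P}} \right) + 1 \right\}^{A+1} = O\left( \{K \log g(T_1^K)\}^{A+1} \right),$$
$$\exp\{-f^{-1}(\delta)\, \underline{w}(T_{1\cdots K})\} \le \exp\left\{ -f^{-1}(\delta) K \left( \frac{\log g(T_1^K)}{-\log \underline{P}} - 1 \right) \right\} = O\left( g(T_1^K)^{-f^{-1}(\delta) K/(-\log \underline{P})} \right).$$
Applying them to (A.5), we obtain (5). Q.E.D.

Appendix D: Proof of Theorem 2

Define $A(T_1^K)$ and $B(T_1^K)$ by
$$A(T_1^K) = \frac{-(1 - \log \underline{P}) \log \underline{P}}{\log g(T_1^K) + \log \underline{P}} > 0, \qquad B(T_1^K) = \frac{-(\log \overline{P}) \log \underline{P}}{\log g(T_1^K) - \log \underline{P}} < 0.$$
First, $|A(T_1^K)| > |B(T_1^K)|$ is proved by taking the ratio of the two quantities:
$$\frac{|A(T_1^K)|}{|B(T_1^K)|} = \frac{1 - \log \underline{P}}{-\log \overline{P}} \cdot \frac{\log g(T_1^K) - \log \underline{P}}{\log g(T_1^K) + \log \underline{P}}. \tag{A.6}$$
From $\underline{P} < 1$ it holds that $\log \underline{P} < 0$, and since $|\mathcal{T}_k| \ge 2$ for all $k$, $g(T_1^K) \ge 2$ holds. Using them (the denominator below being positive because $A(T_1^K) > 0$), it holds that
$$\frac{\log g(T_1^K) - \log \underline{P}}{\log g(T_1^K) + \log \underline{P}} > 1. \tag{A.7}$$
Moreover, $-\log \overline{P} \le -\log \underline{P}$ and $\log \underline{P} < 0$ imply that
$$\frac{1 - \log \underline{P}}{-\log \overline{P}} \ge \frac{1 - \log \underline{P}}{-\log \underline{P}} = 1 + \frac{1}{-\log \underline{P}} > 1. \tag{A.8}$$
Applying (A.7) and (A.8) to (A.6), we have
$$\frac{|A(T_1^K)|}{|B(T_1^K)|} > 1.$$
If we regard Theorem 1 as bounds on the self-information, the following inequalities hold.
$$\frac{1}{\|w_{1\cdots K}\|} \log \frac{1}{P^*(w_{1\cdots K})} \ge \frac{1}{\|w_{1\cdots K}\|} \sum_{k=1}^{K} \lceil \log |\mathcal{T}_k| \rceil - A(T_1^K),$$
$$\frac{1}{\|w_{1\cdots K}\|} \log \frac{1}{P^*(w_{1\cdots K})} \le \frac{1}{\|w_{1\cdots K}\|} \sum_{k=1}^{K} \lceil \log |\mathcal{T}_k| \rceil - B(T_1^K) \le \frac{1}{\|w_{1\cdots K}\|} \sum_{k=1}^{K} \lceil \log |\mathcal{T}_k| \rceil + A(T_1^K).$$
Using these bounds, we have
$$\begin{aligned} &\Pr\left\{ \left| \frac{1}{\|W_{1\cdots K}\|} \log \frac{1}{P^*(W_{1\cdots K})} - H(X) \right| > \delta \right\} \\ &= \Pr\left\{ \frac{1}{\|W_{1\cdots K}\|} \log \frac{1}{P^*(W_{1\cdots K})} - H(X) > \delta \right\} + \Pr\left\{ \frac{1}{\|W_{1\cdots K}\|} \log \frac{1}{P^*(W_{1\cdots K})} - H(X) < -\delta \right\} \\ &\ge \Pr\left\{ \frac{\sum_{k=1}^{K} \lceil \log |\mathcal{T}_k| \rceil}{\|W_{1\cdots K}\|} - A(T_1^K) - H(X) > \delta \right\} + \Pr\left\{ \frac{\sum_{k=1}^{K} \lceil \log |\mathcal{T}_k| \rceil}{\|W_{1\cdots K}\|} + A(T_1^K) - H(X) < -\delta \right\} \\ &= \Pr\left\{ \left| \frac{\sum_{k=1}^{K} \lceil \log |\mathcal{T}_k| \rceil}{\|W_{1\cdots K}\|} - H(X) \right| > A(T_1^K) + \delta \right\}. \end{aligned}$$
Using Lemma 3, the required upper bound is obtained.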

Mitsuharu Arimura received the B.E., M.E., and Ph.D. degrees from the University of Tokyo in 1994, 1996, and 1999, respectively. From 1999 to 2004, he was a Research Associate in the Graduate School of Information Systems at the University of Electro-Communications, Tokyo, Japan. Since 2004, he has been with Shonan Institute of Technology, where he is currently a Lecturer in the Faculty of Engineering. His research interests include Shannon theory and data compression algorithms. Dr. Arimura is a member of the IEEE.
