IEICE TRANS. FUNDAMENTALS, VOL.E101–A, NO.1 JANUARY 2018
249
PAPER
A Variable-to-Fixed Length Lossless Source Code Attaining Better Performance than Tunstall Code in Several Criterions*

Mitsuharu ARIMURA†a), Senior Member
SUMMARY Tunstall code is known as an optimal variable-to-fixed length (VF) lossless source code under the criterion of the average coding rate, which is defined as the codeword length divided by the average phrase length. In this paper we define the average coding rate of a VF code as the expectation of the pointwise coding rate, i.e., the codeword length divided by the phrase length; we call this quantity the average pointwise coding rate. A new VF code is proposed, together with an incremental parsing tree construction algorithm like the one that builds the Tunstall parsing tree. It is proved that this code is optimal under the criterion of the average pointwise coding rate, and that the average pointwise coding rate of this code converges asymptotically to the entropy of the stationary memoryless source emitting the data to be encoded. Moreover, it is proved that the proposed code attains a better worst-case coding rate than Tunstall code.
key words: lossless data compression, source coding, variable-to-fixed length code, Tunstall code, average pointwise coding rate, worst-case coding rate
1. Introduction
In this paper, a new variable-to-fixed length lossless source code (VF code) is proposed. A VF code parses one or more phrases from the head of a given sequence, and each parsed phrase is encoded into a fixed-length codeword. The compression performance of such a code is evaluated by the coding rate, defined as the fixed codeword length divided by the variable phrase length, so a VF code compresses better when it parses longer phrases. A VF code is equivalent to a parsing tree in which each leaf node corresponds to a phrase, and the optimization of a VF code is equivalent to the optimization of the parsing tree under the restriction that the number of leaf nodes is fixed. An optimal VF code is known: Tunstall code [1]. It is optimal in the sense that its average phrase length is maximum among the VF codes whose variable-length phrase sets have the same size. Consequently, this code minimizes the average coding rate, which is defined as the fixed codeword length divided by the average phrase length. Moreover, if a fixed number of multiple phrases are parsed and encoded by a

Manuscript received May 20, 2017.
Manuscript revised August 25, 2017.
† The author is with the Department of Applied Computer Sciences, Shonan Institute of Technology, Fujisawa-shi, 251-8511 Japan.
* This paper was partly presented at the 2016 International Symposium on Information Theory and its Applications (ISITA2016).
a) E-mail: [email protected]
DOI: 10.1587/transfun.E101.A.249
fixed Tunstall code (we call this algorithm the multi-shot Tunstall code [8], [9]) repeatedly, the total average coding rate is the same as that of the first phrase. On the other hand, for a one-shot Tunstall code, i.e., a Tunstall code that parses and encodes only one phrase, we can also consider the expectation of the pointwise coding rate as a performance criterion [6], [9], where the pointwise coding rate is defined as the codeword length divided by the length of each phrase. We call it the average pointwise coding rate [6]. The average pointwise coding rate of the multi-shot Tunstall code is not equal to that of the first phrase, unlike the case of the average coding rate described above. However, if we treat the overall multi-shot Tunstall code with a fixed number of phrases as a one-shot VF code, the average pointwise coding rate can also be treated as a criterion for average compression performance [7], [9]. Moreover, it is directly related to the pointwise coding rate, the almost-sure limit [8], and the convergence in probability of the coding rate, as shown in [6]. This is a merit of the definition of the average pointwise coding rate in the context of theoretical performance analysis. The author has evaluated the average pointwise coding rate of the one-shot and the multi-shot Tunstall codes [6], [7], [9].

The average coding rate and the average pointwise coding rate of any VF code are related as follows. From Jensen's inequality (cf. [10]), the average pointwise coding rate is never smaller than the average coding rate of a VF code. Moreover, since Tunstall code is designed to optimize the average coding rate, the average pointwise coding rate of Tunstall code is not necessarily optimal in the set of all VF codes with phrase sets of the same size.
In this paper, we propose a new VF coding algorithm that optimizes the average pointwise coding rate in the set of all VF codes with phrase sets of the same size. We give an incremental parsing tree construction algorithm similar to the one that creates the Tunstall parsing tree, and evaluate the compression performance of the code. First, we show that this code attains the optimal average pointwise coding rate for each size of the phrase set. Next, it is shown that this code is asymptotically optimal in the sense that its average pointwise coding rate converges to the entropy rate of the source as the size of the phrase set becomes large. Finally, we prove that the worst-case coding rate, which is introduced in this paper, of the proposed code can be smaller than that of Tunstall code.

This paper is organized as follows. Section 2 is devoted
Copyright © 2018 The Institute of Electronics, Information and Communication Engineers
to the notation and definitions on information sources and VF codes. Section 3 reviews the Tunstall algorithm. We propose a new VF algorithm, prove its optimality, and establish the coding theorem in Sect. 4. In Sect. 5, we compare the worst-case coding rates of Tunstall code and the proposed code, show that the proposed code can attain a better worst-case coding rate than Tunstall code, and then compare the shapes of the parsing trees constructed by the two algorithms. Some extensions of the results are noted in Sect. 6.

2. Notation and Definitions for VF Codes

In this section, some definitions of an information source, a VF code, and the average pointwise coding rate of a VF code are given.

2.1 Information Source

For any random variable X, the probability distribution and the expectation are denoted by P_X and E[X], respectively. Sequences of random variables X_1 X_2 ⋯ X_n and samples x_1 x_2 ⋯ x_n are written as X_1^n and x_1^n, respectively. A subsequence x_i x_{i+1} ⋯ x_j of x_1^n for 1 ≤ i ≤ j ≤ n is denoted by x_i^j. The null string is represented by λ, and an infinite sequence beginning with x_i is represented by x_i^∞. The cardinality of a finite set A is written as |A|, and ‖s‖ means the length of a finite-length string s. The n-fold Cartesian product of a set A with itself is denoted by A^n, and the set of all finite sequences on A is denoted by A^+ = ∪_{n≥1} A^n. For two sets A and B, A∖B represents the difference set, created by omitting the elements of B from A. Throughout this paper, the base of log is assumed to be 2.

The prefix set of a string x = x_1^n = x_1 x_2 ⋯ x_n is written as

PrefEq(x) ≜ {λ, x_1, x_1^2, …, x_1^n},

and the prefix set of x excluding x itself is written as

PrefNeq(x) ≜ PrefEq(x) ∖ {x} = {λ, x_1, x_1^2, …, x_1^{n−1}}.

Therefore, for two strings x and y, y ∈ PrefEq(x) means that y is a prefix of x, and y ∈ PrefNeq(x) means that y is a prefix of x and y ≠ x.

Next, an alphabet and a stationary memoryless source are defined. Let X = {a_1, a_2, …, a_A} (2 ≤ A < ∞) be an ordered finite alphabet. Assume that the elements of X satisfy the dictionary order ≺ as a_1 ≺ a_2 ≺ ⋯ ≺ a_A. The order of strings is defined as follows.

Definition 1 (Order of Strings): For any two distinct strings x = x_1 x_2 ⋯ x_{i_1} and y = y_1 y_2 ⋯ y_{i_2}, if it holds that i_1 < i_2, then the relationship of x and y is represented by x ≺ y. On the other hand, if i_1 = i_2, then, since x and y are distinct, there exists 1 ≤ i′ ≤ i_1 = i_2 such that x_{i′} ≺ y_{i′} (or y_{i′} ≺ x_{i′}) and x_i = y_i for all i < i′. In this case, x ≺ y (respectively, y ≺ x).

For example, the following relationships hold: a_3 a_2 ≺ a_1 a_1 a_1 and a_1 a_2 a_3 ≺ a_1 a_2 a_4.

Let X = X_1 X_2 ⋯ be a stationary memoryless source such that X_1 = X for some random variable X. Each X_i, i ≥ 1, takes values in X. Since the probability distributions P_{X_i} are identical for all i, we denote them by P_X. Without loss of generality, we assume that P_X(a_i) ≥ P_X(a_{i+1}) for every i ∈ {1, 2, …, A−1}. The probability of the null string λ is defined as P_X(λ) = 1. For any variable-length sequence w = x_1 x_2 ⋯ x_{‖w‖} ∈ X^+, the probability of w is denoted by P(w). Since the source X is stationary memoryless, it holds that

P(w) = ∏_{i=1}^{‖w‖} P_X(x_i).

The entropy rate H(X) of the source X is given by the entropy of P_X as

H(X) = H(P_X) = −∑_{x∈X} P_X(x) log P_X(x).

2.2 Parsing Tree and VF Code

Since a VF code corresponds to a parsing tree, we first define the parsing tree. A tree T is defined as a set of nodes {ν}. In this paper, any T is a rooted A-ary tree such that each internal node of T has A child nodes; a tree with this property is said to be complete. The leaf set T̄ of a tree T is defined as the collection of nodes that have no child node. Each edge has a label x ∈ X. A node ν ∈ T corresponds to a string w(ν), defined as the concatenation of the labels of the edges on the path from the root node to ν. Then the depth of the node ν ∈ T is represented by ‖w(ν)‖. The minimum and the maximum depths of the leaves of a tree T are written as

w̲(T) = min_{ν∈T̄} ‖w(ν)‖,  (1)
w̄(T) = max_{ν∈T̄} ‖w(ν)‖,  (2)

respectively. We call them the minimum depth and the maximum depth of the tree T. For any leaf or internal node ν, a probability is assigned by that of the corresponding string as

P(ν) ≜ P(w(ν)).  (3)

Then the following properties are satisfied.

Property 1. The probability P(·) satisfies the consistency condition such that for an internal node μ and the set of its child nodes {ν} it holds that

P(μ) = ∑_{ν ∈ the child node set of μ} P(ν).  (4)
Property 2. Since any tree T is assumed to be complete in this paper, it holds for the leaf set T̄ that

∑_{ν∈T̄} P(ν) = 1.  (5)

By Property 2, the leaf probabilities form a distribution, and the random variable taking values in the leaf set T̄ of a tree T is written as N_T.

Next, we define a VF code. Consider a sequence of trees {T_m}_{m≥1}. The leaf set T̄_m of each tree T_m corresponds to the set X_m = {w(ν) : ν ∈ T̄_m} of variable-length strings. A VF code on the string set X_m is defined as follows. Suppose that an infinitely long sequence x_1^∞ ∈ X^∞ is given. Since T_m is complete for any m, there exists a prefix w ∈ X_m of x_1^∞. The phrase w ∈ X_m is parsed from x_1^∞ and encoded losslessly in ⌈log |X_m|⌉ = ⌈log |T̄_m|⌉ bits. Then X_m forms the message set of a VF code. We call this procedure a one-shot VF code.

This procedure corresponds to the following application case. Suppose that a sufficiently long video or audio stream is compressed and stored in a storage of ⌈log |T̄_m|⌉ bits, or that ⌈log |T̄_m|⌉ bits of compressed data are transmitted during a fixed time length at a fixed transmission rate. Then the length ‖w(ν)‖ of the original stream that can be stored or transmitted depends on the stream itself. In this paper we study the asymptotics of the coding rate ⌈log |T̄_m|⌉ / ‖w(ν)‖ as m becomes large.

3. Tunstall Code
This section is devoted to the parsing tree construction algorithm of Tunstall code [1]. A sequence {T_m^Tuns}_{m≥1} of Tunstall parsing trees (or, simply, Tunstall trees) is constructed by Algorithm 1.

Algorithm 1. Tunstall Algorithm
1: Let T_1^Tuns be the A-ary tree with depth one.
2: for m = 2 to ∞ do
3:   {Construct tree T_m^Tuns from T_{m−1}^Tuns as follows (we call this procedure the extension of the Tunstall tree).}
4:   Pick up a leaf node ν with maximum probability in the leaf set T̄_{m−1}^Tuns (if there exist two or more nodes with maximum probability, pick up the first one in the sorted set of them in the order defined in Definition 1†).
5:   Let T_m^Tuns be the tree constructed by making A child nodes of ν ∈ T̄_{m−1}^Tuns.
6: end for

The cardinality of the leaf set T̄_m^Tuns of the tree T_m^Tuns is given by |T̄_m^Tuns| = 1 + (A−1)m, and the number of internal nodes is equal to m. Therefore, if we write a tree as T_m, it means not only that the extension step of the tree is m but also that the tree has m internal nodes.

† Note that in Algorithms 1 and 2, the order can be arbitrarily determined in the analyses of Sect. 4.
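To make the greedy procedure concrete, the following Python sketch (illustrative code, not from the paper; the function names are ours) grows a Tunstall tree as a dictionary from leaf strings to probabilities and performs the one-shot parse. The sample distribution P(a) = 3/4, P(b) = 1/4 matches the examples of Sect. 5.3.

```python
import math
from fractions import Fraction

def tunstall_tree(probs, m):
    """Algorithm 1 sketch: starting from the depth-one A-ary tree, extend a
    maximum-probability leaf m-1 times, so the tree has m internal nodes.
    Ties are broken by the string order of Definition 1 (shorter strings
    first, then lexicographic).  Returns the leaf set {phrase: probability}."""
    leaves = dict(probs)                                  # depth-one tree
    for _ in range(m - 1):
        best = min(leaves, key=lambda w: (-leaves[w], len(w), w))
        p = leaves.pop(best)                              # leaf becomes internal
        for s, ps in probs.items():
            leaves[best + s] = p * ps                     # its A children
    return leaves

def parse_phrase(leaves, stream):
    """One-shot VF encoding: return the unique leaf that is a prefix of
    `stream`, with its pointwise coding rate ceil(log2 |leaves|)/len(phrase)."""
    bits = math.ceil(math.log2(len(leaves)))
    for i in range(1, len(stream) + 1):
        if stream[:i] in leaves:
            return stream[:i], bits / i
    raise ValueError("stream is shorter than the deepest leaf")

probs = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
leaves = tunstall_tree(probs, m=4)
assert len(leaves) == 1 + (len(probs) - 1) * 4            # |leaves| = 1+(A-1)m
assert sum(leaves.values()) == 1                          # Property 2
phrase, rate = parse_phrase(leaves, "aababb")             # parses the leaf "aab"
```

Exact rational probabilities (`Fraction`) are used so that the tie-breaking and the consistency check are free of floating-point noise.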
A VF code using the Tunstall tree (Tunstall code) is defined as follows. Assume that a number of extension steps m < ∞ is given, and construct the Tunstall tree T_m^Tuns. Since the Tunstall tree is complete, for any infinite sequence x_1^∞ ∈ X^∞ there exists exactly one leaf node ν corresponding to a prefix x of x_1^∞ as w(ν) = x. The phrase x = w(ν) is parsed from x_1^∞ and can be encoded and decoded by the fixed-length lossless code in ⌈log |T̄_m^Tuns|⌉ bits. Then the pointwise coding rate is defined as

⌈log |T̄_m^Tuns|⌉ / ‖w(ν)‖  (bits/symbol).

Tunstall code is optimal in the following way. Any Tunstall tree T_m^Tuns maximizes the expected depth E[‖w(N_{T_m})‖] over all complete trees {T_m} whose leaf number |T̄_m| is equal to |T̄_m^Tuns|. Therefore, if we define the average coding rate of the VF code using the parsing tree T_m as

⌈log |T̄_m|⌉ / E[‖w(N_{T_m})‖],  (6)

then the Tunstall tree T_m^Tuns minimizes (6) in the set {T_m} of all complete trees whose leaf number is equal to |T̄_m^Tuns| [1], [3]. Moreover, the value of (6) converges to the entropy of the source X [2], [3].

On the other hand, it is proved in [6] that the average pointwise coding rate

E[ ⌈log |T̄_m|⌉ / ‖w(N_{T_m})‖ ]  (7)

of the Tunstall tree T_m = T_m^Tuns converges to the entropy rate of the source X as m goes to infinity. However, it is not clear whether Tunstall code minimizes (7) in the set {T_m} of all parsing trees whose leaf number is equal to |T̄_m^Tuns| for each m; and if it does not, it is not clear which parsing tree minimizes (7). In the next section, we give an algorithm that constructs a parsing tree that is optimal under the criterion of the average pointwise coding rate (7).

4. The Proposed Algorithm

In this section, we propose a parsing tree construction algorithm for a VF code that is optimal in the sense that it minimizes the average pointwise coding rate (7). It is also proved that the average pointwise coding rate of this code attains the entropy of any stationary memoryless source asymptotically as the parsing tree grows.

4.1 Incremental Parsing Tree Construction Algorithm

The construction algorithm of the sequence {T_m}_{m≥1} of parsing trees in the proposed method is given in Algorithm 2.
Algorithm 2. Proposed Algorithm
1: Let T_1 be the A-ary tree with depth one.
2: for m = 2 to ∞ do
3:   {Construct tree T_m from T_{m−1} as follows (we call this procedure the extension of the tree).}
4:   Pick up the leaf node ν* = ν that maximizes

Q(ν) := P(ν) / ( ‖w(ν)‖ (‖w(ν)‖ + 1) )  (8)

in the set T̄_{m−1} (if there exist two or more nodes with the maximum value of Q(ν), pick up the first one in the sorted set of them in the order of Definition 1).
5:   Let T_m be the tree constructed by making A child nodes of ν* ∈ T̄_{m−1}.
6: end for
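The same greedy skeleton with the selection rule swapped to Q(ν) gives a sketch of Algorithm 2 (again illustrative code, not the paper's). With P(a) = 3/4, P(b) = 1/4, the depth-one leaf b is already extended at step m = 3, which is what pushes the minimum depth up compared with the Tunstall tree of the same size.

```python
from fractions import Fraction

def proposed_tree(probs, m):
    """Algorithm 2 sketch: extend the leaf maximizing
    Q(v) = P(v) / (len(v) * (len(v) + 1)), with ties broken by the string
    order of Definition 1 (shorter strings first, then lexicographic)."""
    leaves = dict(probs)                                  # depth-one tree
    for _ in range(m - 1):
        def q(w):
            return leaves[w] / (len(w) * (len(w) + 1))
        best = min(leaves, key=lambda w: (-q(w), len(w), w))
        p = leaves.pop(best)                              # leaf becomes internal
        for s, ps in probs.items():
            leaves[best + s] = p * ps                     # its A children
    return leaves

leaves = proposed_tree({"a": Fraction(3, 4), "b": Fraction(1, 4)}, m=4)
# Unlike the Tunstall tree of the same size, no leaf has depth one here.
assert min(len(w) for w in leaves) == 2
```

Because Q(ν) divides the leaf probability by ‖w(ν)‖(‖w(ν)‖+1), shallow leaves get a large boost relative to deep ones, which is exactly the shape difference discussed in Sect. 5.2.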
Like Algorithm 1, it is a greedy algorithm that makes a sequence of growing parsing trees by successively extending leaves, but the leaves to be extended are chosen according to a different criterion from Algorithm 1. Note that, for the proofs of Theorems 1 and 2, it is sufficient that the order of strings in Step 4 is uniquely predetermined. However, to improve the worst-case coding rate in Sect. 5, the node corresponding to the shortest string should be picked up; therefore, the order defined in Definition 1 is used.

In the following subsections, it is proved that the proposed code is optimal in the sense that it minimizes the average pointwise coding rate in the set of all parsing trees with leaf sets of the same size. We also prove the coding theorem, which states that the average pointwise coding rate (7) converges to the entropy of the stationary memoryless source as m goes to infinity.

4.2 Optimality

In this subsection, we present the optimality of the algorithm defined in the previous subsection. First we show the following lemma.

Lemma 1: When a tree T_{m+1} is constructed by extending any leaf ν ∈ T̄_m of the tree T_m, it holds that

E[ ⌈log |T̄_{m+1}|⌉ / ‖w(N_{T_{m+1}})‖ ] = (⌈log |T̄_{m+1}|⌉ / ⌈log |T̄_m|⌉) E[ ⌈log |T̄_m|⌉ / ‖w(N_{T_m})‖ ] − ⌈log |T̄_{m+1}|⌉ Q(ν),

where Q(ν) is defined in (8). The proof of Lemma 1 is given in Appendix A.

In this formula, |T̄_m| and |T̄_{m+1}| are independent of the leaf node ν chosen in the extension. Consequently, under the assumption that the tree T_m is optimal in the set of trees with m internal nodes, if we select as ν the leaf maximizing Q(ν) in the set T̄_m, the average pointwise coding rate with respect to the parsing tree T_{m+1} is optimized in the set of trees with m+1 internal nodes. However, this optimization of the average pointwise coding rate with respect to the parsing tree T_{m+1} is a local optimization over trees extended from T_m by one step, so the tree may not be globally optimal. The following theorem proves that the parsing tree constructed above is globally optimal.

Theorem 1: The VF parsing tree T_m constructed by Algorithm 2 minimizes the average pointwise coding rate E[ ⌈log |T̄_m|⌉ / ‖w(N_{T_m})‖ ] in the set of all complete A-ary VF parsing trees with m internal nodes.

The proof of Theorem 1 is given in Appendix B.

4.3 Coding Theorem

In this subsection, we present the following theorem stating that the average pointwise coding rate of the proposed code attains the entropy rate asymptotically.

Theorem 2: If a phrase is parsed and encoded from any stationary memoryless source using the VF parsing tree T_m constructed by Algorithm 2, then the average pointwise coding rate satisfies

lim_{m→∞} E[ ⌈log |T̄_m|⌉ / ‖w(N_{T_m})‖ ] = H(X).  (9)

The proof of Theorem 2 is given in Appendix C.

5. Comparison of the Two Codes

Comparing Tunstall code and the proposed code using the average coding rate, it holds that

⌈log |T̄_m^Tuns|⌉ / E[‖w(N_{T_m^Tuns})‖] ≤ ⌈log |T̄_m|⌉ / E[‖w(N_{T_m})‖]  (10)

for each m, since Tunstall code is optimal under this criterion [2], [3]. On the other hand, using the criterion of the average pointwise coding rate, it holds that

E[ ⌈log |T̄_m|⌉ / ‖w(N_{T_m})‖ ] ≤ E[ ⌈log |T̄_m^Tuns|⌉ / ‖w(N_{T_m^Tuns})‖ ]  (11)

for each m. From these relationships we cannot say which code is better, because each code is optimal under its own criterion. In the following, we show that the proposed code is better than Tunstall code if we adopt the worst-case coding rate as a criterion.

5.1 Comparison of the Two Codes Using the Worst-Case Coding Rate

5.1.1 Worst-Case Coding Rate

As performance criteria of VF codes, the average coding rate [1], the overflow probability and the large deviations performance [4], and the competitive optimality [5], etc.,
have been investigated. We introduce a new performance criterion, the worst-case coding rate. For a VF code with the parsing tree T_m, this quantity is defined as

max_{ν∈T̄_m} ⌈log |T̄_m|⌉ / ‖w(ν)‖,

which can be represented using the overflow probability as

max_{ν∈T̄_m} ⌈log |T̄_m|⌉ / ‖w(ν)‖ = inf{ α : Pr{ ⌈log |T̄_m|⌉ / ‖w(N_{T_m})‖ > α } = 0 }.

Therefore, the performance analysis using the worst-case coding rate can be regarded as a restricted variation of the analysis of the overflow probability. If the average coding rate is suppressed, messages are encoded at a smaller coding rate in an average sense; on the other hand, if the worst-case coding rate is suppressed, all the messages are encoded at a smaller coding rate. In this sense, the worst-case coding rate can be used as a performance criterion of source codes. However, the optimization of the worst-case coding rate over the VF codes with a fixed number of codewords is not meaningful by itself, because the optimal code under this criterion is easily verified to be the one with uniform message length. Such a code does not compress the message at all, and therefore does not generally attain the entropy rate of the source asymptotically. The VF code proposed in the previous section attains better performance than Tunstall code with the same number of codewords under the criterion of the worst-case coding rate, as will be shown in Theorem 4, while its average pointwise coding rate attains the entropy rate of the source asymptotically, as shown in Theorem 2.

5.1.2 Comparison Using the Worst-Case Coding Rate
For the convenience of the proof, we give another representation of Algorithm 1 as Algorithm 1'.

Algorithm 1'. Tunstall Algorithm (alternative representation)
1: x_1* ← λ, S_1 ← X^+, T_1^Tuns ← the A-ary tree of depth one.
2: for m = 2 to ∞ do
3:   Assume that there are i_{m−1} strings that take the maximum probability in S_{m−1}, and let them be S̃_{m−1} = {w_1, …, w_{i_{m−1}}} such that w_1 ≺ w_2 ≺ ⋯ ≺ w_{i_{m−1}}, where the order ≺ is defined in Definition 1.
4:   x_m* ← w_1, S_m ← S_{m−1} ∖ {w_1}
5:   Let T_m^Tuns be the A-ary tree such that each of x_1*, x_2*, …, x_m* corresponds to an internal node of T_m^Tuns.
6: end for

Similarly, another representation of Algorithm 2 is given by Algorithm 2'.

Algorithm 2'. Proposed Algorithm (alternative representation)
1: x_1* ← λ, S_1 ← X^+, T_1 ← the A-ary tree of depth one.
2: for m = 2 to ∞ do
3:   Assume that there are i_{m−1} strings that take the maximum value of Q(w) in S_{m−1}, and let them be S̃_{m−1} = {w_1, …, w_{i_{m−1}}} such that w_1 ≺ w_2 ≺ ⋯ ≺ w_{i_{m−1}}, where the order ≺ is defined in Definition 1. Note that the function Q(·) is defined by (8).
4:   x_m* ← w_1, S_m ← S_{m−1} ∖ {w_1}
5:   Let T_m be the A-ary tree such that each of x_1*, x_2*, …, x_m* corresponds to an internal node of T_m.
6: end for

We first present the following theorem, which states
that the minimum depth of the parsing tree constructed by the proposed algorithm is larger than or equal to that of the Tunstall parsing tree for each m.

Theorem 3: Suppose that the output of a stationary memoryless source X is parsed and encoded by the trees T_m and T_m^Tuns, respectively. Then it holds that

w̲(T_m) ≥ w̲(T_m^Tuns).  (12)

The proof of Theorem 3 is given in Appendix D. Using this theorem, we can prove the following theorem.

Theorem 4: Parse and encode the output of a stationary memoryless source X using the tree T_m and the Tunstall tree T_m^Tuns, respectively. Then we have

max_{ν∈T̄_m} ⌈log |T̄_m|⌉ / ‖w(ν)‖ ≤ max_{ν∈T̄_m^Tuns} ⌈log |T̄_m^Tuns|⌉ / ‖w(ν)‖.  (13)
The proof is omitted since it can be directly shown from Theorem 3 and Eq. (1). From the latter theorem, we can say that the proposed VF code can attain better performance than Tunstall code if we adopt the criterion of the worst-case coding rate. Note that, if the probability distribution P_X is uniform, the same sequence of parsing trees is constructed by both the Tunstall and the proposed algorithms; therefore, in this case, equality holds in (13).

5.2 Comparison of the Shape of the Parsing Trees

In this section, the shapes of the parsing trees constructed by the Tunstall and the proposed algorithms are compared in the situations such that i) the parsing trees are small, and ii) the parsing trees are extended so that the minimum depth of the parsing trees is sufficiently large.

5.2.1 When the Parsing Tree is Small

In the case that the parsing trees are small, suppose a situation such that the parsing trees created by the two algorithms have the same minimum depth. Let the leaf set of the parsing tree at step m−1 created by the proposed algorithm be T̄_{m−1} = {ν_1, ν_2, …, ν_{|T̄_{m−1}|}}. In this leaf set, let ν_1 be one of the leaf nodes with the minimum depth. Then, for a node ν such that ‖w(ν)‖ = ‖w(ν_1)‖ + d (d ≥ 0), the ratio of the value of Q(ν), ν ≠ ν_1, to the value of Q(ν_1) is

Q(ν)/Q(ν_1) = (P(ν)/P(ν_1)) · ‖w(ν_1)‖(‖w(ν_1)‖+1) / ( (‖w(ν_1)‖+d)(‖w(ν_1)‖+d+1) )
 = (P(ν)/P(ν_1)) · 1 / ( (1 + d/‖w(ν_1)‖)(1 + d/(‖w(ν_1)‖+1)) ).  (14)

When the value of ‖w(ν_1)‖ is fixed, as d becomes large, the value of Q(ν)/Q(ν_1) becomes arbitrarily smaller than the value of P(ν)/P(ν_1). In each of Algorithms 1 and 2, the leaf node ν with maximum P(ν) or Q(ν) (which is the same node as that with maximum P(ν)/P(ν_1) or Q(ν)/Q(ν_1), respectively) is selected and the tree is extended. Therefore, in Algorithm 2, a node with large d tends not to be selected as the leaf node to be extended, compared with Algorithm 1. As a result, the minimum depth of the tree tends to be larger in Algorithm 2 than in Algorithm 1.

5.2.2 When the Parsing Tree is Large

As the parsing trees grow, the parsing tree constructed by the proposed algorithm tends to resemble the parsing tree constructed by the Tunstall algorithm, for the following reason. Fix the value of d arbitrarily. Then the value of (14) converges to P(ν)/P(ν_1) as ‖w(ν_1)‖ becomes large for each d. This means that the value of Q(ν)/Q(ν_1) is almost the same as the value of P(ν)/P(ν_1) when ‖w(ν_1)‖ is sufficiently large, so the effect of the difference between Q(ν) and P(ν) is weakened as the parsing trees grow. This makes the parsing trees of the two algorithms extend similarly when they are sufficiently large. This asymptotic behavior can also be anticipated from the fact that the average pointwise coding rates of the two algorithms converge to the same value, as shown in Theorem 2 and [6, Theorem 5].

5.3 Examples

In Figs. 1–4, examples of the parsing trees are presented for the setting X = {a, b}, (P(a), P(b)) = (3/4, 1/4), and m = 4, 8, for the two algorithms, respectively.

Fig. 1 Parsing tree by the Tunstall algorithm (p(a) = 3/4, m = 4).
Fig. 2 Parsing tree by the proposed algorithm (p(a) = 3/4, m = 4).
Fig. 3 Parsing tree by the Tunstall algorithm (p(a) = 3/4, m = 8).

Internal nodes are encircled, and each number in a circle is the order m in which the node is picked up as x_m* in Algorithm 1' or 2'. For each node ν, the value of P(ν) or Q(ν) is presented. The minimum depth of the tree T_m is larger than that of the Tunstall tree T_m^Tuns for m = 4, and is the same for m = 8. A leaf with small depth corresponds to a large coding rate. Therefore, these simple examples verify that the worst-case coding rate of the proposed code is better than that of Tunstall code when m is small.
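These examples can be reproduced numerically. The sketch below (illustrative code, not from the paper; `grow` is our name) builds both trees for p(a) = 3/4 and compares the worst-case coding rates max_ν ⌈log|T̄_m|⌉/‖w(ν)‖: for m = 4 it yields 3.0 bits/symbol for Tunstall code versus 1.5 for the proposed code, and for m = 8 both give 2.0, in agreement with Figs. 1–4.

```python
import math
from fractions import Fraction

def grow(probs, m, key):
    """Greedy A-ary parsing tree: m-1 extensions of the leaf maximizing
    key(prob, depth); ties broken by Definition 1's string order."""
    leaves = dict(probs)
    for _ in range(m - 1):
        best = min(leaves, key=lambda w: (-key(leaves[w], len(w)), len(w), w))
        p = leaves.pop(best)
        for s, ps in probs.items():
            leaves[best + s] = p * ps
    return leaves

probs = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
for m in (4, 8):
    tun = grow(probs, m, key=lambda p, l: p)                   # Algorithm 1
    prop = grow(probs, m, key=lambda p, l: p / (l * (l + 1)))  # Algorithm 2
    bits = math.ceil(math.log2(len(tun)))
    worst = lambda t: max(bits / len(w) for w in t)
    assert worst(prop) <= worst(tun)                           # Theorem 4
    print(m, worst(tun), worst(prop))
```

Passing the selection rule as a `key` function makes the two algorithms differ in a single line, mirroring the fact that they share the same greedy skeleton.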
6. Extensions

The results obtained above can be extended in the following ways.

6.1 Optimality on Finite-Memory Sources

The proposed algorithm and the optimality shown in Sect. 4.2
can be extended to stationary sources with finite memory. Lemma 1 can be adapted to sources with memory by replacing P*(ν*)P(x) with P*(ν*)P(x|w(ν*)) in (A·1), where P(x|w(ν*)) means the probability of a symbol x conditioned on the context w(ν*). Algorithm 2 and Theorem 1 can be extended to stationary sources with finite memory because they use the result of Lemma 1 directly.

Fig. 4 Parsing tree by the proposed algorithm (p(a) = 3/4, m = 8).

6.2 Coding Theorems for the Multi-Shot Algorithm

The coding theorem of the proposed algorithm shown in Sect. 4.3 is proved for the one-shot VF code. This theorem can be extended to the multi-shot VF code, which parses and encodes multiple phrases repeatedly, by using the results of [7], [9] instead of [6].

7. Concluding Remarks

We proposed a new variable-to-fixed length lossless source code, with an incremental parsing tree construction algorithm similar to that of Tunstall code. We proved that this code optimizes the average pointwise coding rate, which is a different criterion from that of Tunstall code. It is also proved that the average pointwise coding rate of the proposed code attains the entropy of any stationary memoryless source asymptotically as the parsing tree grows. Moreover, it is shown that, under the criterion of the worst-case coding rate, the proposed code can attain better performance than Tunstall code. Therefore, the proposed code attains better compression performance than Tunstall code in two criteria, the average pointwise coding rate and the worst-case coding rate.

References
[1] B.P. Tunstall, Synthesis of Noiseless Compression Codes, Ph.D. dissertation, Georgia Inst. Tech., Atlanta, GA, 1967.
[2] F. Jelinek and K. Schneider, "On variable-length-to-block coding," IEEE Trans. Inf. Theory, vol.18, no.6, pp.765–774, Nov. 1972.
[3] J. Verhoeff, "A new data compression technique," Annals of Systems Research, vol.6, pp.139–148, 1977.
[4] N. Merhav and D.L. Neuhoff, "Variable-to-fixed length codes provide better large deviations performance than fixed-to-variable length codes," IEEE Trans. Inf. Theory, vol.38, no.1, pp.135–140, Jan. 1992.
[5] H. Yamamoto and H. Yokoo, "Average-sense optimality and competitive optimality for almost instantaneous VF codes," IEEE Trans. Inf. Theory, vol.47, no.6, pp.2174–2184, Sept. 2001.
[6] M. Arimura, "On the average coding rate of the Tunstall code for stationary and memoryless sources," IEICE Trans. Fundamentals, vol.E93-A, no.11, pp.1904–1911, Nov. 2010.
[7] M. Arimura, "On the coding rate of a multishot Tunstall code for stationary memoryless sources," Proc. 2014 International Symposium on Information Theory and its Applications (ISITA2014), pp.284–288, Melbourne, Australia, Oct. 2014.
[8] M. Arimura, "Almost sure convergence coding theorems of one-shot and multi-shot Tunstall codes for stationary memoryless sources," IEICE Trans. Fundamentals, vol.E98-A, no.12, pp.2393–2406, Dec. 2015.
[9] M. Arimura, "Average coding rate of a multi-shot Tunstall code with an arbitrary parsing tree sequence," IEICE Trans. Fundamentals, vol.E99-A, no.12, pp.2281–2285, Dec. 2016.
[10] T.M. Cover and J.A. Thomas, Elements of Information Theory, 2nd ed., John Wiley & Sons, Inc., 2006.
[11] K. Kobayashi and H. Morita, Lectures on Information Theory (Jouhou Riron Kougi), Baifukan, 2008 (in Japanese).
Acknowledgments

The author expresses his thanks to the associate editor and the two referees, who carefully read the manuscript and gave valuable comments. This research was supported in part by MEXT Grant-in-Aid for Scientific Research (C) 15K06088.

Appendix A: Proof of Lemma 1

Since the cardinality of the leaf set is $|\overline{T}_m|$ for any $m$ ($m \ge 1$), the average pointwise coding rate is given by

$$E\left[\frac{\lceil\log|\overline{T}_m|\rceil}{\|w(N_{T_m})\|}\right]=\lceil\log|\overline{T}_m|\rceil\sum_{\nu\in\overline{T}_m}\frac{P(\nu)}{\|w(\nu)\|}.$$

Then, if a tree $T_{m+1}$ is constructed by extending an arbitrary leaf $\nu'\in\overline{T}_m$, and a phrase is parsed and encoded with the tree $T_{m+1}$, the average pointwise coding rate for the phrase $w(N_{T_{m+1}})$ is evaluated as follows:

$$\begin{aligned}
E\left[\frac{\lceil\log|\overline{T}_{m+1}|\rceil}{\|w(N_{T_{m+1}})\|}\right]
&=\lceil\log|\overline{T}_{m+1}|\rceil\sum_{\nu\in\overline{T}_{m+1}}\frac{P(\nu)}{\|w(\nu)\|}\\
&=\lceil\log|\overline{T}_{m+1}|\rceil\sum_{\nu\in\overline{T}_m:\,\nu\ne\nu'}\frac{P(\nu)}{\|w(\nu)\|}
+\lceil\log|\overline{T}_{m+1}|\rceil\sum_{x\in\mathcal{X}}\frac{P(\nu')P_X(x)}{\|w(\nu')\|+1}\qquad(\text{A}\cdot 1)\\
&=\lceil\log|\overline{T}_{m+1}|\rceil\sum_{\nu\in\overline{T}_m}\frac{P(\nu)}{\|w(\nu)\|}
-\lceil\log|\overline{T}_{m+1}|\rceil\frac{P(\nu')}{\|w(\nu')\|}
+\lceil\log|\overline{T}_{m+1}|\rceil\frac{P(\nu')}{\|w(\nu')\|+1}\\
&=\lceil\log|\overline{T}_{m+1}|\rceil\sum_{\nu\in\overline{T}_m}\frac{P(\nu)}{\|w(\nu)\|}-\lceil\log|\overline{T}_{m+1}|\rceil\,Q(\nu')\\
&=\frac{\lceil\log|\overline{T}_{m+1}|\rceil}{\lceil\log|\overline{T}_m|\rceil}\,E\left[\frac{\lceil\log|\overline{T}_m|\rceil}{\|w(N_{T_m})\|}\right]-\lceil\log|\overline{T}_{m+1}|\rceil\,Q(\nu'),
\end{aligned}$$

where the third equality uses $\sum_{x\in\mathcal{X}}P_X(x)=1$, and the fourth uses

$$Q(\nu')=\frac{P(\nu')}{\|w(\nu')\|}-\frac{P(\nu')}{\|w(\nu')\|+1}=\frac{P(\nu')}{\|w(\nu')\|(\|w(\nu')\|+1)},$$

which is the quantity defined in (8).
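The update in the last equality can be checked numerically. The following sketch is not from the paper: it assumes a hypothetical binary memoryless source with $P_X(0)=0.7$, evaluates the average pointwise coding rate of a small parsing tree directly, and then re-derives it through the recursion of Lemma 1 after extending one leaf.

```python
from math import ceil, log2

# Hypothetical binary memoryless source (an assumption, not from the paper).
PX = {"0": 0.7, "1": 0.3}

def prob(phrase):
    """Probability of a phrase under the memoryless source."""
    p = 1.0
    for sym in phrase:
        p *= PX[sym]
    return p

def rate(leaves):
    """Average pointwise coding rate: E[ ceil(log|leaves|) / |phrase| ]."""
    c = ceil(log2(len(leaves)))
    return c * sum(prob(v) / len(v) for v in leaves)

def extend(leaves, leaf):
    """Replace `leaf` by its children, as in growing a parsing tree."""
    return [v for v in leaves if v != leaf] + [leaf + x for x in PX]

# T_m: the depth-one binary tree; T_{m+1}: extend the leaf nu' = "0".
Tm = ["0", "1"]
nu = "0"
Tm1 = extend(Tm, nu)

# Direct evaluation of the new average pointwise coding rate.
direct = rate(Tm1)

# Lemma 1's identity: rescale the old rate by the codeword-length ratio and
# subtract ceil(log|T_{m+1}|) * Q(nu'), Q(nu') = P(nu')/(|w||w|+1 ... i.e.
# P(nu') / (len * (len+1)).
c_m = ceil(log2(len(Tm)))
c_m1 = ceil(log2(len(Tm1)))
Q = prob(nu) / (len(nu) * (len(nu) + 1))
via_lemma = (c_m1 / c_m) * rate(Tm) - c_m1 * Q

print(direct, via_lemma)  # the two values agree
```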
Appendix B: Proof of Theorem 1

The proof is given in the same way as that of [11, Theorem 6.1]. Among the internal nodes of a tree $T$, a node all of whose child nodes are leaf nodes is generally called a sprout. Since a finite tree $T$ has at least one sprout, let $\mu$ be one of them, and let $T^\mu$ be the tree constructed from $T$ by removing the children of $\mu$ so that $\mu$ becomes a leaf node. This means that $T$ is obtained by extending the leaf node $\mu$ of $T^\mu$.

In the following, we prove by mathematical induction that for any $m$ the parsing tree $T_m$ is optimal, that is, it minimizes the average pointwise coding rate (7) in the set of all complete $A$-ary trees with $m$ internal nodes.

In the case $m=1$, there is only one parsing tree, the $A$-ary tree of depth one, so it is optimal. Assume that a tree $T^*_k$ is optimal over trees with $k$ internal nodes. Let $\nu^*$ be a leaf that maximizes $Q(\nu)$ defined in (8) in the leaf set $\overline{T}^*_k$ of $T^*_k$, and let $T^*_{k+1}$ be the tree constructed from $T^*_k$ by extending the leaf $\nu^*$. We prove that $T^*_{k+1}$ is optimal over trees with $k+1$ internal nodes using Lemma 1. Assume to the contrary that there exists a tree $T_{k+1}$ whose average pointwise coding rate is strictly smaller than that of $T^*_{k+1}$, that is,

$$E\left[\frac{\lceil\log|\overline{T}^*_{k+1}|\rceil}{\|w(N_{T^*_{k+1}})\|}\right] > E\left[\frac{\lceil\log|\overline{T}_{k+1}|\rceil}{\|w(N_{T_{k+1}})\|}\right].$$  (A·2)

From Lemma 1, we have

$$E\left[\frac{\lceil\log|\overline{T}^*_{k+1}|\rceil}{\|w(N_{T^*_{k+1}})\|}\right]
=\frac{\lceil\log|\overline{T}^*_{k+1}|\rceil}{\lceil\log|\overline{T}^*_k|\rceil}\,E\left[\frac{\lceil\log|\overline{T}^*_k|\rceil}{\|w(N_{T^*_k})\|}\right]
-\lceil\log|\overline{T}^*_{k+1}|\rceil\frac{P(\nu^*)}{\|w(\nu^*)\|(\|w(\nu^*)\|+1)}.$$  (A·3)

Now, there exists a sprout of $T_{k+1}$ which is not an internal node of $T^*_k$. Indeed, if all the sprouts of $T_{k+1}$ were internal nodes of $T^*_k$, then all the nodes of $T_{k+1}$ would be included in the node set of $T^*_k$, which contradicts $|\overline{T}_{k+1}| > |\overline{T}^*_k|$ concerning the cardinality of the leaf sets. Let $\mu$ be a sprout of $T_{k+1}$ which is not an internal node of $T^*_k$. Then, on the path from $\mu$ to the root of $T_{k+1}$, there exists a leaf $\nu$ of $T^*_k$ (it might be $\nu=\mu$).

Construct a tree $T^\mu_{k+1}$ from $T_{k+1}$ by cutting the subtree rooted at $\mu$ so that $\mu$ becomes a leaf node. Then $T_{k+1}$ is the tree constructed by extending $T^\mu_{k+1}$ at the leaf $\mu$ by one step. Therefore $|\overline{T}^\mu_{k+1}| = |\overline{T}^*_k|$, and the following holds:

$$\begin{aligned}
E\left[\frac{\lceil\log|\overline{T}_{k+1}|\rceil}{\|w(N_{T_{k+1}})\|}\right]
&=\frac{\lceil\log|\overline{T}_{k+1}|\rceil}{\lceil\log|\overline{T}^\mu_{k+1}|\rceil}\,E\left[\frac{\lceil\log|\overline{T}^\mu_{k+1}|\rceil}{\|w(N_{T^\mu_{k+1}})\|}\right]-\lceil\log|\overline{T}_{k+1}|\rceil\,Q(\mu)\\
&\overset{(a)}{\ge}\frac{\lceil\log|\overline{T}_{k+1}|\rceil}{\lceil\log|\overline{T}^\mu_{k+1}|\rceil}\,E\left[\frac{\lceil\log|\overline{T}^\mu_{k+1}|\rceil}{\|w(N_{T^\mu_{k+1}})\|}\right]-\lceil\log|\overline{T}_{k+1}|\rceil\,Q(\nu)\\
&\overset{(b)}{\ge}\frac{\lceil\log|\overline{T}_{k+1}|\rceil}{\lceil\log|\overline{T}^\mu_{k+1}|\rceil}\,E\left[\frac{\lceil\log|\overline{T}^\mu_{k+1}|\rceil}{\|w(N_{T^\mu_{k+1}})\|}\right]-\lceil\log|\overline{T}_{k+1}|\rceil\,Q(\nu^*),
\end{aligned}$$  (A·4)

where (a) holds since $P(\mu)\le P(\nu)$ and $\|w(\mu)\|\ge\|w(\nu)\|$, which follow from the relation between $\mu$ and $\nu$, and (b) holds since both $\nu$ and $\nu^*$ are in the leaf set $\overline{T}^*_k$ and $\nu^*$ maximizes $Q(\nu)$ defined in (8) over all $\nu\in\overline{T}^*_k$.

From (A·2), (A·3) and (A·4), together with $|\overline{T}^*_{k+1}|=|\overline{T}_{k+1}|$ and $|\overline{T}^*_k|=|\overline{T}^\mu_{k+1}|$, we have

$$E\left[\frac{\lceil\log|\overline{T}^*_k|\rceil}{\|w(N_{T^*_k})\|}\right] > E\left[\frac{\lceil\log|\overline{T}^\mu_{k+1}|\rceil}{\|w(N_{T^\mu_{k+1}})\|}\right],$$

which contradicts the assumption that $T^*_k$ minimizes the average pointwise coding rate over trees with $k$ internal nodes. Since this contradiction comes from assumption (A·2), it is concluded that $T^*_{k+1}$ is optimal.
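The induction step above analyzes the tree grown by always extending a leaf that maximizes $Q(\nu)$. The paper's Algorithm 2 itself is outside this excerpt, so the sketch below is only a guess at that greedy rule, run on a hypothetical binary memoryless source.

```python
from math import ceil, log2

# Hypothetical binary memoryless source (an assumption, not from the paper).
PX = {"0": 0.7, "1": 0.3}

def prob(phrase):
    """Probability of a phrase under the memoryless source."""
    p = 1.0
    for s in phrase:
        p *= PX[s]
    return p

def Q(leaf):
    """Gain of extending `leaf`: Q(nu) = P(nu) / (|w(nu)| * (|w(nu)|+1))."""
    return prob(leaf) / (len(leaf) * (len(leaf) + 1))

def grow_tree(num_internal):
    """Greedily extend the leaf maximizing Q, as in the induction step:
    start from the depth-one tree and add one internal node per round."""
    leaves = list(PX)  # depth-one tree: one internal node (the root)
    for _ in range(num_internal - 1):
        best = max(leaves, key=Q)
        leaves.remove(best)
        leaves.extend(best + x for x in PX)
    return leaves

def rate(leaves):
    """Average pointwise coding rate of the parsing tree with these leaves."""
    c = ceil(log2(len(leaves)))
    return c * sum(prob(v) / len(v) for v in leaves)

leaves = grow_tree(4)
print(sorted(leaves), rate(leaves))
```

Each round removes one leaf and adds its two children, so a tree with $m$ internal nodes has $m+1$ leaves here, matching the fixed leaf-set size against which the codeword length $\lceil\log|\overline{T}_m|\rceil$ is taken.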
Appendix C: Proof of Theorem 2

The proof uses the results on the evaluation of the average pointwise coding rate of Tunstall code [6]. Let $T_m$ be the VF parsing tree with $m$ internal nodes constructed by Algorithm 2, and let $T^{\mathrm{Tuns}}_m$ be the Tunstall tree with $m$ internal nodes, constructed by Algorithm 1. Let $N_{T^{\mathrm{Tuns}}_m}$ and $N_{T_m}$ be the random variables of the leaves $\nu\in\overline{T}^{\mathrm{Tuns}}_m$ and $\nu'\in\overline{T}_m$, respectively. From the facts that $|\overline{T}_m|=|\overline{T}^{\mathrm{Tuns}}_m|$ and that the VF code using the parsing tree $T_m$ minimizes the average pointwise coding rate in the set of trees with $m$ internal nodes, it holds for each $m$ that

$$E\left[\frac{\lceil\log|\overline{T}_m|\rceil}{\|w(N_{T_m})\|}\right] \le E\left[\frac{\lceil\log|\overline{T}^{\mathrm{Tuns}}_m|\rceil}{\|w(N_{T^{\mathrm{Tuns}}_m})\|}\right].$$

Since it is satisfied [6, Theorem 5] that

$$\lim_{m\to\infty} E\left[\frac{\lceil\log|\overline{T}^{\mathrm{Tuns}}_m|\rceil}{\|w(N_{T^{\mathrm{Tuns}}_m})\|}\right] = H(X),$$

it holds immediately that

$$\limsup_{m\to\infty} E\left[\frac{\lceil\log|\overline{T}_m|\rceil}{\|w(N_{T_m})\|}\right] \le H(X).$$  (A·5)

On the other hand, since the Tunstall tree maximizes the average phrase length [1], [3], $E[\|w(N_{T^{\mathrm{Tuns}}_m})\|] \ge E[\|w(N_{T_m})\|]$ holds for each $m$. Using Jensen's inequality (cf. [10]), we have

$$E\left[\frac{\lceil\log|\overline{T}_m|\rceil}{\|w(N_{T_m})\|}\right] \ge \frac{\lceil\log|\overline{T}_m|\rceil}{E[\|w(N_{T_m})\|]} \ge \frac{\lceil\log|\overline{T}^{\mathrm{Tuns}}_m|\rceil}{E[\|w(N_{T^{\mathrm{Tuns}}_m})\|]}.$$

Since it is proved in [2, Theorem 1] that

$$\liminf_{m\to\infty} \frac{\lceil\log|\overline{T}^{\mathrm{Tuns}}_m|\rceil}{E[\|w(N_{T^{\mathrm{Tuns}}_m})\|]} \ge H(X),$$

the following inequality holds:

$$\liminf_{m\to\infty} E\left[\frac{\lceil\log|\overline{T}_m|\rceil}{\|w(N_{T_m})\|}\right] \ge H(X).$$  (A·6)

From (A·5) and (A·6), we have (9).
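Both bounds can be observed numerically. In the sketch below, the toy binary source and the two greedy rules standing in for Algorithms 1 and 2 are assumptions, not the paper's pseudocode: the Tunstall tree is grown by extending the most probable leaf, the proposed tree by extending the leaf maximizing $Q(\nu)$, and their average pointwise coding rates are compared with the source entropy $H(X)$.

```python
from math import ceil, log2

PX = {"0": 0.8, "1": 0.2}  # hypothetical binary source (an assumption)

def prob(w):
    """Probability of a phrase under the memoryless source."""
    p = 1.0
    for s in w:
        p *= PX[s]
    return p

def grow(num_internal, gain):
    """Grow a parsing tree by repeatedly extending the leaf maximizing `gain`."""
    leaves = list(PX)  # depth-one tree
    for _ in range(num_internal - 1):
        best = max(leaves, key=gain)
        leaves.remove(best)
        leaves.extend(best + x for x in PX)
    return leaves

def pointwise_rate(leaves):
    """E[ ceil(log|leaves|) / |phrase| ]  (average pointwise coding rate)."""
    c = ceil(log2(len(leaves)))
    return c * sum(prob(v) / len(v) for v in leaves)

m = 50
tunstall = grow(m, gain=prob)  # extend the most probable leaf
proposed = grow(m, gain=lambda v: prob(v) / (len(v) * (len(v) + 1)))  # max Q

entropy = -sum(p * log2(p) for p in PX.values())
print(pointwise_rate(proposed), pointwise_rate(tunstall), entropy)
# The proposed tree's average pointwise rate is <= the Tunstall tree's,
# and both stay above H(X), as the two bounds suggest.
```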
Appendix D: Proof of Theorem 3

From Algorithms 1' and 2', the following lemmas are satisfied. The proofs are omitted since they can be shown directly from the algorithms.

Lemma 2: In both Algorithm 1' and Algorithm 2', the string $a_A^n = \underbrace{a_A\cdots a_A}_{n}$ is the last string in the set $\mathcal{X}^n$ to become an internal node of the tree, for any $n\ge 1$.

Lemma 3: For any tree $T^{\mathrm{Tuns}}_m$, the probability $P(\nu_1)$ of an arbitrary internal node $\nu_1$ and the probability $P(w)$ of any string $w$ such that $w(\nu_2)\in\mathrm{PrefEq}(w)$ for some leaf $\nu_2\in\overline{T}^{\mathrm{Tuns}}_m$ have the relationship

$$P(\nu_1) \ge P(w).$$  (A·7)

Lemma 4: For any tree $T_m$, the probability $P(\nu_1)$ of an arbitrary internal node $\nu_1$ and the probability $P(w)$ of any string $w$ such that $w(\nu_2)\in\mathrm{PrefEq}(w)$ for some leaf $\nu_2\in\overline{T}_m$ have the relationship

$$\frac{P(\nu_1)}{\|w(\nu_1)\|(\|w(\nu_1)\|+1)} \ge \frac{P(w)}{\|w\|(\|w\|+1)}.$$  (A·8)

For $m=1$ and $2$, it holds that $w(T_m)=w(T^{\mathrm{Tuns}}_m)=1$ for $A\ge 2$, which concludes (12). In the following, under the condition $m\ge 3$, we show (12) by deriving a contradiction from the assumption that

$$w(T_m) < w(T^{\mathrm{Tuns}}_m).$$  (A·9)

The trees $T^{\mathrm{Tuns}}_m$ and $T_m$ under this assumption are depicted in Fig. A·1, where the leaf set $\overline{T}^{\mathrm{Tuns}}_m$ of the tree $T^{\mathrm{Tuns}}_m$ is represented by the thick solid line and the leaf set $\overline{T}_m$ of the tree $T_m$ by the dotted line.

[Fig. A·1  Assumed tree structure.]

Let $\nu'_1$ (and $\mu_1$) represent the leaf node placed last in the order defined in Definition 1 in the set $\{\nu\in\overline{T}^{\mathrm{Tuns}}_m : \|w(\nu)\| = w(T^{\mathrm{Tuns}}_m)\}$ (and $\{\nu\in\overline{T}_m : \|w(\nu)\| = w(T_m)\}$), respectively. Then it holds that

$$\|w(\mu_1)\| = w(T_m),$$  (A·10)

$$\|w(\nu'_1)\| = w(T^{\mathrm{Tuns}}_m).$$  (A·11)

From (A·9), (A·10) and (A·11), it is satisfied that

$$\|w(\mu_1)\| < \|w(\nu'_1)\|.$$  (A·12)

Letting $\nu_1$ denote the parent of $\nu'_1$, we have

$$\|w(\nu_1)\| + 1 = \|w(\nu'_1)\|.$$  (A·13)

From Lemma 2, it holds that

$$w(\mu_1) = \underbrace{a_A\cdots a_A}_{w(T_m)}, \qquad w(\nu'_1) = \underbrace{a_A\cdots a_A}_{w(T^{\mathrm{Tuns}}_m)}.$$

Applying (A·12), $w(\mu_1)\in\mathrm{PrefNeq}(w(\nu'_1))$ is satisfied. Then, from the facts that $|\overline{T}_m|=|\overline{T}^{\mathrm{Tuns}}_m|$ and that $w(\mu_1)\in\mathrm{PrefNeq}(w(\nu'_1))$, there exist $\mu'_2\in\overline{T}_m$ and $\nu_2\in\overline{T}^{\mathrm{Tuns}}_m$ such that $w(\nu_2)\in\mathrm{PrefNeq}(w(\mu'_2))$. This is because, if such nodes $\mu'_2,\nu_2$ did not exist, then from $\|w(\mu)\|\le\|w(\nu)\|$ for all $\mu\in\overline{T}_m$, $\nu\in\overline{T}^{\mathrm{Tuns}}_m$, together with the existence of $\mu_1\in\overline{T}_m$ and $\nu'_1\in\overline{T}^{\mathrm{Tuns}}_m$ satisfying (A·12), it would hold that $|\overline{T}_m| < |\overline{T}^{\mathrm{Tuns}}_m|$, which is a contradiction.

For $\nu_2\in\overline{T}^{\mathrm{Tuns}}_m$, (A·11) implies that

$$\|w(\nu'_1)\| \le \|w(\nu_2)\|.$$  (A·14)
If the parent of $\mu'_2$ is denoted by $\mu_2$, it holds that $w(\nu_2)\in\mathrm{PrefEq}(w(\mu_2))$, which induces

$$\|w(\nu_2)\| \le \|w(\mu_2)\|.$$  (A·15)

Applying (A·13) and (A·14) to (A·15), we have

$$\|w(\nu_1)\| < \|w(\mu_2)\|.$$  (A·16)

Next, focus on the tree $T^{\mathrm{Tuns}}_m$. The node $\nu_1$ is an internal node of this tree, while $\mu_2$ satisfies that there exists a leaf $\nu_2$ such that $w(\nu_2)\in\mathrm{PrefEq}(w(\mu_2))$. From Lemma 3, it holds that

$$P(\nu_1) \ge P(\mu_2).$$  (A·17)

From (A·16) and (A·17), we have

$$\frac{P(\nu_1)}{\|w(\nu_1)\|(\|w(\nu_1)\|+1)} > \frac{P(\mu_2)}{\|w(\mu_2)\|(\|w(\mu_2)\|+1)}.$$  (A·18)

On the other hand, in the tree $T_m$, $\mu_2$ is an internal node, and for the string $w(\nu_1)$ there exists a leaf $\mu_1\in\overline{T}_m$ such that $w(\mu_1)\in\mathrm{PrefEq}(w(\nu_1))$. From Lemma 4, it holds that

$$\frac{P(\nu_1)}{\|w(\nu_1)\|(\|w(\nu_1)\|+1)} \le \frac{P(\mu_2)}{\|w(\mu_2)\|(\|w(\mu_2)\|+1)}.$$  (A·19)

The inequalities (A·18) and (A·19) contradict each other, and this contradiction comes from the assumption (A·9). Therefore, we have (12).
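By (12), the proposed tree's shortest phrase length $w(T_m)$ is at least that of the Tunstall tree, so with $|\overline{T}_m|=|\overline{T}^{\mathrm{Tuns}}_m|$ its worst-case coding rate $\lceil\log|\overline{T}_m|\rceil/w(T_m)$ cannot exceed the Tunstall code's. The sketch below illustrates this numerically; the skewed toy source and the greedy rules standing in for Algorithms 1' and 2' are assumptions, not the paper's pseudocode.

```python
from math import ceil, log2

PX = {"0": 0.9, "1": 0.1}  # hypothetical skewed binary source (an assumption)

def prob(w):
    """Probability of a phrase under the memoryless source."""
    p = 1.0
    for s in w:
        p *= PX[s]
    return p

def grow(num_internal, gain):
    """Grow a parsing tree by repeatedly extending the leaf maximizing `gain`."""
    leaves = list(PX)
    for _ in range(num_internal - 1):
        best = max(leaves, key=gain)
        leaves.remove(best)
        leaves.extend(best + x for x in PX)
    return leaves

def worst_case_rate(leaves):
    """Worst-case coding rate: codeword length over the shortest phrase w(T)."""
    return ceil(log2(len(leaves))) / min(len(v) for v in leaves)

m = 40
tunstall = grow(m, gain=prob)  # extend the most probable leaf
proposed = grow(m, gain=lambda v: prob(v) / (len(v) * (len(v) + 1)))

# For a skewed source the Tunstall tree keeps short low-probability leaves,
# while the Q-maximizing rule extends them, raising the minimum phrase length
# and therefore lowering the worst-case rate.
print(worst_case_rate(proposed), worst_case_rate(tunstall))
```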
Mitsuharu Arimura received the B.E., M.E. and Ph.D. degrees from the University of Tokyo in 1994, 1996 and 1999, respectively. From 1999 to 2004, he was a Research Associate in the Graduate School of Information Systems at the University of Electro-Communications, Tokyo, Japan. Since 2004, he has been with Shonan Institute of Technology, where he is currently a Lecturer in the Faculty of Engineering. His research interests include Shannon theory and data compression algorithms. Dr. Arimura is a member of the IEEE.