Complete Variable-Length \Fix-Free" Codes - CiteSeerX

3 downloads 1331 Views 170KB Size Report
Nov 22, 1994 - of codewords from a x-free code can be uniquely parsed from either end. (And x-free codes ... Email address: [email protected].
Complete Variable-Length \Fix-Free" Codes David Gillman Institute for Mathematics & Its Applications University of Minnesota Minneapolis, MN 55455 and Ronald L. Rivesty Laboratory for Computer Science Massachusetts Institute of Technology Cambridge, MA 02139 November 22, 1994 Abstract

A set of codewords is x-free if it is both pre x-free and sux-free: no codeword in the set is a pre x or a sux of any other. A set of codewords fx1; x2; : : : ; xn g over a t-letter alphabet  is said to be complete if it satis es the Kraft inequality with equality, so that ?jx j = 1 : t

X

i

in

1

The set k of all codewords of length k is obviously both x-free and complete. We show, surprisingly, that there are other examples of complete x-free codes, ones whose codewords have a variety of lengths. We discuss such variable-length (complete) x-free codes and techniques for constructing them.

1 Introduction While investigating the properties of Hu man codes [1], we became interested in codes that were both pre x-free and sux-free (or \ x-free"). Fix-free codes have a number of interesting properties. For example, a word constructed by concatenating together a number of codewords from a x-free code can be uniquely parsed from either end. (And x-free codes 

Supported by NSF grant 9212184-CCR and Darpa contract N00014-92-J-1799. Email address:

[email protected]

Supported by NSF grants CCR-8914428 and CCR-9310888, and the Siemens Corporation. Email address: [email protected] y

0 0 0

1

1

1

1

0

1

0

1

0

1

0

0

1 0

0

1

0

1

1

0

(a) 1 1

0 0 1

1 0 1

0

(b) Figure 1: (a) Pre x tree and (b) sux tree for x-free code A = f01, 000, 0010; 0011; 1010; 1011g.

100, 110, 111,

have terrible synchronization properties: being sux-free means that if the channel drops a bit the receiver can not easily resynchronize. ) However, it was not clear to us whether x-free codes could be as ecient as ordinary pre x-free codes. We thus became interested in complete \ x-free" codes, as de ned in the abstract above. Preliminary investigation led us to conjecture that a complete x-free code over a given t-letter alphabet  must be of the form k for some integer k. Conjecture. A complete x-free code over a given t-letter alphabet  must be of the form k for some integer k. That is, there are no variable-length complete x-free codes. We attempted to prove this conjecture, but were surprised to nd out that it is false. Here is the rst counterexample we found, over a binary alphabet: A = f01; 000; 100; 110; 111; 0010; 0011; 1010; 1011g The \pre x tree" and \sux tree" for the code A are given in Figure 1. Any such counterexample to the conjecture generates a family of counterexamples by forming products: let Ak be the set of concatenations of k words from A. Since A is x-free, Ak must also be x-free. Given that the conjecture is false, it is natural to ask for general techniques for constructing x-free codes, much as one can construct Hu man codes. However, the problem of constructing x-free codes seems much more dicult, and we do not know how to generate all such codes. For use in applications, it would seem advantageous to have codes with arbitrarily large ratios between the lengths of the longest and shortest codewords. In 1

The receiver rst parses the transmission into codewords and then decodes each one. Because of the sux-free property, the receiver will parse incorrectly from the dropped bit onward, and the end of a parsed word will never coincide with the end of a source word. This is what we mean by failure to resynchronize. 1

2

the set A just described, the ratio of the lengths of the longest and shortest words is 2. In the next section we show that A is an example of a general construction that gives a ratio of longest word to shortest word approaching 3. We further generalize this by developing a recursive construction which gives arbitrarily large ratios. We leave as an open problem the question of constructing \ecient" x-free codes, given a probability distribution on a source alphabet.

2 General Construction Techniques 2.1

First Construction Method

Here is a general construction of a complete x-free code A over a t-letter alphabet .

Theorem 1 Let S be any subset of k , and p be a xed positive integer less than k, such

that for any two (not necessarily distinct) words of S (say, x = x[1]x[2] : : :x[k] and y = y[1]y[2] : : :y[k]), we have that

x[i] 6= y[i + p] for some i; 1  i  k ? p :

(1)

Then the set of codewords

Fk;p(S ) = S [ pS p [ (k p ? S p ? pS ) +

(2)

is both x-free and complete.

Proof: Condition (1) ensures that no element of S can be a pre x or a sux of any string in pS p. It is immediate that no element of S is a pre x or a sux of any string in (k p ? S p ? pS ), and that no element of (k p ? S p ? p S ) is a pre x or a sux of +

+

 p S p .

any string in Thus Fk;p(S ) is x-free. To show that Fk;p(S ) is complete, we verify that the Kraft inequality is tight. Condition (1) ensures that S p and pS are disjoint. Let s = jS j. Then Fk;p(S ) contains s words of length k, st p words of length k +2p, and tk p ? stp words of length k + p. We thus have 2

X

x2F

+

t?jxj = st?k + st pt?k? p + (tk p ? stp )t?k?p 2

S

k;p ( )

+1

2

+

= st?k + st?k + (1 ? st?k ) = 1; +1

+1

(3) (4) (5)

so that Fk;p is complete. The ratio of the longest word to the shortest word in this x-free code is (2p + k)=k; if we choose p = k ? 1 (which we certainly can do for some S ), the ratio is 3 ? 2=k, which approaches 3 as k goes to in nity. It is possible to pick a set S satisfying condition (1) for any k and any p < k; for example, let S be a set containing a single element x for which x 6= xp . In our example from the previous section the set A arises by letting S = f01g and p = 1 (so that k = 2). 1

+1

3

2.2

Second (Recursive) Construction Method

We now generalize the above construction. We describe how to construct a complete x-free code recursively and prove that the code is x-free as long as there exists a number m and sets of codewords S ; : : :; Sr such that: 1. For i = 1; 2; : : : ; r, we have that Si is a subset of k , and k < k < : : : < kr < m : 1

i

1

2

2. The set S = S [    [ Sr is x-free, but not complete. 3. If j = 6 i + 1 then Sj m?k \ m?k Si = ;. 4. There is a sequence of strings x ; x ; : : : ; xr such that xi 2 Si for 1  i  r, and xi is a sux of some string in xi m?k +1 for 1  i  r ? 1. Remark. Conditions (1) { (4) are the most general we know of to ensure the validity of the construction that follows. The special case to keep in mind while reading the construction is the one in which Si = fxig for every i. We illustrate this special case in the next section. We assume that S ; : : : ; Sr satisfy conditions (1) { (4), and using them we construct a complete x-free code Tr as follows. Step 0. Let T denote S [ (m ? S ); thus T is the union of S with the set of all codewords of length m that do not begin with a word from S . Step i, for i = 1; 2; : : : ; r. Let Ti denote the set of strings in Ti? , except that each string x of Ti? having a proper sux in some Sj is replaced by xm?k . (Each such word x is replaced by tm?k words of length jxj + m ? kj .) Since S is sux-free by condition (2) no string of Ti? has more than one such sux, so this step is well-de ned. It is easy to argue by induction that each Ti is both pre x-free and complete. (In fact it is helpful to think of Ti as a pre x tree, with certain branches being extended to create Ti .) Therefore, it only remains to argue that Tr is sux-free. Claim 1 Let 0  i  r. If x 2 Ti is a proper sux of some string in Ti then x 2 S [    [ Sr?i. In particular, Tr is x-free. Proof. It will suce to show that each new string created in step i contains no proper sux from among (T [    [ Tr) n (S [    [ Sr?i). We prove this by induction on i. The case i = 0 is obvious. Suppose a string x was created in Step i > 0 by appending some string in m?k . By induction we have j  r ? (i ? 1). By condition (3), if x has a proper sux y 2 S , then y must be an element of Sj? , which is contained in S [    [ Sr?i. We must now show that x has no proper suxes from among (T [    [ Tr ) n S . Suppose for purposes of contradiction that y 2 (T [    [ Tr) n S and y is a proper sux of x. The last m letters of y have an element of Sj as a pre x; therefore y was created at some step i0 > 0 by appending some string in m?k . Snip the last m ? kj letters o x and y to get x0 and y0. Then x0 was created at step i ? 1 and y0 was created at step i0 ? 1 and y0 is a sux of x0. Also, y0 2= S , which contradicts the inductive hypothesis. 1

j

i

1

+1

2

i

1

0

0

1

j

1

j

1

+1

1

0

1

j

1

1

0

0

j

4

Claim 2 The longest word in Tr has length m + rj (m ? kj ). =1

Proof. We show by induction that for 0  i  r ? 1, the string xr?i given by condition (4) is a sux of some string in Ti of length m + ij=1(m ? kr?(j?1) ). The case i = 0 follows from condition (3). Suppose i > 0. By induction xr?(i?1) is a sux of some string in Ti?1 of length m + ij?=11 (m ? kr?(j?1) ). The construction of Ti appends all strings of length m ? kr?(i?1) to this string, and by condition (4) one of the new strings so created has xr?i as a sux. This completes the induction. ?1 (m ? k We have shown that x1 is a sux of some string of Tr?1 of length m +jr=1 r?(j ?1)). Now the construction of Tr appends all strings of length m ? k1. This creates strings in Tr of length m + rj=1 (m ? kj ). By a similar argument and condition (3) every string created after step 0 has length m + sj=s (m ? kj ) for some 1  s  s0  r. Therefore, the strings whose existence we just established are the longest in Tr . The construction in Theorem 1 is a special case of the construction of Tr given here, with r = 1, S = S1, k = k1, and m = k + p. Hypothesis (1) in Theorem 1 implies that condition (3) holds. 0

3 Example In this section we construct, for each whole number n, a binary x-free code in which the longest word has length (5n +13n +2)=2 and the shortest word has length 3n +1. The ratio of longest word to shortest word is therefore greater than 5n=6. It is possible to improve the constants slightly with a little e ort; that is, to attain a larger ratio for the same maximum length string. Let n  1 be an integer. Referring to the notation of the previous section, set m = 6n +1 and r = n. For 1  i  r, put ki = 3n + i and xi = 0 i? 10 n?i? 1. The collection of singleton sets Si = fxig, i = 1; : : : ; r, clearly satis es conditions (1) { (4). The construction of the x-free code proceeds as in the previous section. 2

2

1

3

1

4 Further Work There are x-free codes unaccounted for by the construction of this paper. (We have written a C program to search for such codes; this program and its output are available from the authors.) We do not know how common x-free codes are among the complete pre x-free codes. It would be interesting to nd an upper bound on the average codeword length of an optimal x-free encoding of an arbitrary n-letter source alphabet; for example, some constant multiple of the entropy of the source alphabet.

Acknowledgments

We would like to thank Peter Elias and Mojdeh Mohtashemi for helpful discussions.

5

References [1] David A. Hu man. A method for the construction of minimum-redundancy codes. Proceedings of the IRE, 40(9):1098{1101, 1952.

6