Huffman codes and maximizing properties of Fibonacci numbers


A. B. Vinokur

UDC 519.1

A contrast Huffman code (of maximum length) and the corresponding contrast sequence of positive numbers are considered. A maximizing contrast sequence, the maximum cost of a contrast Huffman code, and their relationship with Fibonacci numbers are derived.

We consider the extremal properties of Huffman codes and establish their relationship with Fibonacci numbers. Unlike [1], which focused on minimizing properties of Fibonacci numbers in relation to Huffman codes (trees), this paper examines their maximizing properties.

MAIN CONCEPTS. STATEMENT OF THE PROBLEM

Let $P = (p_1, p_2, \ldots, p_n)$ be a sequence of positive numbers such that

$$p_1 = p_2, \tag{1}$$

$$p_i \le p_{i+1} \quad (i = 2, \ldots, n - 1), \tag{2}$$

$$\sum_{i=1}^{n} p_i = 1. \tag{3}$$

A prefix binary code for the sequence $P$ is a set $X = \{x_1, x_2, \ldots, x_n\}$, where: 1) $x_i$ is the code word corresponding to the number $p_i$ $(i = 1, \ldots, n)$; 2) $x_i$ is a word in the binary alphabet $\{0, 1\}$; 3) no code word $x_i$ is the head (prefix) of any other code word $x_j$ $(i \ne j)$. The number of symbols in the code word $x_i$ is called the length of the code word and is denoted by $l_i$. The sum $L = L(X) = \sum_{i=1}^{n} l_i$ is called the length of the code $X$, and $S = S(P, X) = \sum_{i=1}^{n} p_i l_i$ is the cost (or the average code word length) of the code $X$ for the sequence $P$. The number of elements of the sequence $P$ (the code $X$) is called the size of the sequence (the code). The set of all prefix codes of size $n$ is denoted by $K_n$.

The code $X_{\min} = X_{\min}(P)$ is called minimal for the sequence $P$ if $S(P, X_{\min}) = \min_{X \in K_n} S(P, X)$. A method for constructing a minimal prefix code for an arbitrary sequence was proposed by Huffman [2] (see also [3, p. 495]). The code $H = H(P)$ constructed by this method is called a Huffman code for the sequence $P$ (note that not every minimal code $X_{\min}$ is a Huffman code [3, p. 497]). Thus,

$$S(P, H) = \min_{X \in K_n} S(P, X).$$

The set of Huffman codes of size $n$ is denoted by $M_n$ $(M_n \subset K_n)$. For different sequences of a fixed size $n$, Huffman codes in general have different lengths $L$.

Definition 1. The Huffman code $C = C(n)$ of size $n$ is called a contrast code if $L(C) = \max_{H \in M_n} L(H)$.

Translated from Kibernetika i Sistemnyi Analiz, No. 3, pp. 10-15, May-June, 1992. Original article submitted December 5, 1989.

©1993 Plenum Publishing Corporation


The code $C$ is called a contrast code because the minimum cost of the code (for a corresponding sequence) is achieved at the maximum length of the code.

Definition 2. A sequence $Q = Q(n)$ of size $n$ for which a contrast Huffman code exists is called a contrast sequence.

Applying the construction procedure for the Huffman code [3, p. 495], we can easily obtain the parameters of the contrast code $C = \{c_1, \ldots, c_n\}$ and the conditions on the contrast sequence $Q = (q_1, \ldots, q_n)$:

$$
\begin{aligned}
c_1 &= \underbrace{0 \cdots 0}_{n-1}, & l_1 &= l(c_1) = n - 1,\\
c_i &= \underbrace{0 \cdots 0}_{n-i}\,1, & l_i &= l(c_i) = n - i + 1 \quad (i = 2, \ldots, n);\\
L(C) &= (n - 1) + \sum_{i=2}^{n} (n - i + 1) = \frac{(n - 1)(n + 2)}{2};\\
S(Q, C) &= \sum_{i=1}^{n} q_i l_i = (n - 1) q_1 + \sum_{i=2}^{n} (n - i + 1) q_i;
\end{aligned}
\tag{4}
$$

$$\sum_{i=1}^{k-2} q_i \le q_k \quad (k = 3, \ldots, n). \tag{5}$$

The condition (5) is necessary and sufficient for the sequence to be a contrast sequence (given, of course, conditions (1)-(3)). The set of all contrast sequences of size $n$ is denoted by $E_n$.

Definition 3. The contrast sequence $Q_{\max} = Q_{\max}(n)$ of size $n$ is called maximizing if $S(Q_{\max}, C) = \max_{Q \in E_n} S(Q, C)$. The cost $T_{\max}(n) = S(Q_{\max}, C)$ $(Q_{\max} \in E_n)$ is called the maximum cost of the contrast Huffman code of size $n$.

In this paper, we construct the maximizing contrast sequence $Q_{\max}$ and determine the maximum cost $T_{\max}(n)$ for a contrast Huffman code.
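The definitions above can be made concrete with a small sketch. It is not part of the original paper: the function names (`contrast_code`, `code_length`, `cost`, `is_contrast_sequence`) are hypothetical, and the sequence is assumed to be given as exact `Fraction` values so that the normalization condition (3) can be tested without rounding.

```python
from fractions import Fraction

def contrast_code(n):
    """Code words c_1, ..., c_n of the contrast code C(n), following (4)."""
    first = "0" * (n - 1)                                   # c_1: n-1 zeros
    rest = ["0" * (n - i) + "1" for i in range(2, n + 1)]   # c_i: n-i zeros, then 1
    return [first] + rest

def code_length(code):
    """L(X): total number of symbols over all code words."""
    return sum(len(word) for word in code)

def cost(seq, code):
    """S(P, X): sum of p_i * l_i."""
    return sum(p * len(word) for p, word in zip(seq, code))

def is_contrast_sequence(q):
    """Check positivity and conditions (1)-(3) and (5) for a list of Fractions."""
    n = len(q)
    positive = all(x > 0 for x in q)
    cond_1 = q[0] == q[1]
    cond_2 = all(q[i] <= q[i + 1] for i in range(1, n - 1))
    cond_3 = sum(q) == 1
    # (5): q_1 + ... + q_{k-2} <= q_k for k = 3, ..., n (1-based indices)
    cond_5 = all(sum(q[:k - 2]) <= q[k - 1] for k in range(3, n + 1))
    return positive and cond_1 and cond_2 and cond_3 and cond_5
```

For instance, `code_length(contrast_code(6))` can be compared with $(n-1)(n+2)/2$ for $n = 6$, and a uniform sequence of size 4 fails condition (5) at $k = 4$.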

AUXILIARY RELATIONSHIPS AND RESULTS

Fibonacci numbers are defined as follows [4, p. 9]: $u_1 = 1$, $u_2 = 1$, $u_i = u_{i-1} + u_{i-2}$ $(i \ge 3)$. By definition, $u_0 = 0$. Fibonacci numbers satisfy relationships that are given in [4, pp. 11, 15] or can be easily derived:

$$\sum_{i=1}^{k} u_i = u_{k+2} - 1, \tag{6}$$

$$u_{i+j} \ge u_i + u_j \quad (i + j \ge 3), \tag{7}$$

$$\sum_{i=1}^{k} (k - i + 1) u_i = u_{k+4} - (k + 3). \tag{8}$$

Now let

$$f(n, i) = u_{n-i+1} u_{n+3} - u_{n-i+3} u_{n+1}. \tag{9}$$
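Relations (6)-(8) are easy to spot-check numerically. The sketch below is an illustration only: the helper names are hypothetical, and (7) is checked under the assumption that both indices satisfy $i, j \ge 1$, which the text does not state explicitly.

```python
def fib(m):
    """Fibonacci number u_m with u_0 = 0, u_1 = u_2 = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a

def identity_6(k):
    """(6): u_1 + ... + u_k = u_{k+2} - 1."""
    return sum(fib(i) for i in range(1, k + 1)) == fib(k + 2) - 1

def inequality_7(i, j):
    """(7): u_{i+j} >= u_i + u_j (meant for i + j >= 3)."""
    return fib(i + j) >= fib(i) + fib(j)

def identity_8(k):
    """(8): sum of (k - i + 1) u_i for i = 1..k equals u_{k+4} - (k + 3)."""
    lhs = sum((k - i + 1) * fib(i) for i in range(1, k + 1))
    return lhs == fib(k + 4) - (k + 3)
```

A finite check of this kind does not replace the proofs cited from [4]; it only guards against misreading an index.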

LEMMA 1. For $i \le n$,

$$f(n, i) = \begin{cases} u_i, & \text{if } n \equiv i \pmod 2,\\ -u_i, & \text{if } n \not\equiv i \pmod 2. \end{cases}$$

Proof. By the definition of Fibonacci numbers, we have

$$
\begin{aligned}
f(n, i) &= u_{n-i+1}(u_{n+2} + u_{n+1}) - u_{n+1}(u_{n-i+2} + u_{n-i+1})\\
&= u_{n-i+1} u_{n+2} - u_{n-i+2} u_{n+1}\\
&= u_{n-i+1}(u_{n+1} + u_n) - u_{n+1}(u_{n-i+1} + u_{n-i})\\
&= u_{n-i+1} u_n - u_{n-i} u_{n+1}.
\end{aligned}
$$

Thus

$$f(n, i) = u_{n-i+1} u_n - u_{n-i} u_{n+1}. \tag{10}$$

Now,

$$
\begin{aligned}
f(n, i) &= u_n(u_{n-i} + u_{n-i-1}) - u_{n-i}(u_n + u_{n-1})\\
&= u_{n-i-1} u_n - u_{n-i} u_{n-1}\\
&= u_{n-i-1}(u_{n-1} + u_{n-2}) - u_{n-1}(u_{n-i-1} + u_{n-i-2})\\
&= u_{n-i-1} u_{n-2} - u_{n-i-2} u_{n-1} = f(n - 2, i),
\end{aligned}
$$

where the last equality is (10) with $n$ replaced by $n - 2$. Therefore $f(n, i)$ satisfies the recurrence

$$f(n, i) = f(n - 2, i). \tag{11}$$

Case 1. $n \equiv i \pmod 2$, i.e., $n$ and $i$ have the same parity. Applying (11) $(n - i)/2$ times to (10), we obtain

$$f(n, i) = f(i, i) = u_{i-i+1} u_i - u_{i-i} u_{i+1} = u_1 u_i - u_0 u_{i+1} = u_i.$$

Case 2. $n \not\equiv i \pmod 2$. Applying (11) $(n - i - 1)/2$ times to (10), we obtain

$$f(n, i) = f(i + 1, i) = u_{i+1-i+1} u_{i+1} - u_{i+1-i} u_{i+2} = u_2 u_{i+1} - u_1 u_{i+2} = u_{i+1} - u_{i+2} = -u_i.$$

Q.E.D.

For a contrast sequence, denote

$$\Delta_k = q_k - \sum_{i=1}^{k-2} q_i \quad (k = 3, \ldots, n). \tag{12}$$

This and (5) give

$$\Delta_k \ge 0 \quad (k = 3, \ldots, n). \tag{13}$$
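Lemma 1 states that $f(n, i)$ from (9) depends on $n$ only through its parity relative to $i$. A brute-force comparison over a finite range, with hypothetical helper names and no claim beyond the range tested, might look as follows.

```python
def fib(m):
    """Fibonacci number u_m with u_0 = 0, u_1 = u_2 = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a

def f(n, i):
    """f(n, i) as defined in (9)."""
    return fib(n - i + 1) * fib(n + 3) - fib(n - i + 3) * fib(n + 1)

def lemma_1_holds(n, i):
    """Compare f(n, i) with the value stated in Lemma 1 (requires i <= n)."""
    expected = fib(i) if (n - i) % 2 == 0 else -fib(i)
    return f(n, i) == expected
```

For example, `f(6, 4)` and `f(8, 4)` both evaluate to $u_4 = 3$, while `f(7, 4)` evaluates to $-3$.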

Let $Q = (q_1, \ldots, q_n)$ be an arbitrary contrast sequence of size $n$ (using (1), we denote $q_1 = q_2 = \lambda$).

THEOREM 1.

$$q_k = \lambda u_{k-1} + \Delta_k + \sum_{i=3}^{k-2} \Delta_i u_{k-i-1} \quad (k = 3, \ldots, n).$$

Proof. Let

$$z_k = \lambda u_{k-1} + \Delta_k + \sum_{i=3}^{k-2} \Delta_i u_{k-i-1}.$$

Using relationship (12) for $\Delta_i$, we have

$$z_k = \lambda u_{k-1} + q_k + \sum_{i=3}^{k-2} q_i u_{k-i-1} - \sum_{i=1}^{k-2} q_i - \sum_{i=3}^{k-2} u_{k-i-1} \sum_{j=1}^{i-2} q_j.$$

Consider separately the last (fifth) term in the expression for $z_k$. Changing the order of summation and using (6), we obtain

$$\sum_{i=3}^{k-2} u_{k-i-1} \sum_{j=1}^{i-2} q_j = \sum_{i=1}^{k-4} q_i u_{k-i-1} - \sum_{i=1}^{k-4} q_i.$$

Substituting this expression for the last term in $z_k$, we obtain after some manipulations

$$z_k = \lambda u_{k-1} + q_k + (q_{k-2} u_1 + q_{k-3} u_2 - q_2 u_{k-3} - q_1 u_{k-2}) - (q_{k-2} + q_{k-3}).$$

Seeing that $u_1 = u_2 = 1$ and $q_1 = q_2 = \lambda$, we obtain

$$z_k = \lambda u_{k-1} + q_k - \lambda(u_{k-3} + u_{k-2}) = \lambda u_{k-1} + q_k - \lambda u_{k-1} = q_k.$$

Q.E.D.

Let us now apply Theorem 1 to compute $\lambda$. We have

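$$\sum_{k=1}^{n} q_k = 2\lambda + \lambda \sum_{k=3}^{n} u_{k-1} + \sum_{k=3}^{n} \Delta_k + \sum_{k=5}^{n} \sum_{i=3}^{k-2} \Delta_i u_{k-i-1}.$$

Applying (6) to the second term and changing the order of summation in the last (fourth) term, we obtain

$$\sum_{k=1}^{n} q_k = \lambda u_{n+1} + \sum_{i=3}^{n} \Delta_i u_{n-i+1}.$$

Using (3), we have

$$\lambda u_{n+1} + \sum_{i=3}^{n} \Delta_i u_{n-i+1} = 1,$$

whence

$$\lambda = \frac{1 - \sum_{i=3}^{n} \Delta_i u_{n-i+1}}{u_{n+1}}. \tag{14}$$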
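Theorem 1 and relation (14) can be checked on a concrete contrast sequence. The sample below (weights 1, 1, 2, 3, 5, 9 divided by their sum 21) is a hypothetical example chosen for this sketch, not one taken from the paper; the function names are likewise assumptions.

```python
from fractions import Fraction

def fib(m):
    """Fibonacci number u_m with u_0 = 0, u_1 = u_2 = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a

def deltas(q):
    """Delta_k from (12) for k = 3..n, keyed by the 1-based index k."""
    return {k: q[k - 1] - sum(q[:k - 2]) for k in range(3, len(q) + 1)}

def theorem_1_rhs(q, k):
    """Right-hand side of Theorem 1 for a 1-based index k >= 3."""
    lam, d = q[0], deltas(q)
    tail = sum(d[i] * fib(k - i - 1) for i in range(3, k - 1))  # i = 3..k-2
    return lam * fib(k - 1) + d[k] + tail

def lambda_from_14(q):
    """Value of lambda given by (14)."""
    n, d = len(q), deltas(q)
    weighted = sum(d[i] * fib(n - i + 1) for i in range(3, n + 1))
    return (1 - weighted) / fib(n + 1)

# Hypothetical sample contrast sequence (satisfies (1)-(3) and (5)).
raw = [1, 1, 2, 3, 5, 9]
q_sample = [Fraction(w, sum(raw)) for w in raw]
```

Here `theorem_1_rhs(q_sample, k)` should reproduce `q_sample[k - 1]` for every $k$ from 3 to 6, and `lambda_from_14(q_sample)` should return $q_1 = 1/21$.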

Let us now determine the cost of the contrast code $C$ for an arbitrary contrast sequence $Q$. Using (4) and the result of Theorem 1, we obtain for $Q \in E_n$

$$S(Q, C) = \lambda(n - 1) + \lambda \sum_{k=1}^{n-1} (n - k) u_k + \sum_{k=3}^{n} (n - k + 1) \Delta_k + \sum_{i=3}^{n-2} \Delta_i \sum_{j=1}^{n-i-1} u_j (n - j - i).$$

Using relation (8) for the second and fourth terms, we obtain

$$S(Q, C) = \lambda(u_{n+3} - 3) + \sum_{i=3}^{n} \Delta_i (u_{n-i+3} - 1).$$

Substituting the expression for $\lambda$ from (14) into $S(Q, C)$, we obtain

$$S(Q, C) = \frac{u_{n+3} - 3}{u_{n+1}} + \sum_{i=3}^{n} \frac{\Delta_i \left(u_{n+1}(u_{n-i+3} - 1) - u_{n-i+1}(u_{n+3} - 3)\right)}{u_{n+1}}.$$

Thus, for $Q \in E_n$ we have

$$S(Q, C) = \frac{1}{u_{n+1}} \left(u_{n+3} - 3 - \sum_{i=3}^{n} a_i \Delta_i\right), \tag{15}$$

where $a_i = u_{n-i+1}(u_{n+3} - 3) - u_{n+1}(u_{n-i+3} - 1)$ $(i = 3, \ldots, n)$.

THEOREM 2. $a_i \ge 0$ $(i = 3, \ldots, n;\ n \ge 3)$.

Proof. Using (9), we have

$$a_i = u_{n-i+1} u_{n+3} - u_{n-i+3} u_{n+1} + u_{n+1} - 3u_{n-i+1} = f(n, i) + u_{n+1} - 3u_{n-i+1}.$$

Case 1. $n \equiv i \pmod 2$. By Lemma 1 we have $f(n, i) = u_i$, and since $u_{n+1} = 3u_{n-2} + 2u_{n-3}$,

$$a_i = u_i + u_{n+1} - 3u_{n-i+1} = 3(u_{n-2} - u_{n-i+1}) + 2u_{n-3} + u_i \ge 2u_{n-3} + u_i > 0.$$

Case 2. $n \not\equiv i \pmod 2$. In this case $i \le n - 1$, i.e., $n \ge i + 1$. Then by Lemma 1 we have $f(n, i) = -u_i$ and $a_i = -u_i + u_{n+1} - 3u_{n-i+1}$.

Case 2.1. $i = 3$ (so that $n \ge 4$). Then

$$a_i = a_3 = 2(u_{n-3} - 1) \ge 2(u_{4-3} - 1) = 2(u_1 - 1) = 0.$$

Case 2.2. $i > 3$ (so that $n \ge 5$). After the necessary manipulations, we obtain

$$
\begin{aligned}
a_i &= 3(u_{n-2} - u_{i-3} - u_{n-i+1}) + 2(u_{n-3} - u_{i-4})\\
&\ge 3(u_{n-2} - u_{i-3} - u_{n-i+1}) + 2(u_{n-3} - u_{(n-1)-4})\\
&= 3(u_{n-2} - u_{i-3} - u_{n-i+1}) + 2u_{n-4}\\
&\ge 3(u_{n-2} - u_{i-3} - u_{n-i+1}) \ge 0
\end{aligned}
$$

by (7). Q.E.D.
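The closed form (15) and the sign claim of Theorem 2 can both be compared against direct computation. The sketch below reuses a hypothetical sample sequence (weights 1, 1, 2, 3, 5, 9 over 21); the function names are assumptions, and the nonnegativity check covers only a finite range of $n$.

```python
from fractions import Fraction

def fib(m):
    """Fibonacci number u_m with u_0 = 0, u_1 = u_2 = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a

def a_coeff(n, i):
    """Coefficient a_i appearing in (15)."""
    return fib(n - i + 1) * (fib(n + 3) - 3) - fib(n + 1) * (fib(n - i + 3) - 1)

def direct_cost(q):
    """S(Q, C) from (4): (n-1) q_1 + sum over i = 2..n of (n-i+1) q_i."""
    n = len(q)
    return (n - 1) * q[0] + sum((n - i + 1) * q[i - 1] for i in range(2, n + 1))

def cost_from_15(q):
    """S(Q, C) evaluated through formula (15)."""
    n = len(q)
    delta = {k: q[k - 1] - sum(q[:k - 2]) for k in range(3, n + 1)}
    penalty = sum(a_coeff(n, i) * delta[i] for i in range(3, n + 1))
    return (fib(n + 3) - 3 - penalty) / fib(n + 1)

# Hypothetical sample contrast sequence, not taken from the paper.
q_sample = [Fraction(w, 21) for w in (1, 1, 2, 3, 5, 9)]
```

Both `direct_cost(q_sample)` and `cost_from_15(q_sample)` give $46/21$, which lies below the value $31/13$ obtained when all $\Delta_i$ vanish for $n = 6$.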

MAIN RESULTS

THEOREM 3. The maximizing contrast sequence of size $n$ is the sequence $Q_{\max} = Q_{\max}(n) = (q_1, \ldots, q_n)$, where $q_1 = 1/u_{n+1}$, $q_i = u_{i-1}/u_{n+1}$ $(i = 2, \ldots, n)$.

Proof. From (13), (15), and Theorem 2 it follows that the maximum value of $S(Q, C)$ $(Q \in E_n)$ is achieved for $\Delta_i = 0$ $(i = 3, \ldots, n)$. This and (14) give $\lambda = 1/u_{n+1}$, i.e., $q_1 = \lambda = 1/u_{n+1}$ and $q_2 = \lambda = 1/u_{n+1} = u_1/u_{n+1}$. Now by Theorem 1, for $\Delta_i = 0$ $(i = 3, \ldots, n)$ and $\lambda = 1/u_{n+1}$, we have $q_i = \lambda u_{i-1} = u_{i-1}/u_{n+1}$ $(i = 3, \ldots, n)$.

THEOREM 4. The maximum cost of a contrast Huffman code of size $n$ is $T_{\max}(n) = 2 + (u_n - 3)/u_{n+1}$.

Proof. Since the maximum of $S(Q, C)$ is achieved for $\Delta_i = 0$ $(i = 3, \ldots, n)$, we obtain from (15)

$$T_{\max}(n) = S(Q_{\max}, C) = \frac{u_{n+3} - 3}{u_{n+1}} = \frac{u_{n+1} + u_{n+2} - 3}{u_{n+1}} = \frac{2u_{n+1} + u_n - 3}{u_{n+1}} = 2 + \frac{u_n - 3}{u_{n+1}}.$$
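Theorems 3 and 4 translate directly into a few lines of exact arithmetic. The following sketch, with hypothetical function names, builds $Q_{\max}(n)$ and compares the cost computed from (4) with the closed form of Theorem 4.

```python
from fractions import Fraction

def fib(m):
    """Fibonacci number u_m with u_0 = 0, u_1 = u_2 = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, a + b
    return a

def q_max(n):
    """Maximizing contrast sequence of Theorem 3 as exact fractions."""
    d = fib(n + 1)
    return [Fraction(1, d)] + [Fraction(fib(i - 1), d) for i in range(2, n + 1)]

def t_max(n):
    """Maximum cost from Theorem 4: 2 + (u_n - 3) / u_{n+1}."""
    return 2 + Fraction(fib(n) - 3, fib(n + 1))

def direct_cost(q):
    """S(Q, C) from (4) for the contrast code lengths n-1, n-1, n-2, ..., 1."""
    n = len(q)
    return (n - 1) * q[0] + sum((n - i + 1) * q[i - 1] for i in range(2, n + 1))
```

Using `Fraction` keeps the comparison `direct_cost(q_max(n)) == t_max(n)` exact rather than subject to floating-point error.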

COROLLARY. $\lim_{n \to \infty} T_{\max}(n) = (3 + \sqrt{5})/2 \approx 2.6180$.

The proof follows from the fact that $u_n/u_{n+1} \to 2/(1 + \sqrt{5}) = (\sqrt{5} - 1)/2$ as $n \to \infty$ [4, p. 86].

Example 1. Contrast code and maximizing contrast sequence of size 6:

$$C = C(6) = \{00000, 00001, 0001, 001, 01, 1\},$$

$$Q_{\max} = Q_{\max}(6) = \left(\frac{1}{13}, \frac{1}{13}, \frac{1}{13}, \frac{2}{13}, \frac{3}{13}, \frac{5}{13}\right),$$

$$T_{\max}(6) = 2 + \frac{5}{13} = 2\tfrac{5}{13}.$$

Example 2. Contrast code and maximizing contrast sequence of size 8:

$$C = C(8) = \{0000000, 0000001, 000001, 00001, 0001, 001, 01, 1\},$$

$$Q_{\max} = Q_{\max}(8) = \left(\frac{1}{34}, \frac{1}{34}, \frac{1}{34}, \frac{2}{34}, \frac{3}{34}, \frac{5}{34}, \frac{8}{34}, \frac{13}{34}\right),$$

$$T_{\max}(8) = 2 + \frac{18}{34} = 2\tfrac{9}{17}.$$
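The two examples can be reproduced by running Huffman's merging procedure on the sequences of sizes 6 and 8. The sketch below is an illustration, not the author's construction: it tracks only code-word lengths, and it assumes a tie-breaking rule (an already merged node is preferred over a leaf of equal weight). The cost of a Huffman code does not depend on how ties are broken, but the individual lengths can, so the rule matters for recovering the contrast code itself.

```python
import heapq
from fractions import Fraction

def huffman_lengths(weights):
    """Code-word lengths produced by Huffman's merging procedure.

    Assumption of this sketch: on equal weights, merged nodes (negative
    tie-break keys) are taken before leaves (keys 0, 1, 2, ...).
    """
    lengths = [0] * len(weights)
    # Heap entries: (weight, tie-break key, indices of leaves under the node).
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    merged = 0
    while len(heap) > 1:
        w1, _, leaves1 = heapq.heappop(heap)
        w2, _, leaves2 = heapq.heappop(heap)
        for leaf in leaves1 + leaves2:
            lengths[leaf] += 1  # each leaf under the new node gains one symbol
        merged += 1
        heapq.heappush(heap, (w1 + w2, -merged, leaves1 + leaves2))
    return lengths

q6 = [Fraction(k, 13) for k in (1, 1, 1, 2, 3, 5)]
q8 = [Fraction(k, 34) for k in (1, 1, 1, 2, 3, 5, 8, 13)]
lengths6 = huffman_lengths(q6)
lengths8 = huffman_lengths(q8)
cost6 = sum(p * l for p, l in zip(q6, lengths6))
cost8 = sum(p * l for p, l in zip(q8, lengths8))
```

With this tie-breaking rule the lengths coincide with those of $C(6)$ and $C(8)$, and the costs equal $T_{\max}(6)$ and $T_{\max}(8)$.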

Let us compare the results of this paper with [1]. Condition (3) may be called the normalization condition, and sequences satisfying conditions (1)-(3) may be called normalized. In [1] we considered sequences that satisfy conditions (1), (2) and the condition $p_1 = 1$ (instead of condition (3)). Such sequences may be called integer sequences. By Theorem 3 of this paper, a maximizing normalized contrast sequence is a normalized Fibonacci sequence right-"shifted" by one element, while the minimizing sequence in Theorem 3 of [1] is an integer Fibonacci sequence right-"shifted" by one element:

$$1, u_1, u_2, \ldots, u_{n-1}. \tag{16}$$

Thus, when the sequence (16) is normalized, i.e., each of its elements is divided by the sum of all elements (equal to $u_{n+1}$), its extremal properties are reversed. Specifically, if the sequence (16) is minimizing for a contrast Huffman code, then the sequence

$$\frac{1}{u_{n+1}}, \frac{u_1}{u_{n+1}}, \frac{u_2}{u_{n+1}}, \ldots, \frac{u_{n-1}}{u_{n+1}}$$

is maximizing (each in its class of sequences).
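The normalizing constant $u_{n+1}$ mentioned above follows from relation (6) with $k = n - 1$. A short worked derivation in the notation of this paper, with $n = 6$ as a numerical illustration:

```latex
1 + \sum_{i=1}^{n-1} u_i = 1 + (u_{n+1} - 1) = u_{n+1},
\qquad
\text{e.g. } n = 6:\quad 1 + (1 + 1 + 2 + 3 + 5) = 13 = u_7 .
```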

REFERENCES

1. A. B. Vinokur, "Huffman trees and Fibonacci numbers," Kibernetika, No. 6, 9-12 (1986).
2. D. A. Huffman, "A method for the construction of minimum redundancy codes," Proc. IRE, 40, 1098-1101 (Sept. 1952).
3. D. Knuth, The Art of Computer Programming, Vol. 1, Basic Algorithms [Russian translation], Mir, Moscow (1976).
4. N. N. Vorob'ev, Fibonacci Numbers [in Russian], Nauka, Moscow (1984).