IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 35, NO. 6, NOVEMBER 1989

Finite-State Adaptive Block to Variable-Length Noiseless Coding of a Nonstationary Information Source

JOHN C. KIEFFER, SENIOR MEMBER, IEEE

Abstract: We associate with a given nonstationary finite-alphabet information source a certain class of stationary processes which we term the stationary hull of the given source. It is then shown that the optimum average rate at which the given source can be noiselessly coded via finite-state adaptive block to variable-length coding schemes is the largest entropy rate among those processes in the stationary hull.

I. INTRODUCTION

WE ARE GIVEN an information source $X = (X_0, X_1, \ldots)$ consisting of random variables $X_i$ which all take values in a finite set $A$ (the source alphabet). We wish to transmit the source to some user and are constrained to transmit over a fixed noiseless channel whose input alphabet is the finite set $B$. We shall encode the source for transmission over the channel and decode the channel output sequence via an adaptive block to variable-length coding scheme. The first part of this introduction details what we mean by an adaptive block to variable-length coding scheme. We shall then discuss the nature of the main result obtained in the subsequent sections.

For some positive integer $N$, we first block the source sequence into its nonoverlapping blocks of length $N$, which we label $\bar{X}_0, \bar{X}_1, \ldots$. (Thus $\bar{X}_i = (X_{Ni}, \ldots, X_{Ni+N-1})$, $i \ge 0$.) We then encode each random block $\bar{X}_i$ via a one-to-one mapping into a random codeword $Y_i$ over the alphabet $B$ which comes from a set of codewords satisfying the prefix condition. This variable-length encoding procedure is adaptive from block to block in the sense that the mapping used to encode $\bar{X}_i$ into $Y_i$ depends on the previously observed values of $\bar{X}_0, \ldots, \bar{X}_{i-1}$. The terms of the sequence $(Y_0, Y_1, \ldots)$ are then concatenated to obtain a sequence from $B$ which we will denote by $Y_0 Y_1 \cdots$. This sequence is transmitted over the noiseless channel to the user, who uses a decoding algorithm to recover the source sequence $X$. (First of all, the user knows the prefix set of words from which $Y_0$ was chosen and can

therefore determine $Y_0$ from $Y_0 Y_1 \cdots$. He can then obtain $\bar{X}_0$ from $Y_0$, as the mapping taking $\bar{X}_0$ into $Y_0$ is one-to-one. Having determined $\bar{X}_0$, he now knows the mapping which was used to carry $\bar{X}_1$ into $Y_1$ and therefore the prefix set of words from which $Y_1$ was chosen. This allows the user to determine $Y_1$ and then $\bar{X}_1$. Proceeding in this way, he determines $\bar{X}_i$ for every $i$.) This adaptive block to variable-length coding scheme is shown in Fig. 1.

The coding scheme described above depends on the block length $N$. To emphasize this, we say that the scheme is $N$th order. Let $\mathscr{C}_N$ be the family of all $N$th-order adaptive block to variable-length coding schemes. The rate $r_X(\tau)$ at which a scheme $\tau \in \mathscr{C}_N$ codes our given source $X$ is defined by

$$r_X(\tau) = \limsup_{n \to \infty} (nN)^{-1} \sum_{i=0}^{n-1} E[L(Y_i)],$$

where $L(Y_i)$ is the length of the codeword $Y_i$ associated with the block $(X_{Ni}, \ldots, X_{Ni+N-1})$. The optimum rate at which $X$ can be noiselessly coded via adaptive block to variable-length coding schemes is then

Manuscript received July 20, 1987. This work was supported in part by the National Science Foundation under Grants ECS-8300973, ECS-850168, ECS-8796270, and NCR-8702176. The author is with the Department of Electrical Engineering, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455. IEEE Log Number X9311XX.

$$\inf_N \; \inf\{\, r_X(\tau) : \tau \in \mathscr{C}_N \,\}. \qquad (1.1)$$
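As a concrete (hypothetical) instance of the schemes in $\mathscr{C}_N$, the sketch below implements an adaptive $N = 2$ block to variable-length code over $A = \{0,1\}$: the codeword assignment for each block is recomputed from the empirical counts of the previously observed blocks, a rule the decoder can replay, so no side information is transmitted. The codeword set and the frequency-ranking rule are illustrative assumptions, not the paper's construction.

```python
from itertools import product

def code_map(history, N=2):
    """Assign a fixed prefix-free codeword set to the 2^N possible blocks,
    shortest codeword to the block seen most often so far.  Encoder and
    decoder share this rule, so both can rebuild the map from history."""
    blocks = [''.join(b) for b in product('01', repeat=N)]
    words = ['0', '10', '110', '111']                # prefix-free, |A|^N = 4
    counts = {b: history.count(b) for b in blocks}
    ranked = sorted(blocks, key=lambda b: (-counts[b], b))
    return dict(zip(ranked, words))

def encode(bits, N=2):
    blocks = [bits[i:i + N] for i in range(0, len(bits), N)]
    out, history = [], []
    for blk in blocks:
        out.append(code_map(history, N)[blk])        # map depends on history
        history.append(blk)
    return ''.join(out)

def decode(stream, N=2):
    history, pos = [], 0
    while pos < len(stream):
        inv = {w: b for b, w in code_map(history, N).items()}
        for end in range(pos + 1, len(stream) + 1):
            if stream[pos:end] in inv:               # prefix condition: unique match
                history.append(inv[stream[pos:end]])
                pos = end
                break
    return ''.join(history)

msg = '1111110010'
assert decode(encode(msg)) == msg
```

Because each codeword set satisfies the prefix condition and both ends derive the same code map from the shared history of decoded blocks, the decoder recovers each block exactly as described above.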

The optimum rate (1.1) is given by the number

$$\limsup_{n \to \infty} n^{-1} H(X_0, \ldots, X_{n-1}), \qquad (1.2)$$

where the logarithm used in the calculation of the joint entropy $H(X_0, \ldots, X_{n-1})$ is to base equal to the number of elements of $B$ (as are all logarithms in this paper). One can give an easy proof of this fact based on material from [2, ch. 3]. First, observe that for any scheme from $\mathscr{C}_N$,

$$N^{-1} E\big[L(Y_i) \mid \bar{X}_0 = \bar{x}_0, \ldots, \bar{X}_{i-1} = \bar{x}_{i-1}\big] \ge N^{-1} H\big(\bar{X}_i \mid \bar{X}_0 = \bar{x}_0, \ldots, \bar{X}_{i-1} = \bar{x}_{i-1}\big), \qquad (1.3)$$

which, upon taking the expected value over the values $\bar{x}_0, \ldots, \bar{x}_{i-1}$ and summing, yields

$$(nN)^{-1} \sum_{i=0}^{n-1} E[L(Y_i)] \ge (nN)^{-1} H(X_0, \ldots, X_{Nn-1}). \qquad (1.4)$$
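The inequality (1.3), and the near-tightness exploited below, can be checked numerically with Shannon code lengths $\lceil -\log p(x) \rceil$. The block distribution here is a made-up example, and lengths are computed in bits (i.e., $|B| = 2$):

```python
import math

def shannon_lengths(p):
    """Shannon code lengths ceil(-log2 p(x)); they satisfy the Kraft
    inequality, so a binary prefix code with these lengths exists."""
    return {x: math.ceil(-math.log2(q)) for x, q in p.items() if q > 0}

def entropy(p):
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

# Hypothetical distribution on blocks of length N = 2 over A = {0, 1}.
N = 2
p = {'00': 0.5, '01': 0.25, '10': 0.15, '11': 0.10}

lengths = shannon_lengths(p)
kraft = sum(2.0 ** -l for l in lengths.values())
EL = sum(p[x] * lengths[x] for x in p)   # expected codeword length
H = entropy(p)                           # block entropy

assert kraft <= 1.0                      # a prefix code with these lengths exists
assert EL >= H                           # expected length >= entropy, as in (1.3)
assert EL < H + 1                        # Shannon code is within 1 bit of entropy,
assert EL / N < H / N + 1.0 / N          # i.e., within 1/N per source symbol
```

The last assertion is exactly the "within $1/N$" slack used in the proof that (1.1) equals (1.2).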

0018-9448/89/1100-1259$01.00 © 1989 IEEE

Fig. 1. Adaptive block to variable-length coding scheme. (a) Encoding part of coding scheme. (b) Decoding part.

From this, we see that (1.2) is a lower bound to (1.1). Conversely, for a fixed $N$, given the values $\bar{x}_0, \ldots, \bar{x}_{i-1}$ of $\bar{X}_0, \ldots, \bar{X}_{i-1}$, respectively, one can encode $\bar{X}_i$ in a one-to-one way into a codeword from a prefix set of codewords so that the left side of (1.3) is within $1/N$ of the right side. Doing this for each $i$, we obtain a scheme in $\mathscr{C}_N$ for which the left side of (1.4) is within $1/N$ of the right side, whence the scheme has a rate within $1/N$ of (1.2). Since by choice of $N$ we can make $1/N$ as small as we like, this completes the proof that (1.1) and (1.2) must be equal.

Remark: Neuhoff and Gilbert [6, theorems 1 and 2] showed that this quantity (1.2) is the optimum rate for noiseless coding of a source via coding schemes from a larger class than that of adaptive block to variable-length schemes. (See also [5].)

The adaptive block to variable-length schemes which yield a rate close to (1.2) may be impractical in that they impose too large a memory requirement for implementation. (The mapping used to encode $\bar{X}_i$ may depend on the entire sequence $(\bar{x}_0, \ldots, \bar{x}_{i-1})$.) Thus, as do Ziv and Lempel in [8], we require that our coding schemes be finite-state. We define what we mean by this in the next section; for the moment the reader should think of the finite-state adaptive block to variable-length coding schemes as the practically implementable adaptive block to variable-length schemes. Postponing the precise definition avoids cluttering matters, and allows us to state in this section the nature of the result that will be obtained.

The purpose of this paper is to find an expression for the optimum rate at which the given source $X$ can be coded via finite-state adaptive block to variable-length coding schemes. (This means computing the number we get by imposing in the expression (1.1) the requirement that $\tau \in \mathscr{C}_N$ be finite-state.) Our main result is that this optimum rate is equal to the supremum of the entropy rates of the processes in the stationary hull of $X$.
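To fix ideas before the formal definition in Section II, the following toy sketch shows what a finite-state restriction looks like: the prefix code applied to each block depends only on a finite state updated from past blocks, not on the entire past sequence. The two-state machine, the per-state codes, and all names here are illustrative assumptions, not the paper's construction.

```python
# Toy finite-state adaptive scheme with block length N = 1 over A = {0, 1}.
s_star = 'lo'                              # start state

def F(s, block):                           # next-state map
    return 'hi' if block == '1' else 'lo'

alpha = {                                  # per-state one-to-one prefix codes
    'lo': {'0': '0', '1': '10'},           # state 'lo' expects 0s
    'hi': {'0': '10', '1': '0'},           # state 'hi' expects 1s
}

def fs_encode(bits):
    s, out = s_star, []
    for b in bits:
        out.append(alpha[s][b])            # code depends only on current state
        s = F(s, b)
    return ''.join(out)

def fs_decode(stream):
    s, pos, out = s_star, 0, []
    while pos < len(stream):
        inv = {w: b for b, w in alpha[s].items()}
        for end in range(pos + 1, len(stream) + 1):
            if stream[pos:end] in inv:     # prefix condition: unique match
                b = inv[stream[pos:end]]
                out.append(b)
                s = F(s, b)                # decoder tracks the same state
                pos = end
                break
    return ''.join(out)

assert fs_decode(fs_encode('0011110')) == '0011110'
```

The memory requirement is fixed (one state from a finite set), in contrast with the unbounded history dependence allowed by general schemes in $\mathscr{C}_N$.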
Introduced in [4], the stationary hull $\Lambda(X)$ of the source $X$ is a certain class of stationary processes that we associate with the source. We give here a short discussion of this concept. The stationary hull $\Lambda(X)$ of $X$ is the class of all processes $Z = (Z_0, Z_1, \ldots)$ with alphabet $A$ such that, for some sequence of positive integers $n_0 < n_1 < \cdots$,

$$\lim_{j \to \infty} n_j^{-1} \sum_{i=0}^{n_j - 1} E[f(X_i, X_{i+1}, \ldots)] = E[f(Z_0, Z_1, \ldots)]$$

for every real-valued function $f$ defined for sequences from $A$ that depends on only finitely many coordinates. Every process in $\Lambda(X)$ is necessarily stationary; hence the terminology stationary hull. If $X$ is stationary, then $\Lambda(X)$ consists of $X$ alone. More generally, if $X$ is asymptotically stationary in the sense that

$$\lim_{n \to \infty} n^{-1} \sum_{i=0}^{n-1} E[f(X_i, X_{i+1}, \ldots)]$$

exists for every $f$ depending on only finitely many coordinates, then $\Lambda(X)$ also contains exactly one stationary process. If the source $X$ is not asymptotically stationary in the sense just described, then its stationary hull $\Lambda(X)$ is infinite. Occasionally, one encounters an information source of practical interest that is of this complex type. Section II of the paper gives some preliminary material needed for the proof of our main result. In Section III, the main result is formally stated and proved.

II. DEFINITIONS AND NOTATIONS

In what follows, let $D^\infty$ be the measurable space of all sequences $(d_0, d_1, \ldots)$ from a finite set $D$, with the usual product sigma-field. If $x$ is an element of a space $D^\infty$ (or a process with finite alphabet, by which we mean a random sequence with values in a space of type $D^\infty$), let $x_i$ denote the $i$th coordinate of $x$ ($i \ge 0$), and let $x_i^{(n)}$ denote the vector $(x_i, \ldots, x_{i+n-1})$ for each positive integer $n$. Recall from Section I that $A$ is the alphabet of our given source $X$ and $B$ is the alphabet of our given noiseless channel. Let $B^*$ be the set of all finite words from $B$, including the null word $\lambda$ of length 0, and for each positive integer $N$, let $\mathscr{G}_N$ be the set of all one-to-one maps from $A^N$ to $B^*$ which map $A^N$ into a prefix set of words. If $Z$ is a stationary process with finite alphabet, let $H(Z)$ denote the entropy rate of $Z$, given by

$$H(Z) = \lim_{n \to \infty} n^{-1} H(Z_0, \ldots, Z_{n-1}).$$
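To see how a source can fail to be asymptotically stationary, so that its stationary hull contains several processes, consider the following toy construction (ours, not the paper's): a deterministic binary sequence whose runs of 0's and 1's double in length. For the coordinate function $f(x) = x_0$, the Cesàro averages converge to $2/3$ along one subsequence of times and to $1/3$ along another, so no single limit exists and $\Lambda(X)$ contains more than one process.

```python
def switching_sequence(total):
    """Deterministic source whose 0-runs and 1-runs double in length
    (0, 11, 0000, 11111111, ...): not asymptotically stationary."""
    bits, run, val = [], 1, 0
    while len(bits) < total:
        bits.extend([val] * run)
        run *= 2
        val ^= 1
    return bits[:total]

x = switching_sequence(2 ** 21)

def cesaro_mean(n):
    """n^{-1} sum_{i<n} f(x_i, x_{i+1}, ...) for f(x) = x_0."""
    return sum(x[:n]) / n

# Run boundaries fall at times n = 2^{k+1} - 1.  Boundaries that close a
# 1-run (k odd) give averages tending to 2/3; boundaries that close a
# 0-run (k even) give averages tending to 1/3.
after_one_run = cesaro_mean(2 ** 20 - 1)    # close to 2/3
after_zero_run = cesaro_mean(2 ** 21 - 1)   # close to 1/3
```

Along the two subsequences the empirical averages stabilize at different values, which is precisely the situation in which $\Lambda(X)$ is infinite.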

We define a finite-state adaptive block to variable-length coding scheme as a quintuple $(N, C, s^*, F, \alpha)$, where $N$ is a positive integer, $C$ is a finite set, $s^*$ is an element of $C$, $F$ is a mapping from $C \times A^N$ into $C$, and $\alpha$ is a family $\{\alpha_s : s \in C\}$ of mappings from $\mathscr{G}_N$. An explanation is in order concerning the way in which a scheme $(N, C, s^*, F, \alpha)$ encodes a sequence $(x_0, x_1, \ldots)$


from $A^\infty$ for transmission over the noiseless channel, and then decodes the channel output to recover this sequence. First, the sequence is blocked into the $N$-blocks $\bar{x}_i = x_{Ni}^{(N)}$, $i \ge 0$. Then the state sequence $(s_0, s_1, \ldots)$ is generated, where $s_0 = s^*$ and

$$s_{i+1} = F(s_i, \bar{x}_i), \qquad i \ge 0.$$

The channel input sequence from $B^\infty$ is then $y_0 y_1 \cdots$, where

$$y_i = \alpha_{s_i}(\bar{x}_i), \qquad i \ge 0.$$

The original sequence $(x_0, x_1, \ldots)$ can then be recovered from the channel output sequence $y_0 y_1 \cdots$ as indicated in Section I. The rate $r_X(\tau)$ at which a finite-state adaptive block to variable-length coding scheme $\tau$ codes the source $X$ is defined exactly as in Section I (where adaptive block to variable-length coding schemes were considered which need not necessarily be finite-state). The optimum rate $r_X$ at which the source $X$ can be coded via finite-state adaptive block to variable-length coding schemes is defined to be the infimum of $r_X(\tau)$ over all finite-state adaptive block to variable-length schemes $\tau$. We are now ready for our main result.

III. THE MAIN RESULT

Recall that $X$ denotes our given source, which we wish to noiselessly code via finite-state adaptive block to variable-length coding schemes. The main result of the paper is the following theorem.

Theorem 1: $r_X = \sup\{H(Z) : Z \in \Lambda(X)\}$.

This result follows from the coding theorem (Theorem 2) and its converse (Theorem 3) given below.

Remarks: a) The set $\Lambda^*(X)$ consisting of the distributions of the processes in $\Lambda(X)$ is compact under the topology of weak convergence of measures, and the entropy function is an upper-semicontinuous function on $\Lambda^*(X)$. Hence it follows that there must be a process in $\Lambda(X)$ whose entropy rate achieves the supremum in Theorem 1. A classic result of Shannon (see [7, pt. I, sec. 9] or [2, theorem 3.5.2]) states that the optimum rate at which a stationary source can be noiselessly coded is equal to the entropy rate of the source. Hence Theorem 1 says that, if we think of the processes in $\Lambda(X)$ as stationary sources, the optimum rate at which the source $X$ can be noiselessly coded is equal to the optimum rate at which a source in $\Lambda(X)$ of maximal entropy can be noiselessly coded.

b) Take $X$ to be an individual sequence. This means that $\Pr[X = x^*] = 1$ for an element $x^*$ of $A^\infty$. Then the expression $\sup\{H(Z) : Z \in \Lambda(X)\}$ is the same as the quantity $\rho(x^*)$ analyzed by Ziv and Lempel in [8].

Theorem 2: Given any $\epsilon > 0$, there exists a finite-state adaptive block to variable-length coding scheme $\tau$ such that

$$r_X(\tau) \le \epsilon + \sup\{H(Z) : Z \in \Lambda(X)\}.$$

Moreover, the scheme $\tau$ can be chosen to be of the simple form $\tau = (N, C, s^*, F, \alpha)$ in which $\alpha_s$ is the same for all $s \in C$.

Proof: Let $R = \sup\{H(Z) : Z \in \Lambda(X)\}$. For the given $\epsilon > 0$, we find a scheme $\tau$ of the indicated type such that

$$r_X(\tau) \le R + \epsilon. \qquad (3.1)$$

By Lemma A2 of the Appendix, there exist a positive integer $t$ and $\psi \in \mathscr{G}_t$ satisfying (3.2), and a positive integer $N$ and $\Phi \in \mathscr{G}_N$ satisfying (3.3). Let $\tau = (N, C, s^*, F, \alpha)$ be the finite-state adaptive block to variable-length coding scheme in which $C$ is a singleton $\{s^*\}$ and $\alpha_{s^*} = \Phi$. Then the rate $r_X(\tau)$ at which $\tau$ codes $X$ is given by

$$r_X(\tau) = \limsup_{n \to \infty} (nN)^{-1} \sum_{i=0}^{n-1} E\Big[L\big(\Phi(X_{Ni}^{(N)})\big)\Big]. \qquad (3.4)$$

Applying (3.3) to (3.4), we then obtain (3.5). It follows that

$$r_X(\tau) \le \epsilon/2 + t^{-1} E\Big[L\big(\psi(Z_0^{(t)})\big)\Big] \qquad (3.6)$$

for some $Z \in \Lambda(X)$, since the right sides of (3.5) and (3.6) are equal for some $Z \in \Lambda(X)$. Applying (3.2) to (3.6), we get the desired (3.1).

Theorem 3: If $\tau$ is any finite-state adaptive block to variable-length coding scheme, then

$$r_X(\tau) \ge \sup\{H(Z) : Z \in \Lambda(X)\}.$$

Proof: Fix $Z \in \Lambda(X)$. For the given scheme $\tau$, it suffices to show that

$$r_X(\tau) \ge H(Z). \qquad (3.7)$$

Pick the positive integer $N$, finite set $C$, function $F : C \times A^N \to C$, maps $\{\alpha_s : s \in C\}$ from $\mathscr{G}_N$, and processes $(S_i : i \ge 0)$, $(Y_i : i \ge 0)$ so that

$$S_{i+1} = F\big(S_i, X_{Ni}^{(N)}\big), \qquad Y_i = \alpha_{S_i}\big(X_{Ni}^{(N)}\big), \qquad i \ge 0,$$

and

$$r_X(\tau) = \limsup_{n \to \infty} (nN)^{-1} \sum_{i=0}^{n-1} E[L(Y_i)].$$


Define the set $C^* = C \times \{0, \ldots, N-1\}$ and the functions $F^* : C^* \times A \to C^*$, $G : C^* \times A^N \to B^*$ by

$$F^*((c, i), x) = (c, i-1), \qquad i > 0,$$

$$G((c, 0), x) = \alpha_c(x).$$

Let (