

Block Coding for an Ergodic Source Relative to a Zero-One Valued Fidelity Criterion

JOHN C. KIEFFER

Abstract--An effective rate for block coding of a stationary ergodic source relative to a zero-one valued fidelity criterion is defined. Under some mild restrictions, a source coding theorem and converse are given that show that the defined rate is optimum. Several examples are given that satisfy the restrictions imposed. A new generalization of the Shannon-McMillan Theorem is employed.

I. INTRODUCTION

Let $(A, \mathcal{A})$ be a measurable space. $A$ will serve as the alphabet for our source. For $n = 1, 2, \cdots$, $(A^n, \mathcal{F}_n)$ will denote the measurable space consisting of $A^n$, the set of all sequences $(x_1, x_2, \cdots, x_n)$ of length $n$ from $A$, and $\mathcal{F}_n$, the usual product $\sigma$-field. $(A^\infty, \mathcal{F}_\infty)$ will denote the space consisting of $A^\infty$, the set of all infinite sequences $(x_1, x_2, \cdots)$ from $A$, and the usual product $\sigma$-field $\mathcal{F}_\infty$. Let $T_A : A^\infty \to A^\infty$ be the shift transformation $T_A(x_1, x_2, \cdots) = (x_2, x_3, \cdots)$. We define our source $\mu$ to be a probability measure on $A^\infty$ which is stationary and ergodic with respect to $T_A$.

Suppose for each $n = 1, 2, \cdots$ we are given a jointly measurable distortion measure $\rho_n : A^n \times A^n \to [0, \infty)$. We wish to block code $\mu$ with respect to the fidelity criterion $F = \{\rho_n\}_{n=1}^\infty$. Most of the results about block coding a source require a single-letter fidelity criterion [1, p. 20]. An exception is the case of noiseless coding [2, Theorem 3.1.1]; in this case, we have $\rho_n(x, y) = 0$ if $x = y$ and $\rho_n(x, y) = 1$ if $x \neq y$. In this paper we consider a generalization of noiseless coding, where we require each distortion measure $\rho_n$ in $F$ to be zero-one valued; that is, zero and one are the only possible values of $\rho_n$ allowed. Such a fidelity criterion $F$ we will call a zero-one valued fidelity criterion.

We will impose throughout the paper the following restriction on our zero-one valued fidelity criterion $F = \{\rho_n\}$:

R1: If $\rho_m(x, y) = 0$ and $\rho_n(x', y') = 0$, then $\rho_{m+n}((x, x'), (y, y')) = 0$, $m, n = 1, 2, \cdots$.

In the preceding, we mean $(x, x')$ to represent the sequence of length $m + n$ obtained by writing first the terms of $x$, then the terms of $x'$. Equivalently, R1 says $\rho_{m+n}((x, x'), (y, y')) \le \rho_m(x, y) + \rho_n(x', y')$.

Manuscript received February 14, 1977; revised November 1, 1977.
The author is with the Department of Mathematics, University of Missouri, Rolla, MO 65401.
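For readers who want to experiment, here is a minimal sketch (not from the paper) of a zero-one valued distortion in the noiseless case, with a brute-force check of R1 over short blocks; the binary alphabet, function names, and block lengths are all illustrative choices.

```python
# A minimal sketch, assuming a toy binary alphabet, of a zero-one valued
# distortion measure and a brute-force check of the consistency
# requirement R1 on short blocks.
from itertools import product

A = (0, 1)  # illustrative finite source alphabet

def rho(x, y):
    """Noiseless-case distortion: rho_n(x, y) = 0 iff x == y, else 1."""
    return 0 if x == y else 1

def satisfies_R1(n_max=3):
    """Check R1: rho_m(x,y) = 0 and rho_n(x',y') = 0 imply
    rho_{m+n}((x,x'), (y,y')) = 0, for all blocks up to length n_max."""
    for m in range(1, n_max + 1):
        for n in range(1, n_max + 1):
            for x, y in product(product(A, repeat=m), repeat=2):
                for xp, yp in product(product(A, repeat=n), repeat=2):
                    if rho(x, y) == 0 and rho(xp, yp) == 0:
                        if rho(x + xp, y + yp) != 0:
                            return False
    return True

print(satisfies_R1())  # True: exact match is preserved under concatenation
```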




R1 is a consistency requirement on the $\rho_n$. Some such requirement is needed to investigate the behavior of block codes on sequences of length $n$ as $n \to \infty$. R1 is a mild restriction because there are many examples satisfying it. Here are a few.

Example 1: $\rho_n(x, y) = 0$ if and only if $x = y$. This is the case of noiseless coding mentioned above.

Example 2: $\rho_n((x_1, x_2, \cdots, x_n), (y_1, y_2, \cdots, y_n)) = 0$ if and only if there is a permutation $\pi$ of $\{1, 2, \cdots, n\}$ such that $(x_{\pi(1)}, x_{\pi(2)}, \cdots, x_{\pi(n)}) = (y_1, y_2, \cdots, y_n)$.

Such a class of distortion measures could be used in case the coder wishes to communicate to some receiver only the number of times each element of $A$ appears in the sequence $(x_1, x_2, \cdots, x_n)$.

Example 3: Let $L$ be a finite set. Let $A$ be the set of all finite words that can be formed using letters from $L$. That is, $A = \{l_1 l_2 \cdots l_n : l_1, \cdots, l_n \in L,\ n = 1, 2, \cdots\}$. If $x_1, x_2, \cdots, x_n$ are any words in $A$, let $x_1 x_2 \cdots x_n$ be the word in $A$ formed by writing first the letters of $x_1$, then the letters of $x_2$, etc. Define $\rho_n((x_1, x_2, \cdots, x_n), (y_1, y_2, \cdots, y_n)) = 0$ if and only if $x_1 x_2 \cdots x_n = y_1 y_2 \cdots y_n$. The coder may want to use this fidelity criterion if he is coding the source for transmission over a channel for which spacing of the words is not possible.

Example 4: Let $d$ be a metric on $A$. Let $d_n$ be the metric on $A^n$ such that $d_n((x_1, x_2, \cdots, x_n), (y_1, y_2, \cdots, y_n)) = \max_{1 \le i \le n} d(x_i, y_i)$. Fix $\epsilon > 0$. Define $\rho_n(x, y) = 0$ if and only if $d_n(x, y) \le \epsilon$.

Example 5: Let $\{d_n\}$ be a single-letter fidelity criterion on $A$. Fix $D > 0$. Define $\rho_n(x, y) = 0$ if and only if $d_n(x, y) \le D$.

Example 6: Let $\{d_n\}$ be a subadditive fidelity criterion. Fix $D > 0$. Define a zero-one valued fidelity criterion $\{\rho_n\}$ as in Example 5: $\rho_n(x, y) = 0$ if and only if $d_n(x, y) \le D$.
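As an illustration, here is a small sketch (not from the paper) implementing the zero-one distortion measures of Examples 1, 2, and 4 for finite blocks; the integer alphabet and the metric $d(a, b) = |a - b|$ are assumptions made only for the demo.

```python
# Illustrative implementations of the zero-one distortion measures in
# Examples 1, 2, and 4, for blocks over a finite alphabet.
from collections import Counter

def rho_exact(x, y):
    """Example 1: rho_n(x, y) = 0 iff x = y (noiseless coding)."""
    return 0 if x == y else 1

def rho_permutation(x, y):
    """Example 2: rho_n(x, y) = 0 iff some permutation of x equals y,
    i.e., x and y have the same letter counts."""
    return 0 if Counter(x) == Counter(y) else 1

def rho_metric(x, y, d, eps):
    """Example 4: rho_n(x, y) = 0 iff max_i d(x_i, y_i) <= eps,
    where d is a metric on A."""
    return 0 if max(d(a, b) for a, b in zip(x, y)) <= eps else 1

# Usage with the assumed metric d(a, b) = |a - b| on an integer alphabet:
d = lambda a, b: abs(a - b)
print(rho_exact((1, 2), (1, 2)),             # 0: identical blocks
      rho_permutation((1, 2, 2), (2, 1, 2)),  # 0: same letter counts
      rho_metric((1, 2), (2, 2), d, eps=1))   # 0: within distance eps
```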

II. DEFINITION OF THE RATE OF THE SOURCE

Let $F = \{\rho_n\}$ be the given zero-one valued fidelity criterion satisfying R1. For each $n$, let $r_n$ be the relation on $A^n$ such that $x\, r_n\, y$ if and only if $\rho_n(x, y) = 0$. If $y \in A^n$, define $r_n[y] = \{x \in A^n : x\, r_n\, y\}$. If $x \in A^n$, define $r_n^{-1}[x] = \{y \in A^n : x\, r_n\, y\}$. We define a set $E \subset A^n$ to be $r_n$-admissible if $E \subset r_n[y]$ for some $y \in A^n$. By a partition $\mathcal{P}$ of $A^n$, we will mean a countable partition of $A^n$ into measurable sets. If $x \in A^n$, let $\mathcal{P}[x]$ denote the set of $\mathcal{P}$ containing $x$. We call $\mathcal{P}$ $r_n$-admissible if every set in $\mathcal{P}$ is $r_n$-admissible.
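The following toy sketch (not from the paper) illustrates the sets $r_n[y]$ and $r_n$-admissibility, using an invented four-letter integer alphabet and Example 4's distortion with $d(a, b) = |a - b|$ and $\epsilon = 1$.

```python
# Toy illustration of r_n[y] and r_n-admissibility on a small finite
# alphabet, under Example 4's distortion with eps = 1.
from itertools import product

A, n, eps = (0, 1, 2, 3), 2, 1

def rho(x, y):
    """Example 4's zero-one distortion: 0 iff max_i |x_i - y_i| <= eps."""
    return 0 if max(abs(a - b) for a, b in zip(x, y)) <= eps else 1

def r_ball(y):
    """r_n[y] = {x in A^n : x r_n y}, the set of blocks matched by y."""
    return {x for x in product(A, repeat=n) if rho(x, y) == 0}

def is_admissible(E):
    """E is r_n-admissible iff E is contained in r_n[y] for some y."""
    return any(E <= r_ball(y) for y in product(A, repeat=n))

print(is_admissible(r_ball((1, 1))))             # True: any r_n[y] is admissible
print(is_admissible(set(product(A, repeat=n))))  # False: no single y covers A^n
```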

Let $X^n : A^\infty \to A^n$ be the projection $X^n(x_1, x_2, \cdots) = (x_1, x_2, \cdots, x_n)$. Let $\mu_n$ be the measure induced on $A^n$ by $\mu$; that is, $\mu_n(E) = \mu(X^n \in E)$, $E \in \mathcal{F}_n$. Define the entropy $H_n(\mathcal{P})$ of the partition $\mathcal{P}$ of $A^n$ to be $H_n(\mathcal{P}) = -\sum_{E \in \mathcal{P}} \mu_n(E) \ln \mu_n(E)$. We now add an additional restriction that will be required from now on.

R2: There is an $r_1$-admissible partition $\mathcal{P}$ of $A$ such that $H_1(\mathcal{P}) < \infty$. (If $A$ is finite, R2 reduces to the requirement that for each $x \in A$, there exists a $y \in A$ with $\rho_1(x, y) = 0$.)

We define the rate $R_F(\mu)$ of the source $\mu$ with respect to the fidelity criterion $F$ to be $R_F(\mu) = \inf_n \inf_{\mathcal{P}_n} n^{-1} H_n(\mathcal{P}_n)$, where $\inf_{\mathcal{P}_n}$ denotes that the infimum is to be taken only over $r_n$-admissible partitions $\mathcal{P}_n$ of $A^n$. Certainly $R_F(\mu) < \infty$ by R2. R1 implies that if $\mathcal{P}_m$ and $\mathcal{P}_n$ are, respectively, $r_m$- and $r_n$-admissible, then $\mathcal{P}_m \times \mathcal{P}_n$ is $r_{m+n}$-admissible. This implies that the sequence $\{\inf_{\mathcal{P}_n} H_n(\mathcal{P}_n)\}$ is subadditive. (A sequence $\{x_n\}$ is subadditive if $x_{m+n} \le x_m + x_n$, $m, n = 1, 2, \cdots$. If $\{x_n\}$ is a nonnegative subadditive sequence, then $\lim_{n\to\infty} n^{-1} x_n$ exists and equals $\inf_n n^{-1} x_n$. See [2, p. 112].) Hence $R_F(\mu) = \lim_{n\to\infty} n^{-1} \inf_{\mathcal{P}_n} H_n(\mathcal{P}_n)$.

As examples, $R_F(\mu)$ in Example 1 is the entropy of the source. In Example 2, it is not hard to see that for finite $A$, $R_F(\mu) = 0$. In Example 4, it is the absolute $\epsilon$-entropy $I_\epsilon$ defined in [4], except that an $r_n$-admissible set has a radius no greater than $\epsilon$ in the $d_n$-metric. (In [4], sets of diameter no greater than $\epsilon$ are used.) In Example 5, $R_F(\mu)$ turns out to be $R_\mu(D)$, the rate-distortion function of $\mu$ relative to the single-letter fidelity criterion $\{d_n\}$, evaluated at $D$. In Example 6, $R_F(\mu)$ turns out to be $R_\mu(D)$, the rate-distortion function of $\mu$ relative to the subadditive fidelity criterion $\{d_n\}$, evaluated at $D$. (This rate-distortion function is a generalization of the usual rate-distortion function for a single-letter fidelity criterion; for its definition, see [13].)

The formula we have given for $R_F(\mu)$ is not efficient for computing $R_F(\mu)$, since it involves for each $n$ the determination of the $r_n$-admissible partitions. Later, in Section IV, we will give a formula for $R_F(\mu)$ which for each $n$ involves the minimization of a convex function over a convex set.

The main theorems (Theorems 1-3) of this paper will show that $R_F(\mu)$ is the optimum rate for block coding the source $\mu$ relative to $F$. Following [1], a block code on sequences of length $n$ is a finite set $B \subset A^n$. The rate of the block code is $n^{-1} \ln |B|$, where $|B|$ denotes the cardinality of $B$. If $x \in A^n$, define $\rho_n(x|B) = \inf_{y \in B} \rho_n(x, y)$. Each $x \in A^n$ is coded into a $y \in B$ such that $\rho_n(x, y) = \rho_n(x|B)$. The average distortion $\bar{\rho}_n(B, \mu)$ resulting from using the block code $B$ to code the source $\mu$ is $\bar{\rho}_n(B, \mu) = \int_{A^\infty} \rho_n(X^n|B)\, d\mu$.
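To make the rate formula concrete, here is a small numerical sketch (not from the paper) that evaluates the upper bound $n^{-1} H_n(\mathcal{P}_n)$ for Example 2, taking $\mathcal{P}_n$ to be the partition of $A^n$ into type classes (blocks with equal letter counts), each of which is $r_n$-admissible there; the fair-coin i.i.d. source is an assumed toy choice. The bound visibly decreases toward $0$, consistent with the claim that $R_F(\mu) = 0$ for finite $A$ in Example 2.

```python
# Estimate of the rate upper bound n^{-1} H_n(P_n) in Example 2, where
# P_n is the (r_n-admissible) partition of A^n into type classes, for an
# assumed i.i.d. fair-coin source.
from itertools import product
from collections import Counter
from math import log

A, p = (0, 1), 0.5  # binary alphabet, i.i.d. fair-coin source

def mu_n(x):
    """Product measure of the block x under the i.i.d. source."""
    prob = 1.0
    for a in x:
        prob *= p if a == 0 else 1 - p
    return prob

def rate_bound(n):
    """n^{-1} H_n(P_n) for the type-class partition P_n of A^n."""
    cell_mass = Counter()
    for x in product(A, repeat=n):
        cell_mass[tuple(sorted(Counter(x).items()))] += mu_n(x)
    H = -sum(m * log(m) for m in cell_mass.values() if m > 0)
    return H / n

for n in (2, 4, 8, 12):
    print(n, round(rate_bound(n), 4))  # decreases toward 0 as n grows
```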

III. THE MAIN THEOREMS

We quote the main theorems to be proved. The proofs will be given in Appendix C.

Theorem 1: Let the zero-one valued fidelity criterion $F = \{\rho_n\}$ and the stationary ergodic source $\mu$ satisfy R1-R2. Then for each $n$, there exists a block code $B_n$ on sequences of length $n$ such that $\lim_{n\to\infty} n^{-1} \ln |B_n| = R_F(\mu)$ and $\lim_{n\to\infty} \bar{\rho}_n(B_n, \mu) = 0$.


The following is a weak converse to Theorem 1.

Theorem 2: Let $F$ and $\mu$ be as in Theorem 1. Let $\{B_n\}_{n=1}^\infty$ be a sequence of block codes on sequences of length $n$ such that $\lim_{n\to\infty} \bar{\rho}_n(B_n, \mu) = 0$. Then $\liminf_{n\to\infty} n^{-1} \ln |B_n| \ge R_F(\mu)$.

By adding another assumption, we obtain the following strong converse to Theorem 1.

Theorem 3: Let $F$ and $\mu$ be as in Theorem 1. Suppose that the following additional restriction is also satisfied.

R3: $\{r_n[y] : y \in A^n\}$ is a partition of $A^n$, $n = 1, 2, \cdots$.

Let $R < R_F(\mu)$. Let $\{n_k\}_{k=1}^\infty$ be an increasing sequence of positive integers. For each $k$, let $B_k$ be a block code on sequences of length $n_k$ satisfying $n_k^{-1} \ln |B_k| \le R$. Then $\lim_{k\to\infty} \bar{\rho}_{n_k}(B_k, \mu) = 1$.

(Körner and Pinkston have treated the memoryless case using a differently defined rate $\hat{R}_F(\mu)$; it turns out that in most cases $\hat{R}_F(\mu) > R_F(\mu)$.) It is not clear whether these results can be generalized to sources with memory, since the techniques used by Körner and Pinkston depend heavily on the memoryless character of the source. It may be possible to obtain Theorems 1 and 2 for stationary nonergodic sources, except that $R_F(\mu)$ would have to be defined differently for the nonergodic case. Parthasarathy [6] and Winkelbauer [7] have done this for the noiseless case.

Also, using a zero-one valued fidelity criterion, an easy generalization of the standard joint source-channel coding theorem can be obtained. Let $F = \{\rho_n\}$ be a fixed zero-one valued fidelity criterion. Suppose we wish to block code the ergodic source $\mu$ (with alphabet $A$) for transmission over a DMC with capacity $C$. Let $(U_1, U_2, \cdots, U_n)$ be the ensemble of $n$ source letters generated by $\mu$. Using some encoder, we code $(U_1, \cdots, U_n)$ into an ensemble $(X_1, \cdots, X_n)$ of $n$ channel input letters. We transmit $(X_1, \cdots, X_n)$ over the DMC, getting an ensemble $(Y_1, \cdots, Y_n)$ of $n$ channel output letters. Using some decoder, we decode $(Y_1, \cdots, Y_n)$ into an ensemble $(V_1, \cdots, V_n)$ of $n$ letters from $A$. We say $\mu$ is $F$-transmissible over the DMC if for any $\epsilon > 0$, it is possible for each $n$ to choose encoders and decoders so that $\Pr[(U_1, \cdots, U_n)\, r_n\, (V_1, \cdots, V_n)] > 1 - \epsilon$ for $n$ sufficiently large. (Thus we are using $F$ to measure the fidelity with which the channel output matches the source.) One can easily show that if $R_F(\mu) < C$, then $\mu$ is $F$-transmissible, whereas if $R_F(\mu) > C$, then $\mu$ is not $F$-transmissible.
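The following toy computation (not from the paper) illustrates the dichotomy of Theorems 1 and 3 in the simplest setting, Example 1 with an assumed biased i.i.d. binary source: taking $B_n$ to be the $e^{nR}$ most probable blocks, the average zero-one distortion stays small for a rate $R$ (in nats) above the source entropy and tends to $1$ for $R$ below it.

```python
# Toy illustration of the direct theorem / strong converse dichotomy in
# the noiseless case (Example 1), for an assumed i.i.d. binary source.
from itertools import product
from math import log, exp

p = 0.2                                   # P(X_i = 1); entropy in nats:
H = -(p * log(p) + (1 - p) * log(1 - p))  # about 0.5004

def avg_distortion(n, R):
    """Average zero-one distortion of the exp(nR) most probable blocks."""
    prob = lambda x: p ** sum(x) * (1 - p) ** (n - sum(x))
    blocks = sorted(product((0, 1), repeat=n), key=prob, reverse=True)
    B = blocks[:max(1, int(exp(n * R)))]
    return 1 - sum(prob(x) for x in B)    # mass of the blocks not in B

for n in (4, 8, 12, 16):
    print(n, round(avg_distortion(n, 0.6), 3),   # R > H: stays small
             round(avg_distortion(n, 0.3), 3))   # R < H: grows toward 1
```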

IV. CALCULATION OF $R_F(\mu)$

The rate-distortion function for coding a source relative to a single-letter fidelity criterion can be expressed by means of a formula involving an information-theoretic minimization [1, p. 105], and so can the absolute $\epsilon$-entropy $I_\epsilon$ considered in [4]. We give here such a formula for $R_F(\mu)$.

Let $q(\cdot|\cdot)$ be a transition probability on $A^n$. That is, for each $x \in A^n$, $q(\cdot|x)$ is a probability measure on $A^n$, and if $E \in \mathcal{F}_n$, $q(E|\cdot)$ is $\mathcal{F}_n$-measurable. Let $\mathcal{Q}_n$ be the collection of all transition probabilities $q$ on $A^n$ such that $q(r_n^{-1}[x]\,|\,x) = 1$, $x \in A^n$. If $q \in \mathcal{Q}_n$, let $\mu q$ be the probability measure on $(A^n \times A^n, \mathcal{F}_n \times \mathcal{F}_n)$ such that $\mu q(E \times F) = \int_E q(F|x)\, d\mu_n(x)$, $E, F \in \mathcal{F}_n$. If $U_n, V_n$ are, respectively, the projections $(x, y) \to x$ and $(x, y) \to y$ from $A^n \times A^n$ to $A^n$, let $I_n(\mu q) = I(U_n, V_n)$, the mutual information of the pair $(U_n, V_n)$, calculated with respect to $\mu q$ [8, p. 9]. Define $R_F^*(\mu) = \inf_n \inf_{q \in \mathcal{Q}_n} n^{-1} I_n(\mu q)$. We will prove the following theorem in Appendix C.

Theorem 4: $R_F^*(\mu) = R_F(\mu)$.

This will give us the type of formula we want for $R_F(\mu)$. One can use R1 and Lemma 1 (to follow in Appendix B) to show that $\{\inf_{q \in \mathcal{Q}_n} I_n(\mu q)\}$ is a subadditive sequence. Hence $R_F^*(\mu) = \lim_{n\to\infty} n^{-1} \inf_{q \in \mathcal{Q}_n} I_n(\mu q)$. We also remark that $R_F^*(\mu) \le R_F(\mu)$ follows easily. For if $\mathcal{P}$ is $r_n$-admissible, pick $q \in \mathcal{Q}_n$ as follows: if $E \in \mathcal{P}$, pick $y \in A^n$ so that $E \subset r_n[y]$. Define $q(\{y\}|x) = 1$, $x \in E$. Then $I_n(\mu q) \le H_n(\mathcal{P})$. $R_F^*(\mu) \le R_F(\mu)$ follows from this.
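As a concrete (and deliberately degenerate) illustration, here is a sketch (not from the paper) of the objects above for $n = 1$, $A = \{0, 1, 2\}$, and Example 4's distortion with $\epsilon = 1$, so that $r_1^{-1}[0] = \{0, 1\}$, $r_1^{-1}[1] = \{0, 1, 2\}$, and $r_1^{-1}[2] = \{1, 2\}$; the source probabilities are invented. It compares $I_1(\mu q)$ for the deterministic $q$ built from an admissible partition, as in the remark above, with a cleverer $q$ that achieves the infimum.

```python
# Toy comparison of I_1(mu q) for two admissible transition probabilities
# q in Q_1, on A = {0,1,2} with Example 4's distortion and eps = 1.
from math import log

mu = {0: 0.3, 1: 0.4, 2: 0.3}   # invented induced measure mu_1 on A

def I(joint):
    """Mutual information of (U_1, V_1) under the joint law mu q."""
    px, py = {}, {}
    for (x, y), m in joint.items():
        px[x] = px.get(x, 0) + m
        py[y] = py.get(y, 0) + m
    return sum(m * log(m / (px[x] * py[y]))
               for (x, y), m in joint.items() if m > 0)

# (a) deterministic q from the admissible partition {{0,1}, {2}} with
#     representatives y = 0 and y = 2 (the construction in the remark):
q_a = {(0, 0): 0.3, (1, 0): 0.4, (2, 2): 0.3}
# (b) q sending every x to y = 1, allowed since 1 lies in every r_1^{-1}[x]:
q_b = {(0, 1): 0.3, (1, 1): 0.4, (2, 1): 0.3}

print(I(q_a))  # about 0.611 nats, equal to H_1 of the partition
print(I(q_b))  # 0.0: the infimum, so R_F(mu) = 0 for this toy criterion
```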

ACKNOWLEDGMENT

The author would like to thank Professors Toby Berger, Michael Pursley, James Dunham, and the referees for their helpful comments.

APPENDIX A

To obtain Theorems 1-2, only the Shannon-McMillan Theorem is required. The proof of Theorem 3, however, requires a new generalization of the Shannon-McMillan Theorem (Theorem 5, below), which is of interest in its own right.

First, some notation. Suppose $(\Omega, \mathcal{F}, P)$ is a probability space. If $X$ is a measurable function on $\Omega$ with countable range, let $P(X)$ be the function on $\Omega$ such that $P(X)(\omega) = P(X = X(\omega))$, $\omega \in \Omega$. Let $h(X)$ be the function $h(X) = -\ln P(X)$. $H(X) = E[h(X)]$ is then the entropy of $X$. If $Y$ is a second measurable function, let $P(X|Y)$ be the function such that $P(X|Y)(\omega) = P(X = X(\omega)\,|\,Y)(\omega)$, $\omega \in \Omega$. We define $h(X|Y) = -\ln P(X|Y)$, and so $H(X|Y) = E[h(X|Y)]$ is the conditional entropy of $X$ given $Y$. If $X_1, X_2, \cdots, X_n$ are measurable functions, $(X_1, X_2, \cdots, X_n)$ is the measurable function $(X_1, X_2, \cdots, X_n)(\omega) = (X_1(\omega), X_2(\omega), \cdots, X_n(\omega))$, $\omega \in \Omega$. Hence, by the symbol


$h(X_1, X_2, \cdots, X_n\,|\,Y_1, Y_2, \cdots, Y_k)$, for example, we mean $h((X_1, X_2, \cdots, X_n)\,|\,(Y_1, Y_2, \cdots, Y_k))$. If $X, Y$ are measurable functions on $\Omega$, let $i(X, Y)$ denote the information density of the pair $(X, Y)$; $I(X, Y) = E[i(X, Y)]$ is then the mutual information of the pair $(X, Y)$ [8, Chap. 2].

Theorem 5 (Generalized Shannon-McMillan Theorem): Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $T : \Omega \to \Omega$ be a measure-preserving map. For each $n = 1, 2, \cdots$, let $Y_n$ be a measurable function on $\Omega$ with countable range. Suppose $Y_{m+n}$ is a function of $(Y_m, Y_n \cdot T^m)$, $m, n = 1, 2, \cdots$, where $(Y_n \cdot T^m)(x) = Y_n(T^m x)$. Let $H(Y_1) < \infty$. Then there exists a $T$-invariant function $f$ in $L^1(P)$ such that $\{n^{-1} h(Y_n)\}$ has limit $f$ in $L^1(P)$ norm as $n \to \infty$.
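For intuition, the following quick simulation (not from the paper) illustrates the classical Shannon-McMillan convergence that Theorem 5 generalizes: for an i.i.d. (hence ergodic) binary source with an assumed bias $p = 0.2$, the normalized sample entropy $n^{-1} h(Y_n)$ along one path approaches the constant limit $f = H$.

```python
# Empirical illustration of Shannon-McMillan convergence for an assumed
# i.i.d. binary source: -n^{-1} ln P(X_1,...,X_n) -> H (in nats).
import random
from math import log

random.seed(0)
p = 0.2
H = -(p * log(p) + (1 - p) * log(1 - p))  # about 0.5004 nats

def sample_normed_h(n):
    """-n^{-1} ln P(x_1,...,x_n) along one simulated sample path."""
    h = 0.0
    for _ in range(n):
        x = 1 if random.random() < p else 0
        h += -log(p if x == 1 else 1 - p)
    return h / n

for n in (100, 1000, 10000):
    print(n, round(sample_normed_h(n), 4), "-> H =", round(H, 4))
```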

Since $\{n^{-1} h(Y_n)\}$ is a uniformly integrable sequence, we have by [9, Theorem 7.5.4] that $\{n^{-1} h(Y_n)\}$ converges in $L^1(P)$ norm to some $f \in L^1(P)$. We now show that $f = f \cdot T$. Since $n^{-1} h(Y_{n-1} \cdot T) \to f \cdot T$ in $L^1(P)$ norm, it suffices to show that $n^{-1} h(Y_n) - n^{-1} h(Y_{n-1} \cdot T) \to 0$ in $L^1(P)$ norm. Let $\|\cdot\|$ denote the $L^1(P)$ norm. We have

$$\|h(Y_n) - h(Y_{n-1} \cdot T)\| \le \|h(Y_n) - h(Y_n, Y_{n-1} \cdot T)\| + \|h(Y_n, Y_{n-1} \cdot T) - h(Y_{n-1} \cdot T)\|.$$

Now $h(Y_n) \le h(Y_1, Y_{n-1} \cdot T)$. Also $h(Y_{n-1} \cdot T)$
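Though the excerpt breaks off here, the displayed bound is driven by the pointwise identity $h(X, Y) = h(Y) + h(X|Y)$, so that differences like $h(X, Y) - h(Y)$ are nonnegative conditional sample entropies. The following toy check (not from the paper, with an invented joint distribution) verifies the identity outcome by outcome.

```python
# Toy pointwise check of h(X, Y) = h(Y) + h(X|Y) for a small invented
# joint distribution of a pair (X, Y).
from math import log, isclose

P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # joint law
PY = {y: sum(m for (x2, y2), m in P.items() if y2 == y) for y in (0, 1)}

for (x, y), m in P.items():
    h_xy = -log(m)                 # h(X, Y) at this outcome
    h_y = -log(PY[y])              # h(Y)
    h_x_given_y = -log(m / PY[y])  # h(X|Y), since P(X=x|Y=y) = m / P(Y=y)
    assert isclose(h_xy, h_y + h_x_given_y)
print("h(X,Y) = h(Y) + h(X|Y) holds pointwise")
```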
