IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 37, NO. 2, MARCH 1991
Sample Converses in Source Coding Theory

John C. Kieffer, Senior Member, IEEE
Abstract—A study is made of the rate and distortion performance of a sequence of codes along a sample sequence of symbols generated by a stationary ergodic information source. Two results are obtained: 1) the source sample sequence is encoded by an arbitrary sequence of block codes which operate at a fixed rate level R, and a sample converse is obtained which states that, with probability one, the lower limit of the code sample distortions is lower bounded by D(R), the value of the distortion rate function at R; 2) the source sample sequence is encoded by an arbitrary sequence of variable-rate codes which operate at a fixed distortion level D, and a sample converse is obtained which states that, with probability one, the lower limit of the code sample rates is lower bounded by R(D), the value of the rate distortion function at D. A powerful new ergodic theorem is used to obtain both sample converses.

Index Terms—Block and variable-rate source coding, sample performance of codes.
I. INTRODUCTION

IN THE TRADITIONAL APPROACH to source coding theory, one analyzes the expected rate or distortion performance of a sequence of codes used to encode a given information source. The expected performance is computed over the ensemble of all sequences that can be generated by the source. In this paper, we concern ourselves with the alternative approach in which one analyzes the code sequence performance along a sample sequence generated at random by the given source. In this introduction, we shall review the traditional expected performance results of source coding theory. We shall follow this with an indication of the sample performance results that can be obtained. Later sections are devoted to proofs of sample performance results.

First, we present some notation that will be in effect throughout the paper. We fix a measurable space (A, 𝒜) as our source alphabet and a measurable space (Â, 𝒜̂) as our reproduction alphabet. To avoid measure-theoretic pathologies, we assume that {x} ∈ 𝒜 for each x ∈ A and that {y} ∈ 𝒜̂ for each y ∈ Â. For each integer n ≥ 1, we form the measurable space (Aⁿ, 𝒜ⁿ) in which Aⁿ is the set of all n-tuples (x₁, x₂, …, xₙ) from A, and 𝒜ⁿ is the sigma-field of subsets of Aⁿ generated by n-fold Cartesian products of sets from 𝒜; similarly, we form the measurable spaces (Âⁿ, 𝒜̂ⁿ), n ≥ 1.

We also fix a fidelity criterion {ρₙ: n = 1, 2, …} in which each ρₙ is a nonnegative measurable function from the product measurable space (Aⁿ × Âⁿ, 𝒜ⁿ × 𝒜̂ⁿ) into the measurable space (ℝ, ℬ(ℝ)), where ℬ(ℝ) is the sigma-field of Borel subsets of the real line ℝ.

Finally, we fix a stationary, ergodic source μ with alphabet (A, 𝒜). The source μ generates a random sequence of symbols from A; we denote this random sequence of symbols by X₁, X₂, …. For each positive integer n, we denote the random source block (X₁, X₂, …, Xₙ) by Xⁿ, and we shall use μₙ to denote the probability distribution of Xⁿ on (Aⁿ, 𝒜ⁿ).

In the remainder of this introduction, we discuss encoding of the source blocks {Xⁿ}. First, we discuss codes that operate at a fixed rate level. Then, we discuss codes that operate at a fixed distortion level.

We employ block codes when coding at a fixed rate level. If n is a positive integer, a block code of order n is defined to be a nonempty finite subset of Âⁿ. A block code B of order n is said to operate at the rate level R if the number of elements |B| in B satisfies |B| ≤ 2^{nR}. The sample distortion ρ(B) of a block code B of order n is the random variable defined by ρ(B) ≜ min_{y∈B} ρₙ(Xⁿ, y). Given a sequence of block codes {Bₙ}, each of which operates at a fixed rate level R > 0, one would like to analyze the behavior of the sequence {ρ(Bₙ)} as n → ∞. This behavior is related to the distortion rate function of the source μ relative to the fidelity criterion {ρₙ}, provided we impose the following two requirements.

1) If n, m are positive integers and x₁ ∈ Aⁿ, y₁ ∈ Âⁿ, x₂ ∈ Aᵐ, y₂ ∈ Âᵐ, then
  (n + m) ρₙ₊ₘ((x₁, x₂), (y₁, y₂)) ≤ n ρₙ(x₁, y₁) + m ρₘ(x₂, y₂).

2) There exists a* ∈ Â such that E[ρ₁(X₁, a*)] < ∞.

Requirement 1) is a condition on the fidelity criterion {ρₙ}; a fidelity criterion satisfying 1) is said to be subadditive. Requirement 2) is a condition expressing how the fidelity criterion {ρₙ} and the source μ are related to one another.

Assuming that 1) and 2) hold, we define the distortion rate function of the source μ relative to the fidelity criterion {ρₙ} to be the function D(·) on (0, ∞) given by D(R) ≜ inf_n Dₙ(R), R > 0, where Dₙ(R) is the infimum of E[ρₙ(U, V)] over all Aⁿ × Âⁿ valued random pairs (U, V) in which U has the same distribution as Xⁿ and I(U; V) ≤ nR. (The logarithm used in the computation of the mutual information I(U; V) is to base two, as shall be all logarithms in this paper.)
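The block-code definitions above can be made concrete with a toy numerical sketch (the binary alphabets, the per-letter Hamming fidelity criterion, and all names below are illustrative assumptions, not taken from the paper):

```python
def rho_n(x, y):
    """Per-letter (normalized) Hamming distortion between two n-tuples."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def sample_distortion(B, x):
    """rho(B) evaluated at the source block x: distortion to the nearest codeword."""
    return min(rho_n(x, y) for y in B)

def operates_at_rate(B, n, R):
    """A block code B of order n operates at rate level R if |B| <= 2**(n R)."""
    return len(B) <= 2 ** (n * R)

n, R = 4, 0.5
B = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 0, 1, 1), (1, 1, 0, 0)]  # block code of order 4
assert operates_at_rate(B, n, R)           # |B| = 4 <= 2**(4 * 0.5)
print(sample_distortion(B, (0, 1, 0, 0)))  # 0.25: one mismatch against (0,0,0,0)
```

Under the convention assumed here, ρₙ is already normalized per source letter, so the sample distortion ρ(B) is directly comparable with D(R).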
Manuscript received May 22, 1989; revised August 6, 1990. This work was supported by NSF Grant NCR-8702176. The author is with the Department of Electrical Engineering, University of Minnesota, 200 Union Street SE, Minneapolis, MN 55455. IEEE Log Number 9040745.
The following source coding theorem states what is known about the expected distortion performance of sequences of block codes that operate at a fixed rate level.

Proposition 1: Suppose the restrictions 1) and 2) hold. Let R > 0. Then the following two statements hold.

a) lim inf_{n→∞} E[ρ(Bₙ)] ≥ D(R), whenever Bₙ is a block code of order n that operates at the rate level R, n ≥ 1.
0018-9448/91/0300-0263$01.00 © 1991 IEEE
b) lim sup_{n→∞} E[ρ(Bₙ)] ≤ D(R), for some sequence {Bₙ} in which Bₙ is a block code of order n that operates at rate level R, n ≥ 1.

Part b), the positive half of the coding theorem, is proved in [9], [4, Chapter 11]. Part a), the converse half of the coding theorem, is a trivial consequence of the definition of the distortion rate function.

The following source coding theorem states what can be shown concerning the sample distortion performance of sequences of block codes that operate at a fixed rate level.

Proposition 2: Suppose the restrictions 1) and 2) hold. Let R > 0. Then the following two statements hold.

a) lim inf_{n→∞} ρ(Bₙ) ≥ D(R), a.s., whenever Bₙ is a block code of order n that operates at the rate level R, n ≥ 1.

b) lim sup_{n→∞} ρ(Bₙ) ≤ D(R), a.s., for some sequence {Bₙ} in which Bₙ is a block code of order n that operates at rate level R, n ≥ 1.

At first glance, it might seem that Proposition 2 could be obtained from Proposition 1 in some straightforward manner, since Proposition 1 takes the form of Proposition 2 if the expectation operator "E" is removed from Proposition 1. Indeed, Part b) of Proposition 2 can be obtained from Part b) of Proposition 1 via the code construction technique outlined in [6]. However, we know of no easy way to deduce Part a) of Proposition 2 from Part a) of Proposition 1. Hence, the result embodied in Part a) of Proposition 2 is a seemingly nontrivial result. We shall refer to this result as the sample converse for source coding at a fixed rate level. It shall be proved in a subsequent section of the paper.

In this last part of the Introduction, we discuss encoding of the source blocks {Xⁿ} via codes that operate at a fixed distortion level, leading up to the statement of a sample converse for source coding at a fixed distortion level.

We employ variable-rate codes when coding at a fixed distortion level. If n is a positive integer, a variable-rate code of order n is defined to be a triple C = (φ, ψ, S) where S is a countable subset of Âⁿ, φ is a measurable function from Aⁿ into S, and ψ is a one-to-one function from S onto a set of finite binary words satisfying the prefix condition. An nth-order variable-rate code C = (φ, ψ, S) is said to operate at the distortion level D if ρₙ(x, φ(x)) ≤ D, x ∈ Aⁿ. The sample rate r(C) of an nth-order variable-rate code C = (φ, ψ, S) is the random variable defined by r(C) ≜ [length of ψ(φ(Xⁿ))]/n.

Given a sequence of variable-rate codes {Cₙ}, each of which operates at a fixed distortion level D > 0, one would like to analyze the behavior of the sequence {r(Cₙ)} as n → ∞. This behavior is related to the rate distortion function of the source μ relative to the fidelity criterion {ρₙ}, provided we impose the following two requirements.

1′) If n, m are positive integers and x₁ ∈ Aⁿ, y₁ ∈ Âⁿ, x₂ ∈ Aᵐ, y₂ ∈ Âᵐ, then
  ρₙ₊ₘ((x₁, x₂), (y₁, y₂)) ≤ max[ρₙ(x₁, y₁), ρₘ(x₂, y₂)].

2′) For each D > 0 there exists a countable subset {yᵢ} of Â and a countable measurable partition {Eᵢ} of A such that ρ₁(x, yᵢ) ≤ D, x ∈ Eᵢ, for each yᵢ, and −Σᵢ Pr[X₁ ∈ Eᵢ] log Pr[X₁ ∈ Eᵢ] < ∞.

Conditions 1) and 2) are appropriate for coding at a fixed rate level, whereas conditions 1′) and 2′) are the appropriate conditions for coding at a fixed distortion level. We point out that any subadditive fidelity criterion {ρₙ} (i.e., any fidelity criterion satisfying 1)) satisfies 1′). However, there are fidelity criteria satisfying 1′) that are not subadditive, e.g., the fidelity criterion {ρₙ} defined by
  ρₙ((x₁, x₂, …, xₙ), (y₁, y₂, …, yₙ)) ≜ max_{1≤i≤n} ρ(xᵢ, yᵢ),
where ρ: A × Â → [0, ∞) is a given measurable function.

Assuming that 1′) and 2′) hold, we define the rate distortion function of the source μ relative to the fidelity criterion {ρₙ} to be the function R(·) on (0, ∞) given by R(D) ≜ inf_n Rₙ(D), D > 0, where Rₙ(D) is the infimum of I(U; V)/n over all Aⁿ × Âⁿ valued random pairs (U, V) in which U has the same distribution as Xⁿ and

  Pr[ρₙ(U, V) ≤ D] = 1.  (2)

Remark: If {ρₙ} is subadditive, it can be shown that the rate distortion function previously defined coincides with the alternative definition of rate distortion function in which one keeps everything as above except that condition (2) is replaced by the condition

  E[ρₙ(U, V)] ≤ D.  (3)

If {ρₙ} satisfies 1′) but is not subadditive, these two rate distortion functions need not coincide, but the alternative rate distortion function defined using (3) will have no operational significance in terms of coding at a fixed distortion level.

The following source coding theorem states what is known about the expected rate performance of sequences of variable-rate codes that operate at a fixed distortion level.

Proposition 3: Suppose the restrictions 1′) and 2′) hold. Let D > 0. Then the following two statements hold.

a) lim inf_{n→∞} E[r(Cₙ)] ≥ R(D), whenever Cₙ is a variable-rate code of order n that operates at the distortion level D, n ≥ 1.

b) lim sup_{n→∞} E[r(Cₙ)] ≤ R(D), for some sequence {Cₙ} in which Cₙ is a variable-rate code of order n that operates at distortion level D, n ≥ 1.

Part b), the direct half of Proposition 3, follows from results in [5]. Part a), the converse half, is a trivial consequence of the definition of the rate distortion function.

The following source coding theorem states what can be shown concerning the sample rate performance of sequences of variable-rate codes that operate at a fixed distortion level.
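A variable-rate code C = (φ, ψ, S) and its sample rate r(C) can likewise be sketched numerically. The following is a minimal illustration under assumed binary alphabets and per-letter Hamming distortion; the two-word code and all names are invented for illustration and do not come from the paper:

```python
def is_prefix_free(words):
    """Check the prefix condition for a set of binary words."""
    ws = sorted(words, key=len)
    return not any(w2.startswith(w1) for i, w1 in enumerate(ws)
                   for w2 in ws[i + 1:])

def kraft_sum(words):
    """Kraft sum; it is <= 1 for any prefix-free set of binary words."""
    return sum(2.0 ** -len(w) for w in words)

# an order-2 variable-rate code C = (phi, psi, S), operating at distortion
# level D = 0.5 under rho_2(x, y) = (1/2) * #{i : x_i != y_i}
S = [(0, 0), (1, 1)]
psi = {(0, 0): "0", (1, 1): "1"}

def phi(x):
    """Map each source block to the closer of the two reproduction words."""
    return min(S, key=lambda y: sum(a != b for a, b in zip(x, y)))

def sample_rate(x, n=2):
    """r(C) evaluated at block x: length of psi(phi(x)) divided by n."""
    return len(psi[phi(x)]) / n

assert is_prefix_free(psi.values())
assert kraft_sum(psi.values()) <= 1.0
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    d = sum(a != b for a, b in zip(x, phi(x))) / 2
    assert d <= 0.5            # C operates at distortion level D = 0.5
print(sample_rate((0, 1)))      # 0.5 bit per source letter
```

The prefix condition is exactly what makes the codeword stream uniquely parsable, which is why the definition insists on it rather than on mere one-to-oneness of ψ.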
Proposition 4: Suppose the restrictions 1′) and 2′) hold. Let D > 0. Then the following two statements hold.

a) lim inf_{n→∞} r(Cₙ) ≥ R(D), a.s., whenever Cₙ is a variable-rate code of order n that operates at the distortion level D, n ≥ 1.

b) lim sup_{n→∞} r(Cₙ) ≤ R(D), a.s., for some sequence {Cₙ} in which Cₙ is a variable-rate code of order n that operates at distortion level D, n ≥ 1.

Part b), the direct half of Proposition 4, can be obtained from Part b) of Proposition 3 via the code construction technique in [6]. However, we know of no easy way to deduce Part a) of Proposition 4 from Part a) of Proposition 3. We shall refer to the seemingly nontrivial result embodied in Part a) of Proposition 4 as the sample converse for source coding at a fixed distortion level.

There are two main results of this paper: the sample converse for source coding at a fixed rate level and the sample converse for source coding at a fixed distortion level. We shall prove the first of these sample converses in Section II, and the second one in Section III. Both proofs involve use of a powerful new ergodic theorem [7].
II. SAMPLE CONVERSE FOR CODING AT A FIXED RATE LEVEL

The purpose of this section is to prove the sample converse for source coding at a fixed rate level (Proposition 2, Part a)). We restate this sample converse as Theorem 1, for emphasis.

Theorem 1: Suppose the restrictions 1) and 2) hold. Let R > 0. Then

  lim inf_{n→∞} ρ(Bₙ) ≥ D(R), a.s.,  (4)

whenever Bₙ is a block code of order n that operates at the rate level R, n ≥ 1.
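For a concrete sense of the lower bound in Theorem 1, D(R) can be evaluated in the simplest classical case. For a Bernoulli(1/2) source with per-letter Hamming distortion, the well-known formula gives D(R) = h⁻¹(1 − R), where h is the binary entropy function. The sketch below, which is an illustration and not part of the paper's argument, recovers D₁(R) by a brute-force grid search over binary test channels and compares it with the formula (for an i.i.d. source, D₁(R) already equals D(R)):

```python
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information(a, b):
    """I(U;V) in bits for U ~ Bernoulli(1/2) sent through the channel
    P(V=1|U=0) = a, P(V=0|U=1) = b."""
    pv1 = 0.5 * a + 0.5 * (1 - b)           # P(V = 1)
    return h(pv1) - (0.5 * h(a) + 0.5 * h(b))  # H(V) - H(V|U)

def D1(R, steps=200):
    """Brute-force D_1(R): minimize E[Hamming distortion] over channels with I <= R."""
    best = 1.0
    for i in range(steps + 1):
        for j in range(steps + 1):
            a, b = i / steps, j / steps
            if mutual_information(a, b) <= R:
                best = min(best, 0.5 * a + 0.5 * b)  # E[1{U != V}]
    return best

def D_formula(R):
    """D(R) = h^{-1}(1 - R), computed by bisection on [0, 1/2]."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if h(mid) < 1 - R:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

R = 0.5
d_grid, d_true = D1(R), D_formula(R)
# the finite grid lands slightly above the exact value h^{-1}(0.5), about 0.110
print(d_grid, d_true)
```

Any feasible test channel has distortion at least D(R), so the grid value can only exceed the formula value; Theorem 1 asserts that this same number also lower-bounds the distortion of every sample path of every sequence of rate-R block codes.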
As indicated in the Introduction, we shall prove Theorem 1 by exploiting a new ergodic theorem [7]. We first develop some background that shall enable us to state the new ergodic theorem. After we state the ergodic theorem, we shall conclude the section with the proof of Theorem 1.

Definition: Let F be a set of real valued functions that are each defined on the same set S. We say that F is log-convex if, whenever f, g ∈ F and 0 < λ < 1, the function h is also in F, where

  h(x) = −log(λ2^{−f(x)} + (1 − λ)2^{−g(x)}),  x ∈ S.

We now state the ergodic theorem that we shall need. (It is a corollary of Theorem 2 of [7].)

Proposition 5: For each integer n ≥ 1, let Fₙ be a set of nonnegative measurable functions defined on Aⁿ. We impose the following assumptions on the sets {Fₙ}:

a) Each Fₙ is nonempty;
b) Each Fₙ is log-convex;
c) E[f(Xⁿ)] < ∞, f ∈ Fₙ, n ≥ 1;
d) If n, m are positive integers and f ∈ Fₙ, g ∈ Fₘ, then Fₙ₊ₘ contains the function h defined by
  h(x₁, …, xₙ₊ₘ) = f(x₁, …, xₙ) + g(xₙ₊₁, …, xₙ₊ₘ)
for each (x₁, …, xₙ₊ₘ) ∈ Aⁿ⁺ᵐ.

Then the number M given by

  M ≜ lim_{n→∞} n⁻¹ inf{E[f(Xⁿ)]: f ∈ Fₙ}  (5)

exists, and lim inf_{n→∞} n⁻¹fₙ(Xⁿ) ≥ M, a.s., whenever fₙ ∈ Fₙ, n ≥ 1.

Proof of Theorem 1: Fix R > 0 and a sequence {Bₙ} in which Bₙ is a block code of order n that operates at the rate level R, n ≥ 1. Since D(·) is convex, choosing s > 0 to be any number such that −1/s lies between the right and left hand derivatives of D(·) at R, we have

  sD(R′) + R′ ≥ sD(R) + R,  R′ > 0.  (8)

We shall now be able to complete the proof of Theorem 1 by picking an appropriate sequence of sets {Fₙ} to which Proposition 5 can be applied. For each n ≥ 1, we let Fₙ be the set of all nonnegative measurable functions f on Aⁿ that satisfy E[f(Xⁿ)] < ∞ and

  f(x) ≥ −log(Σ_{y∈B} 2^{−snρₙ(x,y)} p(y)),  x ∈ Aⁿ,  (9)

for some finite subset B of Âⁿ and some probability measure p on B.
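The log-convexity of these particular sets Fₙ can be checked directly: if f and g dominate bounds of the form (9) built from measures p₁ and p₂ on the same finite set B, then h = −log(λ2^{−f} + (1 − λ)2^{−g}) dominates the bound (9) built from the mixture λp₁ + (1 − λ)p₂. The sketch below verifies this pointwise on a toy alphabet; the distortion table, measures, and all names are illustrative assumptions, not taken from the paper:

```python
import math

# toy setup: n = 1, source alphabet {0, 1, 2}, reproduction set B = {0, 1}, s = 1
rho = {(x, y): abs(x - y) for x in range(3) for y in range(2)}  # toy distortion
s, n = 1.0, 1

def bound(x, p):
    """Right side of (9): -log2 of sum_y 2^{-s n rho(x,y)} p(y)."""
    return -math.log2(sum(2.0 ** (-s * n * rho[(x, y)]) * py
                          for y, py in p.items()))

p1 = {0: 0.3, 1: 0.7}    # probability measure behind f's bound
p2 = {0: 0.9, 1: 0.1}    # probability measure behind g's bound
lam = 0.4
pmix = {y: lam * p1[y] + (1 - lam) * p2[y] for y in (0, 1)}

for x in range(3):
    f, g = bound(x, p1), bound(x, p2)   # take f, g on the boundary of (9)
    h = -math.log2(lam * 2.0 ** (-f) + (1 - lam) * 2.0 ** (-g))
    # h again satisfies a bound of form (9), now with the mixture measure
    assert h >= bound(x, pmix) - 1e-12
```

The inequality holds with equality here because 2^{−h} is by construction exactly the λ-mixture of the two sums appearing in (9); for general f, g above their bounds, the same computation gives domination rather than equality.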
It can be seen that assumptions a)–d) of Proposition 5 hold for the sets {Fₙ}. (Assumptions b) and c) follow from the definition of the Fₙ's, assumption d) is easily checked using the subadditivity of {ρₙ}, and a) is true because, by 2), the function f(x) = snρₙ(x, a*(n)), x ∈ Aⁿ, is in Fₙ, where a*(n) denotes the n-tuple each of whose entries is a*.) Letting f ∈ Fₙ satisfy (9), Lemma A2 of Appendix A states that

  E[f(Xⁿ)] ≥ snE[ρₙ(U, V)] + I(U; V),  (10)

where (U, V) is a certain Aⁿ × Âⁿ valued random pair such that the distribution of U is the same as that of Xⁿ. Applying (8) to (10), we obtain E[f(Xⁿ)] ≥ n[sD(R) + R], from which it follows that the number M defined by (5) satisfies

  M ≥ sD(R) + R.  (11)

To complete the proof of Theorem 1, define fₙ (n ≥ 1) to be the function on Aⁿ such that

  fₙ(x) = −log(|B̂ₙ|⁻¹ Σ_{y∈B̂ₙ} 2^{−snρₙ(x,y)}),  x ∈ Aⁿ,

where B̂ₙ is the subset of Âⁿ obtained by adjoining the n-tuple a*(n) to Bₙ. Note that fₙ(x) ≤ log|B̂ₙ| + snρₙ(x, a*(n)), x ∈ Aⁿ. Hence, E[fₙ(Xⁿ)] is finite and so fₙ ∈ Fₙ (n ≥ 1). The following holds:

  fₙ(Xⁿ) ≤ log|B̂ₙ| + snρ(Bₙ),  n ≥ 1;

since |B̂ₙ| ≤ 2^{nR} + 1, Proposition 5 and (11) yield, almost surely,

  sD(R) + R ≤ lim inf_{n→∞} n⁻¹fₙ(Xⁿ) ≤ R + s lim inf_{n→∞} ρ(Bₙ),

which gives (4).

III. SAMPLE CONVERSE FOR CODING AT A FIXED DISTORTION LEVEL

The purpose of this section is to prove the sample converse for source coding at a fixed distortion level (Proposition 4, Part a)). We restate this sample converse as Theorem 2.

Theorem 2: Suppose the restrictions 1′) and 2′) hold. Let D > 0. Then

  lim inf_{n→∞} r(Cₙ) ≥ R(D), a.s.,  (12)

whenever Cₙ is a variable-rate code of order n that operates at the distortion level D, n ≥ 1.

A noiseless code of order n is a one-to-one function from Aⁿ onto a set of finite binary words obeying the prefix condition. We may apply Theorem 2 to obtain the sample converse for noiseless source coding, which states that

  lim inf_{n→∞} n⁻¹[length of ψₙ(Xⁿ)] ≥ H, a.s.,  (13)

whenever ψₙ is a noiseless code of order n (n ≥ 1), where H denotes the entropy rate of the source μ. (Just take Â = A, ρₙ ≡ 0; then R(D) = H for D > 0, and so (12) yields (13).) This special case of Theorem 2 is due to Barron [2, Theorem 3.1].

We are now ready to prove Theorem 2. Just as with the proof of Theorem 1, we apply the ergodic theorem Proposition 5 to a cleverly chosen sequence of sets {Fₙ} satisfying assumptions a)–d) of Proposition 5. We shall also employ for the proof of Theorem 2 three elementary lemmas, Lemmas B1–B3 of Appendix B.

Proof of Theorem 2: Fix D > 0 and a sequence {Cₙ} in which Cₙ is an nth-order variable-rate code that operates at distortion level D (n ≥ 1). For each positive integer n, let Fₙ be the set of all nonnegative measurable functions f on Aⁿ such that E[f(Xⁿ)] < ∞ and f satisfies conditions (14a) and (14b) for some countable subset S of Âⁿ and some probability measure q on S.

(The fact that a prefix-condition code assigns codeword lengths L[ψ(y)] satisfying Σ_{y∈S} 2^{−L[ψ(y)]} ≤ 1 is a useful result in information theory that has been previously pointed out and exploited by other authors, e.g., Tulcea [12] and Algoet and Cover [1].)

ACKNOWLEDGMENT

Thanks to Andrew Barron for helpful comments that were incorporated into the proof of Theorem 1.
APPENDIX A

(Note: if u₁, u₂ are probability measures on a finite or countable set S, h(u₁|u₂) denotes the relative entropy of u₁ with respect to u₂, defined by

  h(u₁|u₂) ≜ Σ_{s∈S} u₁(s) log[u₁(s)/u₂(s)],

where, in the preceding sum, we adopt the conventions that 0 log[0/Q] = 0, Q ≥ 0, and Q log[Q/0] = +∞, Q > 0.)

Lemma A1: Suppose that 1) and 2) hold. Then

  lim inf_{n→∞} ρ(Bₙ) ≥ D(∞), a.s.,  (23)

whenever Bₙ is a block code of order n, n ≥ 1.

Proof: By [11, p. 29], we can find for each n ≥ 1 a countable subset Sₙ of Âⁿ such that the function

  uₙ(x) ≜ inf_{y∈Sₙ} ρₙ(x, y),  x ∈ Aⁿ,  (24)

is measurable and satisfies uₙ(x) = inf_{y∈Âⁿ} ρₙ(x, y) for μₙ-almost all x ∈ Aⁿ.

Lemma A2: Let f ∈ Fₙ satisfy (9), where Fₙ is the set defined in Section II. Then f satisfies (10) for some Aⁿ × Âⁿ valued random pair (U, V) in which U has the same distribution as Xⁿ.

Proof: Let B and p be the finite subset of Âⁿ and the probability measure on B for which (9) holds, and define

  C(x) ≜ Σ_{y∈B} 2^{−snρₙ(x,y)} p(y),  x ∈ Aⁿ.

For each x ∈ Aⁿ, let pₓ be the probability measure on B given by pₓ(y) ≜ 2^{−snρₙ(x,y)} p(y)/C(x), y ∈ B. Then

  −log C(x) = h(pₓ|p) + sn Σ_{y∈B} ρₙ(x, y) pₓ(y).  (27)

Let α be the probability measure on B given by

  α(y) = ∫_{Aⁿ} pₓ(y) dμₙ(x),  y ∈ B.

Then

  ∫_{Aⁿ} h(pₓ|p) dμₙ(x) ≥ ∫_{Aⁿ} h(pₓ|α) dμₙ(x)  (28)

because the left side of (28) minus the right side of (28) is equal to the nonnegative quantity h(α|p). Let (U, V) be an Aⁿ × B valued random pair such that U has the same distribution as Xⁿ, and the family of measures {pₓ: x ∈ Aⁿ} constitutes the conditional distribution of V given U. It is easily checked that

  I(U; V) = ∫_{Aⁿ} h(pₓ|α) dμₙ(x).  (29)

Since f(Xⁿ) ≥ −log C(Xⁿ), taking the expected value of both sides yields (10) if we apply (27)–(29).
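The relative entropy h(u₁|u₂) used above, together with its conventions for vanishing masses, is easy to state in code. The sketch below is illustrative and not from the paper; it also checks the nonnegativity h(u₁|u₂) ≥ 0 that the arguments surrounding (28) and (35) rely on:

```python
import math

def rel_entropy(u1, u2):
    """h(u1|u2) = sum_s u1(s) log2[u1(s)/u2(s)], with the conventions
    0 * log[0/Q] = 0 for Q >= 0 and Q * log[Q/0] = +inf for Q > 0."""
    total = 0.0
    for s in u1:
        if u1[s] == 0:
            continue                   # 0 * log[0/Q] = 0
        if u2.get(s, 0) == 0:
            return math.inf            # Q * log[Q/0] = +inf
        total += u1[s] * math.log2(u1[s] / u2[s])
    return total

u1 = {"a": 0.5, "b": 0.5, "c": 0.0}
u2 = {"a": 0.25, "b": 0.25, "c": 0.5}
print(rel_entropy(u1, u2))    # 1.0 bit
print(rel_entropy(u1, u1))    # 0.0
```

Nonnegativity (the divergence inequality) is exactly what makes the left sides of (28) and (35) dominate their right sides, since each difference collapses to a single relative entropy.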
APPENDIX B

Noiseless source coding theory [3, Chapter 3] gives us the following result.

Lemma B1:
a) Let S be a countable subset of Âⁿ, let φ: Aⁿ → S be a measurable function, and let q be the probability distribution of φ(Xⁿ) on S. Then there exists a one-to-one function ψ from S onto a set of finite binary words satisfying the prefix condition, such that the nth-order variable-rate code C = (φ, ψ, S) satisfies E[r(C)] ≤ n⁻¹E[−log q(φ(Xⁿ))] + n⁻¹.
b) Let C = (φ, ψ, S) be any nth-order variable-rate code. Then E[r(C)] ≥ n⁻¹E[−log q(φ(Xⁿ))], where q is the probability distribution of φ(Xⁿ) on S.

Lemma B2: Let C₁, C₂ be nth-order variable-rate codes that operate at distortion level D. Then there exists an nth-order variable-rate code C that operates at distortion level D and satisfies

  r(C) ≤ min[r(C₁), r(C₂)] + n⁻¹.  (30)

Sketch of Proof: Let Cᵢ = (φᵢ, ψᵢ, Sᵢ), i = 1, 2. Let S = S₁ ∪ S₂. Let ψ be a one-to-one function from S₁ ∪ S₂ onto a set of finite binary words satisfying the prefix condition such that

  L[ψ(y)] ≤ L[ψᵢ(y)] + 1,  y ∈ Sᵢ,  i = 1, 2.

Let φ: Aⁿ → S be the function given by φ(x) = φ₁(x) if L[ψ(φ₁(x))] ≤ L[ψ(φ₂(x))], and φ(x) = φ₂(x) otherwise. Then C is the variable-rate code C = (φ, ψ, S).

Lemma B3: Let f be a nonnegative measurable function satisfying (14a) and (14b). Then

  E[f(Xⁿ)] ≥ I(U; V)  (31)

for some Aⁿ × Âⁿ valued random pair (U, V) satisfying (2) in which U has the same distribution as Xⁿ.

Proof: For each x ∈ Aⁿ, let Sₓ be the subset of S defined by Sₓ ≜ {y ∈ S: ρₙ(x, y) ≤ D}. From (14a), we have

  q(Sₓ) > 0  (32)

for μₙ-almost all x ∈ Aⁿ. For each x satisfying (32), let qₓ be the probability measure on Sₓ given by qₓ(y) ≜ q(y)/q(Sₓ), y ∈ Sₓ. Hence

  f(x) ≥ −log q(Sₓ) = h(qₓ|q).  (33)

Let (U, V) be the Aⁿ × S valued random pair in which U has the same distribution as Xⁿ and the set of measures {qₓ} comprise the conditional distribution of V given U. Then (U, V) satisfies (2). Letting α be the distribution of V, we have

  I(U; V) = ∫_{Aⁿ} h(qₓ|α) dμₙ(x).  (34)

But

  ∫_{Aⁿ} h(qₓ|q) dμₙ(x) ≥ ∫_{Aⁿ} h(qₓ|α) dμₙ(x)  (35)

because the left side of (35) minus the right side of (35) is the nonnegative quantity h(α|q). Combining (33)–(35) we obtain (31).
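The construction behind Lemma B2 can be sketched concretely: prefix every codeword of C₁ with "0" and every codeword of C₂ with "1", then encode each source block with whichever flagged codeword is shorter. The combined code is still prefix-free, and its length exceeds the shorter original length by exactly one bit, as in (30). The sketch below is illustrative; the toy codes and all names are assumptions, not taken from the paper:

```python
def is_prefix_free(words):
    """Check the prefix condition for a set of binary words."""
    ws = sorted(set(words), key=len)
    return not any(b.startswith(a) for i, a in enumerate(ws) for b in ws[i + 1:])

def combine(psi1, psi2):
    """Lemma B2-style combination: one flag bit selects a sub-code."""
    psi = {}
    for y, w in psi1.items():
        psi[("c1", y)] = "0" + w
    for y, w in psi2.items():
        psi[("c2", y)] = "1" + w
    return psi

# two toy prefix-free codes on the same reproduction symbols
psi1 = {"p": "0", "q": "10", "r": "11"}
psi2 = {"p": "111", "q": "0", "r": "10"}
psi = combine(psi1, psi2)
assert is_prefix_free(psi.values())

for y in ("p", "q", "r"):
    best = min(len(psi[("c1", y)]), len(psi[("c2", y)]))
    # combined length = 1 + shorter original length, matching (30)
    assert best == 1 + min(len(psi1[y]), len(psi2[y]))
```

Dividing the one extra bit by the block length n gives exactly the n⁻¹ slack in (30), which vanishes in the limit that Theorem 2 concerns.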
REFERENCES

[1] P. H. Algoet and T. M. Cover, "A sandwich proof of the Shannon-McMillan-Breiman theorem," Ann. Prob., vol. 16, pp. 899-909, 1988.
[2] A. R. Barron, "Logically smooth density estimation," Ph.D. thesis, Stanford University, Stanford, CA, 1985.
[3] R. E. Blahut, Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1987.
[4] R. M. Gray, Entropy and Information Theory. New York: Springer-Verlag, 1990.
[5] J. C. Kieffer, "Block coding for an ergodic source relative to a zero-one valued fidelity criterion," IEEE Trans. Inform. Theory, vol. IT-24, no. 4, pp. 432-438, July 1978.
[6] J. C. Kieffer, "A unified approach to weak universal source coding," IEEE Trans. Inform. Theory, vol. IT-24, no. 6, pp. 674-682, Nov. 1978.
[7] J. C. Kieffer, "An almost sure convergence theorem for sequences of random variables selected from log-convex sets," in Proc. Conf. Almost Everywhere Convergence in Probab. Ergodic Theory II, Northwestern Univ., Evanston, IL, 1989.
[8] J. Kingman, "The ergodic theory of subadditive stochastic processes," J. Roy. Stat. Soc., ser. B, vol. 30, pp. 499-510, 1968.
[9] K. M. Mackenthun and M. B. Pursley, "Variable-rate, strongly and weakly universal source coding," in Proc. 1977 Conf. Inform. Sci. Syst., Johns Hopkins Univ., Mar. 1977, pp. 286-291.
[10] D. S. Ornstein and P. C. Shields, "Universal almost sure data compression," Ann. Prob., vol. 18, pp. 441-452, 1990.
[11] H. G. Tucker, A Graduate Course in Probability. New York: Academic, 1967.
[12] A. I. Tulcea, "Contributions to information theory for abstract alphabets," Arkiv för Matematik, vol. 4, pp. 235-247, 1960.