IEEE TRANSACTIONS ON INFORMATION THEORY,VOL. 39, NO. 5, SEPTEMBER 1993
A Survey of the Theory of Source Coding with a Fidelity Criterion

John C. Kieffer, Fellow, IEEE
Invited Paper
Abstract-The purpose of this paper is threefold: 1) to acquaint the reader with the types of problems that have been considered in the area of source coding with a fidelity criterion; 2) to survey results that have been obtained on these problems; and 3) to outline future research trends in the area.
Fig. 1. One-shot coding configuration.
Index Terms-Source coding, data compression, rate distortion theory, distortion measure, fidelity criterion, quantization.

SOURCE coding with a fidelity criterion (also called rate distortion theory) is a field of information theory that originated with Claude Shannon about 35 years ago [114]. In this paper, our purpose is to acquaint the reader with the types of problems that have been considered in this field, to point out some of the important results that have been obtained, and to suggest lines of future research. As the title of this paper indicates, we shall focus upon theory rather than practice; the reader interested in applications can profitably consult the references [72], [1] for a treatment of practical implementation issues not discussed in the present paper. Also, we shall bring the reader to the frontiers of source coding theory with few preliminaries; readers in need of a thorough introduction to source coding theory can consult one of the texts [17], [64] before reading this paper.

In the theory of source coding with a fidelity criterion, one is interested in determining the minimum encoding rate, in code bits per data sample, with which data generated by a source of a certain type can be compressed via a code from a class of codes, subject to a constraint on the distortion in reconstruction of the data from its encoded form. Knowing the minimum encoding rate that is achievable can be of interest to source coding practitioners as well as source coding theoreticians: when one designs a particular code for a data compression task, one would like to compare the rate at which it performs to the rate at which the best codes perform.

In our development, we distinguish between two types of source coding theory, viz., one-shot coding theory and asymptotic coding theory. In one-shot coding theory, one tries to determine the minimum encoding rate in the compression of a block of finitely many source samples, and also tries to determine a code via which the minimum encoding rate can be achieved or nearly achieved. One-shot problems are inherently difficult: one typically is not able to determine a closed-form expression for the minimum rate at which a finite block can be encoded subject to a distortion constraint, and the determination of a code that achieves this minimum rate may require an inordinate amount of time. Asymptotic source coding theory is concerned with the determination of the minimum encoding rate in the compression of a "typical" sequence of infinitely many source samples; such a rate is sometimes computable because one may be able to obtain upper and lower bounds on the minimum encoding rate for the first n samples that become tight asymptotically as n → ∞. The theory of source coding with a fidelity criterion is predominantly an asymptotic source coding theory.

The paper is organized as follows: in Section I, we formally introduce the reader to one-shot coding theory; Section II is devoted to quantization problems, which are a class of problems studied in conjunction with one-shot coding theory; the concept of an abstract source model is introduced in Section III; types of problems that arise in asymptotic source coding theory are outlined in Section IV; Section V addresses coding theorems for sequential source models and multiparameter source models; universal source coding is treated in Section VI and multiterminal source coding is treated in Section VII; and in Section VIII, research questions of current and future interest are discussed.
I. ONE-SHOT CODING PROBLEMS
Fig. 1 illustrates the configuration employed in one-shot coding. The configuration consists of an encoder and a decoder. A random object X is coded by the encoder into a random binary codeword; this codeword, after having been kept in a storage device or after having been transmitted through a communication medium, is made available to a decoder who builds a reconstruction X̂ of X. We make the standard assumption of source coding theory that the codeword is not altered by noise occurring between the encoder output terminal and the decoder input terminal. (We discuss the situation in which noise is allowed at the end of the paper.)

Manuscript received January 31, 1992; revised January 29, 1993. This work was supported by the National Science Foundation under Grant NCR-9003106. The author is with the Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 55455. IEEE Log Number 9210709.
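In code, the configuration of Fig. 1 reduces to an encode/decode pair. The sketch below (Python; the alphabet and codewords are illustrative, not taken from the paper) codes a four-value object with a prefix-free binary code. Since no codeword is a prefix of another, the decoder parses the stored bit string unambiguously, and the noiseless-channel assumption means the decoder sees exactly the bits the encoder wrote.

```python
# Hypothetical prefix-free code on Omega = {'a', 'b', 'c', 'd'}.
CODE = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}

def encode(symbols):
    """Encoder of Fig. 1: map each outcome to its binary codeword."""
    return ''.join(CODE[s] for s in symbols)

def decode(bits):
    """Decoder: scan left to right; prefix-freeness guarantees that
    exactly one codeword matches at each step."""
    inverse = {w: s for s, w in CODE.items()}
    out, word = [], ''
    for b in bits:
        word += b
        if word in inverse:
            out.append(inverse[word])
            word = ''
    return out

msg = ['b', 'a', 'd', 'c', 'a']
bits = encode(msg)            # a string over {0, 1}
recovered = decode(bits)      # perfect reconstruction over a noiseless channel
```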
0018-9448/93$03.00 © 1993 IEEE
The form of the decoder in the one-shot coding configuration is completely determined once one has specified the code that the encoder is to use to code X into a binary codeword: the decoder simply decodes its input into the reproduction X̂ for which the expected distortion E[ρ(X, X̂)] (as measured according to a given distortion measure ρ(·,·)) is minimized.

We now formalize the notions that we have introduced. Let {0, 1}* denote the set of all strings of finite length that one can form from the binary symbols 0, 1. Let Ω be a measurable space. A code on Ω is a mapping φ from Ω into the set {0, 1}* such that no member of φ(Ω) is a prefix of any other member of φ(Ω). The set φ(Ω) is called the set of codewords for the code φ. A code φ is sometimes referred to as a variable-length code to emphasize the possibility that the codewords may be of different lengths. A code on Ω is said to be a fixed-length code if every codeword has the same length. If Ω is finite, and φ is one-to-one, we call φ a noiseless code.

We fix a distortion measure ρ on Ω. (A distortion measure on Ω is any measurable function from Ω × Ω into the nonnegative reals.) Let X be a random object taking its values in Ω and let φ be a code on Ω. We can use the code φ to encode X into the random codeword φ(X). There are two parameters r(X, φ) and ρ(X, φ) that are used to evaluate the encoding performance of the code φ. The parameter r(X, φ) is the code rate and is defined to be the expected codeword length:

r(X, φ) ≜ E[length of φ(X)].

The parameter ρ(X, φ) is the code distortion and is defined by

ρ(X, φ) ≜ inf_ψ E[ρ(X, ψ(φ(X)))],   (1.1)

where the infimum is taken over all mappings ψ: φ(Ω) → Ω. The interpretation of these two code parameters is clear from Fig. 1: the parameter r(X, φ) is the expected number of code bits generated at the encoder output in response to the input X, and the parameter ρ(X, φ) is the expected distortion E[ρ(X, X̂)] that results when the decoder applies to the random codeword φ(X) a mapping ψ that achieves the infimum on the right side of (1.1) (assuming there is such a mapping).

In one-shot coding theory, one is given X, ρ, and then one tries to choose a code φ so as to minimize r(X, φ) subject to a constraint on ρ(X, φ), or else one tries to choose a code φ to minimize ρ(X, φ) subject to a constraint on r(X, φ). These problems, which we shall call one-shot coding problems, are formally described in the two subsections which follow.

A. Distortion Optimization in One-Shot Coding

Fix a random object X taking its values in some space Ω, along with a distortion measure ρ on Ω. Let C be a specified family of codes on Ω. Fix R > 0. Define D_C(R) to be the infimum of ρ(X, φ) as φ ranges through all codes in C satisfying the rate constraint r(X, φ) ≤ R. In the problem of distortion optimization in one-shot coding, one tries to determine D_C(R) as a function of R. Going beyond this, one tries to find, for fixed R, a code for which D_C(R) is attained or nearly attained.

One obtains different cases of the distortion optimization problem by considering different choices for the family of codes C. We shall denote by D_VL(R) the function D_C(R) which arises when C is taken to be the set of all variable-length codes on Ω, and we shall denote by D_FL(R) the function D_C(R) which arises when C is taken to be the set of all fixed-length codes on Ω. The problem of determining the function D_FL(R) is related to a problem called the N-level quantization problem. The problem of determining the function D_VL(R) is related to a problem called the entropy-constrained quantization problem. Quantization problems are considered in Section II.

B. Rate Optimization in One-Shot Coding

Fix X, Ω, and ρ as in the preceding subsection. Let C be a specified family of codes on Ω. Fix D > 0. Define R_C(D) to be the infimum of r(X, φ) as φ ranges through all codes in C satisfying the distortion constraint ρ(X, φ) ≤ D. In the problem of rate optimization in one-shot coding, one tries to determine R_C(D) as a function of D, and to determine for each fixed D a code for which R_C(D) is attained or nearly attained. Rate optimization in one-shot coding is related to the distortion-constrained quantization problem that is considered in Section II.

An important special case of the rate optimization problem is the problem of rate optimization in noiseless one-shot coding. Here, one takes Ω to be finite, and takes C to be the family of all noiseless codes on Ω. One then wishes to determine the minimum R_min of r(X, φ) over all noiseless codes φ on Ω, as well as to determine a noiseless code which achieves R_min. This problem was addressed in the early days of information theory: a noiseless code of minimum possible rate can be found by the Huffman algorithm [70]; lower and upper bounds for R_min are H(X) and H(X) + 1, respectively, where H(X) denotes the entropy of the random variable X, measured in bits [113].

II. QUANTIZATION PROBLEMS

In this section, we consider quantization problems, including the N-level quantization problem, the entropy-constrained quantization problem, and the distortion-constrained quantization problem. As we shall see, quantization problems shed light upon the one-shot coding problems.

We formally define the concept of quantizer. Throughout this section, we fix Ω to be a finite-dimensional Euclidean space. A quantizer is defined to be a finite set of pairs {(P_i, q_i)} in which {P_i} is a measurable partition of Ω and {q_i} is a set of points in Ω. A quantizer is called a scalar quantizer if the dimension of Ω is equal to one, and is called a vector quantizer otherwise.

In this section, we fix X to be a random vector taking its values in Ω whose distribution is absolutely continuous with respect to Lebesgue measure and which satisfies E[‖X‖²] < ∞, where ‖·‖ denotes the Euclidean norm. We also fix ρ to be squared-error distortion on Ω (i.e., ρ(x, y) = ‖x − y‖², x, y ∈ Ω). A quantizer {(P_i, q_i)} is used to quantize X into the random vector X̂ in which X̂ = q_i if and only if
X ∈ P_i, for each i. The distortion that arises from using the quantizer {(P_i, q_i)} to quantize X is defined to be the number

Σ_i ∫_{P_i} ρ(x, q_i) dP_X(x),   (2.1)

where P_X is the probability distribution of X. Letting X̂ be the random vector into which X is quantized by the quantizer {(P_i, q_i)}, the distortion (2.1) is simply the quantity E[ρ(X, X̂)] and is thus a measure of how well X̂ serves to reconstruct X.

A. N-Level Quantization

A quantizer {(P_i, q_i): i = 1, ..., N} is called an N-level quantizer. In the N-level quantization problem one seeks an N-level quantizer {(P_i, q_i): i = 1, ..., N} whose distortion (2.1) is minimized. Such an N-level quantizer is called an optimal N-level quantizer. We let D_1(N) denote the minimum distortion among all N-level quantizers.

We consider for a moment the special case in which N = 2^R where R is an integer. Let {w_i: i = 1, ..., N} be an enumeration of the binary words of length R. An N-level quantizer {(P_i, q_i)} induces the fixed-length code φ on Ω in which φ(ω) = w_i for each ω ∈ P_i, i = 1, 2, ..., N. The code φ satisfies r(X, φ) = R and its distortion ρ(X, φ) is equal to the distortion (2.1) of {(P_i, q_i)}. These facts allow us to conclude that D_FL(R) ≤ D_1(2^R). The reverse inequality can also be deduced, and thus we have the following relationship which relates the problem of distortion optimization in one-shot coding with fixed-length codes to the N-level quantization problem:

D_FL(R) = D_1(2^R), R = 1, 2, ....   (2.2)
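The N-level problem can be attacked numerically. Below is a minimal sketch of the classical alternating partition/centroid iteration for a scalar source under squared-error distortion, with an empirical sample standing in for the true distribution of X (the sample and initial levels are illustrative). Each pass reassigns samples to their nearest levels and recenters each level at its cell mean, so the empirical distortion (2.1) never increases from one iteration to the next.

```python
import random

def lloyd_iteration(samples, levels, iters=50):
    """Alternating nearest-level / cell-mean iteration in one dimension."""
    q = sorted(levels)
    for _ in range(iters):
        # Partition step: assign each sample to its nearest level.
        cells = [[] for _ in q]
        for x in samples:
            i = min(range(len(q)), key=lambda j: (x - q[j]) ** 2)
            cells[i].append(x)
        # Centroid step: move each level to the mean of its cell
        # (a level with an empty cell is left where it is).
        q = [sum(c) / len(c) if c else q[i] for i, c in enumerate(cells)]
    return sorted(q)

def distortion(samples, q):
    """Empirical version of (2.1) under squared error."""
    return sum(min((x - qi) ** 2 for qi in q) for x in samples) / len(samples)

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(2000)]  # stand-in source
q0 = [-2.0, -0.5, 0.5, 2.0]                              # initial 4-level quantizer
q = lloyd_iteration(samples, q0)
d0, d1 = distortion(samples, q0), distortion(samples, q)
# d1 <= d0: the iteration never increases the empirical distortion
```

This is only a locally convergent design procedure; as discussed below, such iterations can stop at a quantizer that is locally optimal but not optimal.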
We indicate some of the results that are known concerning the N-level quantization problem for an arbitrary N, which the reader may reinterpret in terms of the problem of distortion optimization in one-shot coding with fixed-length codes.

We first consider results on scalar quantization (dim Ω = 1). Assuming that the probability density of X is log-concave, Fleischer [49] showed that there is a unique optimal N-level scalar quantizer. One method for finding the optimal N-level scalar quantizer is Lloyd's Method I [91]. In this method, one starts with an initial N-level quantizer and then computes a sequence of N-level quantizers by means of successive iterations of an algorithm; the resulting sequence of quantizers converges in mean square to the optimal N-level quantizer. Another method for finding the optimal N-level quantizer is the Lloyd-Max method [91], [97]. In general, there is no simple expression for the optimal N-level scalar quantizer; however, Cambanis and Gerr [27] have exhibited an N-level scalar quantizer which is easily determined and which is approximately optimal for large N. Fleischer's result on the uniqueness of the optimal N-level scalar quantizer and the validity of Lloyd's Method I can be extended to distortion measures other than squared-error distortion [79], [117].

We now discuss results concerning vector quantization (dim Ω > 1). Unfortunately, the successes that have been obtained in optimal scalar quantization have not carried over to optimal vector quantization. No general condition on the distribution of X is known for which there will exist a unique optimal N-level vector quantizer. Also, the determination of an optimal or nearly optimal N-level vector quantizer can be a difficult computational problem. An iterative algorithm called the generalized Lloyd algorithm has been proposed for vector quantizer design by Linde et al. [90]; the generalized Lloyd algorithm generalizes upon Lloyd's Method I for scalar quantization. Starting with an arbitrary N-level vector quantizer, each successive iteration of the generalized Lloyd algorithm computes an N-level vector quantizer which yields distortion (2.1) less than or equal to the distortion at the previous iteration. However, Gray and Karnin [62] have given an example illustrating that the generalized Lloyd algorithm may produce a sequence of N-level vector quantizers that converges to an N-level vector quantizer that is locally optimal but not optimal. One can modify the generalized Lloyd algorithm to prevent one from getting "stuck" at a quantizer which is not optimal; to do this, one applies a random perturbation after each iteration of the generalized Lloyd algorithm to obtain a quantizer that is used at the start of the next iteration. If the perturbations are made in the right way, such a modified algorithm can yield a considerably better vector quantizer than the generalized Lloyd algorithm after the same number of iterations, and one can even obtain convergence to an optimal N-level vector quantizer in some cases [118], [28], [130], [131]. (In the papers [118], [28], the degree to which a quantizer is perturbed on each iteration is controlled by a simulated-annealing type cooling schedule, whereas a more general perturbation structure is allowed in [130], [131].) Rose et al. [112] have proposed a deterministic annealing approach to optimal N-level vector quantizer design. Neural network approaches to vector quantizer design have also been studied [2].

In the preceding paragraph, we have not placed any structural constraints on the N-level vector quantizer that is sought. In practice, one would require a vector quantizer that is not overly complex, and therefore one would narrow the search for a good vector quantizer to those having some specified simple structure. Among the most useful types of quantizers with structural constraints are tree-structured quantizers [90], [125], trellis coded quantizers [95], and lattice vector quantizers (discussed at the end of this section). There is a considerable literature on the design of vector quantizers with structural constraints which we cannot hope to survey here; the text [55] provides a good overview of this topic.

We now report results on the asymptotic performance of optimal N-level quantizers as N → ∞. Let k be the dimension of Ω. Then, subject to mild regularity conditions on the distribution of X, it is known that D_1(N) ~ C_1·N^{−2/k} as N → ∞, where C_1 is a positive constant that depends on k and the distribution of X. This result for k = 1 was obtained by Panter and Dite [107] and for k ≥ 2 was obtained by Zador [128]. It is interesting to see what the asymptotic performance of optimal N-level quantizers as N → ∞ tells us about the problem of distortion optimization in one-shot coding with fixed-length codes. With Ω, X as in the preceding paragraph, we see by (2.2) that D_FL(R) ~ C_1·2^{−2R/k} as R → ∞.
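For k = 1, the 2^{−2R} law is already visible in the simplest case one can compute in closed form: a 2^R-level uniform quantizer applied to a source uniform on [0, 1] (a hypothetical example chosen for its tractability, not an optimal quantizer for a general source). Each cell has width 2^{−R}, and the mean squared error within a uniform cell is width²/12.

```python
def uniform_quantizer_distortion(rate_bits):
    """Mean squared error of a 2^R-level uniform quantizer on U[0, 1].

    Cell width is 2^-R and the MSE within a uniform cell is width^2 / 12,
    so the distortion is 2^(-2R) / 12, matching the 2^(-2R/k) law for k = 1.
    """
    n_levels = 2 ** rate_bits
    width = 1.0 / n_levels
    return width ** 2 / 12.0

for r in range(1, 6):
    # Each additional bit of rate cuts the distortion by a factor of 4.
    print(r, uniform_quantizer_distortion(r))
```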
B. Entropy-Constrained Quantization

Fix H > 0. In the entropy-constrained quantization problem, one seeks a quantizer of minimal distortion among all quantizers {(P_i, q_i)} whose entropy

− Σ_i Pr[X ∈ P_i] log₂ Pr[X ∈ P_i]   (2.3)

is constrained to be no greater than H. Such a quantizer {(P_i, q_i)} is called an optimal entropy-constrained quantizer (at the entropy level H). Let D_2(H) denote the distortion (2.1) of an optimal entropy-constrained quantizer at the entropy level H.

A quantizer {(P_i, q_i)} induces a variable-length code φ on Ω as follows. Let {w_i} be a set of binary words such that no word in this set is a prefix of any other word in the set, and the length of w_i is equal to the smallest positive integer greater than or equal to −log₂ Pr[X ∈ P_i], for each i for which Pr[X ∈ P_i] > 0. Define φ to be the variable-length code such that for each i, φ(ω) = w_i for every ω ∈ P_i. The induced code φ satisfies the rate constraint r(X, φ) ≤ H + 1, where H is the entropy (2.3) of the quantizer {(P_i, q_i)}. Also, ρ(X, φ) is less than or equal to the distortion of the quantizer {(P_i, q_i)}. From these facts, we deduce that D_VL(R) ≤ D_2(R − 1), R > 1. It is also easy to deduce that D_2(R) ≤ D_VL(R), R > 0. Hence, we have the following inequality which allows us to relate the entropy-constrained quantization problem to the problem of distortion optimization in one-shot coding with variable-length codes:

D_2(R) ≤ D_VL(R) ≤ D_2(R − 1), R > 1.   (2.4)

The entropy-constrained quantization problem has been attacked with some success in the case of scalar quantization. Assuming X has a log-concave density, Berger [18] obtained a necessary condition for a scalar quantizer to be an optimal entropy-constrained scalar quantizer. Farvardin and Modestino [47] used this necessary condition to find an iterative algorithm which can be successful in finding optimal entropy-constrained scalar quantizers. Kieffer et al. [81] developed another iterative algorithm, based on Lloyd's Method I, for finding optimal entropy-constrained scalar quantizers.

Not much is known about how to find optimal entropy-constrained vector quantizers. Chou et al. [30] have proposed an iterative algorithm to find such quantizers, but use of this algorithm can result in entropy-constrained vector quantizers which are locally optimal but not optimal.

There is an interesting theory on the performance of optimal entropy-constrained quantizers as the entropy level H → ∞. Let k be the dimension of the space Ω. Subject to some mild regularity conditions on the distribution of X, it is known that D_2(H) ~ C_2·2^{−2H/k} as H → ∞, where C_2 is a constant that depends on k and the distribution of X. This result was shown for k = 1 by Gish and Pierce [56] and for k > 1 by Zador [128]. (The informative article by Gersho [54] gives further information on the asymptotic behavior of D_1(N) for large N and D_2(H) for large H.)

At this point, it is instructive to see what these asymptotic results on entropy-constrained quantizers tell us about the problem of distortion optimization in one-shot coding with variable-length codes. With a bit of work, one can show that (under the assumptions of the previous paragraph) D_VL(R) has the same asymptotic behavior as R → ∞ as does D_2(R). Hence, D_VL(R) ~ C_2·2^{−2R/k} as R → ∞.

C. Distortion-Constrained Quantization

We conclude this section with a brief discussion of a quantization problem that is a dual problem to the entropy-constrained quantization problem. This is the distortion-constrained quantization problem, in which one seeks a quantizer of minimal entropy (2.3) among all quantizers which are constrained to have distortion (2.1) no greater than a fixed distortion level. In recent years, study has focused upon the entropy performance of distortion-constrained lattice quantizers. A lattice quantizer is a quantizer {(P_i, q_i)} in which the q_i are chosen from some Euclidean space lattice. Lattice quantizers are desirable because of efficient algorithms for their implementation [31], [32]. One way in which to obtain a lattice vector quantizer is to quantize each coordinate of X with a scalar uniform quantizer. Ziv [137] has obtained a bound showing how close the minimum entropy for distortion-constrained lattice vector quantizers of this type can be to the minimum entropy among all distortion-constrained vector quantizers; the bound is a uniform bound, valid no matter what the distribution of the random vector to be quantized. For extensions of Ziv's result, see [66], [129].

In the discussion preceding (2.4), one sees how a quantizer gives rise to a variable-length code; this allows one to reformulate results on the distortion-constrained quantization problem as results concerning rate optimization in one-shot coding.

III. THE CONCEPT OF ABSTRACT SOURCE

The purpose of the present section is to lay out a general concept of information source. In order to accomplish this, we need to distinguish between an information source that one encounters in the real world (which we shall refer to as a physical source) and the mathematical model we use to describe a physical source (which shall be called an abstract source). An abstract source which models a physical source is taken to be a random object whose set of possible values includes the set of outputs that could be generated by the physical source. In the theory of source coding with a fidelity criterion, one is concerned with code design for abstract sources. One should keep in mind, however, that the ultimate test of a code is its performance in coding the output of the underlying physical source; if this performance is unsatisfactory, then one should seek a different abstract source model.

Definition: We define a source as a collection [X, {X^n: n = 1, 2, ...}] in which:

a.1) X is some random object;

a.2) for each n, X^n is a random object which is a function of X.

The random object X is called the source output. The random objects {X^n: n = 1, 2, ...} are called the source
representations. Typically, the source output X is an infinite-dimensional random object, so that if one were to code X into a string of bits from which X could be perfectly reconstructed, infinitely many bits would be required. To get around this difficulty, the representations {X^n} are chosen so that X^n for large n will provide a sufficiently close approximation to X from the point of view of the intended receiver. One then only needs to code X^n for a sufficiently large n; assuming X^n is finite-dimensional, this can be done using a string of bits of finite length. Coding theorems of source coding theory tell us how the length of this string of bits must grow with n.

A. Examples of Abstract Sources

Sequential Sources: Let X = (X_1, X_2, X_3, ...) be a random sequence in which all the X_i's take their values in a common set A. Consider a source whose output is X and whose representations {X^n: n = 1, 2, ...} are defined by

X^n ≜ (X_1, ..., X_n), n = 1, 2, ....

A source of this type shall be called a sequential source (with source alphabet A). The vast majority of source coding theorems apply to sequential sources. A sequential source is stationary (resp. ergodic) if the source output is a stationary (resp. ergodic) random sequence.

Multiparameter Sources: Let s be a positive integer and let IN^s be the set of all vectors (n_1, ..., n_s) whose components are non-negative integers. Let X = (X_u: u ∈ IN^s) be a random field in which all the X_u's take their values in a common set A. Consider a source with output X and with representations {X^n: n = 1, 2, ...} defined by

X^n ≜ (X_u: ‖u‖ ≤ n − 1), n = 1, 2, ...,

where the notation ‖u‖ refers to the maximum component of the vector u. A source of this type shall be called a multiparameter source (with source alphabet A). A multiparameter source is stationary (resp. ergodic) if the source output is a stationary (resp. ergodic) random field.

Sampled-Data Sources: Let X be a continuous-time random signal (X(t): 0 ≤ t ≤ T). Consider a source with output X and with representations {X^n: n = 1, 2, ...} defined by

X^n ≜ (X(iT/n): i = 1, 2, ..., n), n = 1, 2, ....

One can conceive each representation X^n to be the discrete-time signal that arises from passing X through an ideal sampler in which the sampling interval is of length T/n. Therefore, we call such a source a sampled-data source.

An Image Source Model: An image source is a physical source which generates an image as output. An abstract source used to model an image source is called an image source model. Various types of image source models have been proposed, depending upon the nature of the underlying image source; we introduce the reader to one type of model here. Suppose the output generated by the underlying image source can be any member of a set of 2-D signals defined on the unit square S* = {(u_1, u_2): u_1, u_2 real, 0 ≤ u_1 ≤ 1, 0 ≤ u_2 ≤ 1}. Then, one might want to make use of an image source model [X, {X^n: n = 1, 2, ...}] in which the source output X is a 2-D random signal (X(u_1, u_2): (u_1, u_2) ∈ S*) with continuous autocorrelation function, and in which each source representation X^n is the n × n matrix whose (i, j)-th element is X[S_{i,j}], where 1) S_{i,j} is the square in row i and column j of the partitioning of S* into n⁻¹ × n⁻¹ subsquares, and 2) for each subsquare S of S*, X[S] denotes the random variable

X[S] ≜ (1/area(S)) ∫∫_S X(u_1, u_2) du_1 du_2.

An example of a source coding problem for this type of model shall be given in Section IV.

Tree-Structured Sources: Let [X, {X^n: n = 1, 2, ...}] be an abstract source in which X is a random labeled rooted tree with no terminal nodes, and each X^n is the subtree of X which is rooted at the root node of X and contains all the branches of X up through depth n. Such a source (which we shall call a tree-structured source) is useful for modeling certain kinds of data (such as fractal data). We shall discuss tree-structured sources further in Section IV.

IV. PROBLEMS IN SOURCE CODING WITH A FIDELITY CRITERION

In source coding for a given source with representations {X^n}, one wishes to examine how efficiently one can code the representation X^n in the limit as n grows large. This results in a source coding theory, called source coding with a fidelity criterion, which is asymptotic in nature. In the present section, we detail the types of problems that one deals with in source coding with a fidelity criterion. For the rest of the paper, we let I denote the set of positive integers.

A fidelity criterion for a source [X, {X^n: n ∈ I}] is a sequence {ρ^n: n ∈ I} in which each ρ^n is a distortion measure on the measurable space in which the representation X^n takes its values. A fidelity criterion {ρ^n} for a source with representations {X^n} allows one to compute, for each n, the distortion ρ^n(X^n, φ^n) that arises from the coding of X^n with a code φ^n.

A. Fidelity Criteria for Sequential Sources

We illustrate some types of fidelity criteria for the most common type of source model, the sequential source model. In the following, we consider a sequential source with source alphabet A. Let X = (X_1, X_2, ...) denote the source output. The source representations are {X^n: n ∈ I}, where, for each n, X^n = (X_1, ..., X_n).

Probability of Error Fidelity Criterion: Let A be a finite set. The probability of error fidelity criterion is the fidelity criterion {ρ^n} in which, for each n, ρ^n is the discrete metric on A^n (i.e., ρ^n(x_1, x_2) = 0 or 1 depending upon whether x_1 = x_2 or x_1 ≠ x_2). This fidelity criterion gets its name because with this criterion, if φ is a code on A^n, then there is a reconstruction X̂^n of X^n based on the codeword φ(X^n) such that ρ^n(X^n, φ) = Pr[X^n ≠ X̂^n]. More generally, the probability of error fidelity criterion can be defined for any source in which the source representations all take their values in finite sets.
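For a concrete (entirely hypothetical) instance of the probability of error fidelity criterion: take an i.i.d. Bernoulli source and a crude code whose encoder transmits the first n − 1 letters verbatim while the decoder guesses the last letter to be the more likely one. The code distortion under the discrete metric is then exactly the probability that the guess is wrong.

```python
import itertools
import math

# Hypothetical i.i.d. Bernoulli(0.8) sequential source over A = {0, 1}.
p1 = 0.8
n = 4

def block_prob(block):
    """Probability of a particular source block X^n."""
    return math.prod(p1 if b == 1 else 1 - p1 for b in block)

def reconstruct(block):
    """Decoder keeps X_1..X_{n-1} and appends the more likely letter (1)."""
    return block[:-1] + (1,)

# rho^n(X^n, phi) = Pr[X^n != X^n-hat] under the discrete metric:
p_err = sum(block_prob(b) for b in itertools.product((0, 1), repeat=n)
            if reconstruct(b) != b)
# p_err equals Pr[X_n = 0] = 0.2 for this code
```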
The choice of rate and distortion scales that one should make will be clear in any given source coding application; the examples given in this section illustrate this point.

The problems of source coding with a fidelity criterion are of two types. In one type of problem, one minimizes the speed of growth of code distortion that results from the coding of the source representation, subject to a constraint on the speed of growth of code rate. In the other type of problem, one minimizes the speed of growth of code rate that results from the coding of the source representation, subject to a constraint on the speed of growth of code distortion. In the ensuing treatment, we have referred to the first type of problem as "distortion optimization in source coding with a fidelity criterion" and to the second type of problem as "rate optimization in source coding with a fidelity criterion." (These two types of problems are the asymptotic source coding problems that are suggested by the one-shot coding problems of Section I.)
B. Distortion Optimization in Source Coding with a Fidelity Criterion

We are given a source [X, {X^n: n ∈ I}], a fidelity criterion {ρ^n: n ∈ I}, a rate scale f_r, and a distortion scale f_d. We wish to encode the source representations {X^n} via a sequence of codes {φ^n} while we keep the ratios {r(X^n, φ^n)/f_r(n)} fixed, examining how small the ratios {ρ^n(X^n, φ^n)/f_d(n)} can become as n becomes large. In order that we may do this, we need to specify what code sequences {φ^n} shall be allowed. Accordingly, we fix a coding prescription P for the source [X, {X^n: n ∈ I}], which is a family of sequences {φ^n: n ∈ I} in which, for each n, φ^n is a code on the space in which X^n takes its values. Let R > 0. We define D_P(R) to be the infimum of all numbers D such that there exists {φ^n: n ∈ I} in P for which the following are true:
b.1) r(X^n, φ^n) ≤ f_r(n)R, for all n ∈ I;

b.2) lim sup_{n→∞} ρ^n(X^n, φ^n)/f_d(n) ≤ D.
Roughly speaking, the number D_P(R) tells us that for large n, if we compress X^n using no more than f_r(n)R bits on the average, then the minimum code distortion is about D_P(R)f_d(n). The first task one would like to accomplish in distortion optimization is to determine D_P(R) as a function of R. Second, having determined D_P(R), one would like to exhibit, for each R > 0, a particular sequence of codes {φ^n: n ∈ I} ∈ P such that b.1) holds and such that the left side of b.2) is equal to or close to D_P(R).

Example-Distortion Optimization for Sequential Sources: We consider a sequential source. Let X = (X_1, X_2, ...) denote the source output. The source representations are {X^n: n ∈ I}, where, for each n, X^n = (X_1, ..., X_n). We fix a single-letter fidelity criterion {ρ^n: n ∈ I} for the given source. Let P be the coding prescription consisting of all sequences {φ^n: n ∈ I} in which, for each n, φ^n is a variable-length code on A^n. As is customary for a sequential source, we define the distortion scale f_d and the rate scale f_r by f_d(n) = f_r(n) = n. Assuming 0 < D_P(R) < ∞, the number D_P(R) has the following interpretation:

c.1) Given ε > 0, there exists {φ^n: n ∈ I} ∈ P such that r(X^n, φ^n) ≤ Rn for all n, and ρ^n(X^n, φ^n) ≤ (D_P(R) + ε)n for n sufficiently large.

c.2) Given ε > 0, then for every {φ^n: n ∈ I} ∈ P for which r(X^n, φ^n) ≤ Rn for all n, the inequality ρ^n(X^n, φ^n) ≥ (D_P(R) - ε)n holds for infinitely many n.
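The linear growth of distortion asserted in c.1) and c.2) can be seen even with a deliberately crude scheme (a toy sketch, not an optimal code, and not from the paper): for n fair bits, transmit the first ⌊Rn⌋ bits verbatim and reconstruct the remaining bits as 0. The Hamming distortion then grows like ((1 - R)/2)·n, i.e., like D·n for a constant D, exhibiting the scaling f_d(n) = n even though this D is far above the optimum D_P(R).

```python
import random

def crude_code_distortion(n, R, trials=2000, seed=1):
    """Toy scheme: encode X^n = n fair bits by sending the first floor(R*n)
    bits verbatim (so the rate is at most R*n bits) and reconstructing the
    remaining bits as 0. Returns the average Hamming distortion over trials."""
    rng = random.Random(seed)
    kept = int(R * n)
    total = 0
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        xhat = x[:kept] + [0] * (n - kept)
        total += sum(1 for a, b in zip(x, xhat) if a != b)
    return total / trials

# Distortion grows roughly like ((1 - R)/2) * n = 0.25 * n for R = 0.5:
for n in (40, 80, 160):
    print(n, crude_code_distortion(n, R=0.5))
```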
Example-Distortion Optimization for Sampled-Data Sources: We consider the sampled-data source [X, {X^n: n ∈ I}] in which X is a random signal (X(t): 0 ≤ t ≤ T) and, for each positive integer n, X^n is the sampled signal (X(iT/n): i = 1, 2, ..., n). We take as our fidelity criterion {ρ^n: n ∈ I} the one in which each ρ^n is defined according to (4.1), where ρ is squared-error distortion on the real line R. We shall specify a coding prescription that employs fixed-length block codes. In order that we may do this, we need to define this concept. Let S be a set, and let k, j be positive integers. We call a code φ^j on S^j a fixed-length kth-order block code if it is a fixed-length code and if, letting r be the remainder upon division of j by k, one of the following is true.
d.1) r = 0, and there is a fixed-length code φ^k on S^k such that one encodes each sequence x in S^j with φ^j by 1) partitioning x into subblocks of length k; 2) encoding each subblock with φ^k; 3) forming the codeword φ^j(x) by putting together the codewords for the subblocks in the same order as the subblocks appear in x.

d.2) r ≠ 0, and there are fixed-length codes φ^k, φ^r on S^k, S^r, respectively, such that one encodes a sequence x in S^j with φ^j using the three-step procedure in d.1), with the proviso that exactly one of the subblocks (the rightmost subblock) in the partitioning of x is of length r rather than k and is therefore encoded using φ^r.

We take our coding prescription P to consist of all s = {φ^n: n ∈ I} for which there is a positive integer k = k(s) such that each φ^n is a fixed-length kth-order block code on A^n. Fix R > 0. Suppose we wish to code each X^n at a rate of no more than R bits/sample. Since each representation X^n consists of n samples, this means that we impose the rate constraint r(X^n, φ^n) ≤ Rn for all n. For many types of random signals (X(t): 0 ≤ t ≤ T) (such as fractional Brownian motion; see [46, ch. 16]), there will exist a positive constant β and positive constants A_R and B_R such that:

e.1) For some {φ^n: n ∈ I} ∈ P satisfying the rate constraint r(X^n, φ^n) ≤ Rn for all n, we have ρ^n(X^n, φ^n) ≤ B_R n^β for all n.

e.2) For every {φ^n: n ∈ I} ∈ P satisfying the rate constraint r(X^n, φ^n) ≤ Rn for all n, we have ρ^n(X^n, φ^n) ≥ A_R n^β for all n.

If e.1) and e.2) hold, it is clear that one should define the rate and distortion scales by f_r(n) = n, f_d(n) = n^β. One can then seek to compute the optimum distortion D_P(R), which will satisfy A_R ≤ D_P(R) ≤ B_R.

C. Rate Optimization in Source Coding with a Fidelity Criterion

Fix a source [X, {X^n: n ∈ I}], a fidelity criterion {ρ^n: n ∈ I}, a rate scale f_r: I → (0, ∞), and a distortion scale f_d: I → (0, ∞). Specify a coding prescription P. We want to examine how small the ratios {r(X^n, φ^n)/f_r(n)} can become for sequences {φ^n} in P for which the ratios {ρ^n(X^n, φ^n)/f_d(n)} remain fixed. Let D ≥ 0. We define R_P(D) to be the infimum of all numbers R such that there exists {φ^n: n ∈ I} in P for which the following are true:

f.1) lim sup_{n→∞} ρ^n(X^n, φ^n)/f_d(n) ≤ D;

f.2) lim sup_{n→∞} r(X^n, φ^n)/f_r(n) ≤ R.

The first task one would like to accomplish in rate optimization is to determine R_P(D) as a function of D. Second, having determined R_P(D), one would like to exhibit for each fixed D a particular sequence of codes {φ^n: n ∈ I} ∈ P such that f.1) holds and such that the left side of f.2) is equal to or close to R_P(D).

Example-Rate Optimization for an Image Source Model: We consider as our source the abstract source introduced in Section III that can be used to model an image source. The source output X is a 2-D random signal (X(u_1, u_2): (u_1, u_2) ∈ S*) with continuous autocorrelation function. The source representation X^n for each n ∈ I is the n × n random matrix with (i, j)th element equal to X[S_{i,j}], where S_{i,j} is the square in row i and column j of the partitioning of the unit square S* into n^{-1} × n^{-1} subsquares. We employ the fidelity criterion {ρ^n: n ∈ I} in which, letting x = (x_{i,j}), y = (y_{i,j}) be any two n × n real matrices, we have

ρ^n(x, y) = Σ_{i,j} (x_{i,j} - y_{i,j})².

We shall specify a coding prescription that encodes using variable-length block codes. A variable-length kth-order block code on a product space S^j is defined in a manner analogous to the way in which the concept of a fixed-length kth-order block code was defined in d.1) and d.2). (One modifies d.1) and d.2) to allow the codes φ^k, φ^r to be variable-length codes.) Through horizontal scanning, one can regard each n × n real matrix as a member of the product space S^j in which S is the set of real numbers and j = n². Thus, it should be clear what we mean by a variable-length kth-order block code on the space of n × n real matrices. We take the coding prescription P in this example to consist of all sequences s = {φ^n: n ∈ I} for which there is a positive integer k = k(s) such that each φ^n is a variable-length kth-order block code on the space of all n × n real matrices.

We now have to specify the choice of rate scale f_r and distortion scale f_d. We assume that the random signal X exhibits the following type of scale invariance. There is a positive number β such that if S_1 is a σ_1 × σ_1 subsquare of S* and S_2 is a σ_2 × σ_2 subsquare of S*, then the random variable X[S_1] has the same distribution as the random variable (σ_1/σ_2)^β X[S_2]. Then, for each D > 0, it can be shown that there exists R > 0 and {φ^n: n ∈ I} ∈ P such that

g.1) ρ^n(X^n, φ^n) ≤ Dn^{2(1-β)} for all n;

g.2) r(X^n, φ^n) ≤ Rn² for all n.
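The three-step encoding procedure of d.1) and d.2) can be sketched as follows (the codebooks and function names are hypothetical illustrations, not the paper's): partition the input into length-k subblocks plus a possible length-r remainder, encode each subblock with a fixed-length code, and concatenate the codewords in order.

```python
from itertools import product

def make_fixed_length_code(alphabet, k):
    """Enumerate S^k and assign each length-k block a fixed-length binary index."""
    blocks = list(product(alphabet, repeat=k))
    width = max(1, (len(blocks) - 1).bit_length())
    return {b: format(i, "0{}b".format(width)) for i, b in enumerate(blocks)}

def block_encode(x, alphabet, k):
    """Encode the tuple x in S^j with a fixed-length k-th order block code:
    full subblocks via a code on S^k; per d.2), the length-r remainder
    (r = j mod k), if any, via a separate fixed-length code on S^r."""
    j = len(x)
    r = j % k
    code_k = make_fixed_length_code(alphabet, k)
    out = [code_k[tuple(x[i:i + k])] for i in range(0, j - r, k)]
    if r:
        code_r = make_fixed_length_code(alphabet, r)
        out.append(code_r[tuple(x[j - r:])])
    return "".join(out)

print(block_encode((0, 1, 1, 0, 1), alphabet=(0, 1), k=2))  # -> 01101
```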
in which each entry X_i takes its values in a finite set A. We are given the sequential source S = [X, {X^n: n ∈ I}], which means that X^n = (X_1, ..., X_n) for every n. The rate scale f_r is taken to be f_r(n) = n. First, we take the coding prescription P to be the set of all sequences {φ^n: n ∈ I} in which φ^n is a noiseless variable-length code on A^n for every n. Then, it follows readily from results in Shannon's paper [113] that R_P is equal to the entropy rate H(S) of the source S, defined by
H(S) = lim_{n→∞} H(X^n)/n,

where H(X^n) denotes the Shannon entropy of X^n.
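For a concrete instance of the entropy rate H(S): for a stationary finite-state Markov source, the limit above equals the conditional entropy of the next symbol given the current state, a standard formula that the sketch below assumes (the function name is ours).

```python
from math import log2

def markov_entropy_rate(P, pi):
    """Entropy rate H(S) = sum_i pi_i * sum_j -P_ij * log2(P_ij) of a
    stationary finite-state Markov source: the conditional entropy of the
    next symbol given the current state, weighted by the stationary pmf pi."""
    return sum(pi[i] * sum(-p * log2(p) for p in row if p > 0)
               for i, row in enumerate(P))

# Binary symmetric Markov source with transition probability 0.1;
# by symmetry its stationary distribution is (0.5, 0.5).
P = [[0.9, 0.1],
     [0.1, 0.9]]
pi = [0.5, 0.5]
print(round(markov_entropy_rate(P, pi), 4))  # -> 0.469 bits/symbol
```

For a memoryless source (each row of P equal to the same pmf), the formula reduces to H(S) = H(X_1), as expected.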
The difficulty with this result in terms of practical application is that a sequence of codes {φ^n} ∈ P for which the limit superior of {r(X^n, φ^n)/n} is close to H(S) may be such that the codes φ^n grow too complex as n → ∞. Therefore, we now take P to be the noiseless coding prescription in which s = {φ^n: n ∈ I} ∈ P if and only if there exists k = k(s) such that φ^n is a noiseless variable-length kth-order block code on A^n for every n. Then R_P ≥ H(S), and it is possible for R_P > H(S) to hold. The equality R_P = H(S) holds if the source S is stationary (or, more generally, if it is asymptotically mean stationary [61]). Ziv and Lempel [135] obtained a formula for R_P for the case when the source output is an individual sequence (which means that for some particular infinite sequence z, Pr[X = z] = 1); moreover, on pages 535-536 of their paper they indicate how a sequence of codes {φ^n} may be selected so that the limit superior of {r(X^n, φ^n)/n} will be as close as desired to R_P. Kieffer [82] extended the Ziv-Lempel result, obtaining a formula for R_P which is valid even if S is nonstationary.

There are also results going back to the early days of information theory on the rate performance of fixed-length codes in noiseless coding of sequential sources. To obtain interesting results for fixed-length codes, one needs to relax the condition that the codes one deals with are noiseless. Accordingly, we say that a code φ^n on A^n is an ε-noiseless code for X^n if there exists a function Y^n of φ^n(X^n) such that Pr[X^n ≠ Y^n] ≤ ε. For any ε (0 < ε < 1), let P(ε) be the coding prescription in which {φ^n: n ∈ I} ∈ P(ε) if and only if φ^n is an ε-noiseless code for X^n for every n. Then, if the source S is stationary and ergodic, it is known that R_{P(ε)} = H(S). It is difficult to determine who first proved this result; the earliest derivation of this result that this author was able to find is in the paper by Khinchin [75]. This result fails if S is stationary but not ergodic; however, in this case Parthasarathy [108] obtained upper and lower bounds for R_{P(ε)} which become tight as ε → 0.

B. Coding Theorems for Sequential Sources with Single-Letter Fidelity Criteria

In this subsection, X will again denote a random sequence (X_1, X_2, ...), and we work with the sequential source [X, {X^n: n ∈ I}] along with a fidelity criterion {ρ^n: n ∈ I} that is always assumed to be single-letter unless stated otherwise. We define the rate scale f_r and the distortion scale f_d by f_r(n) = f_d(n) = n. Let P_VL be the coding prescription such that {φ^n: n ∈ I} ∈ P_VL if and only if φ^n is a variable-length code on A^n for every n. Let P_FL be the coding prescription such that {φ^n: n ∈ I} ∈ P_FL if and only if φ^n is a fixed-length code on A^n for every n.

Stationary Sources: As is standard in dealing with a stationary source, we assume a reference letter condition, which means that E[ρ¹(X_1, a)] < ∞ for some member a of the source alphabet. If the given source is memoryless (meaning that the X_i are independent and identically distributed), Shannon [114] showed that D_{P_FL}(R) = D_{P_VL}(R) = D(R), R > 0. Gallager [51] proved this formula for the case in which the source is stationary and ergodic with finite alphabet, and then Berger [16] removed the finite alphabet requirement. If the given source is stationary and nonergodic, Gray and Davisson [59] derived a formula which expresses the function D_{P_FL}(R) as a weighted average of the distortion-rate functions of the ergodic components of the source; it is clear from their formula that D_{P_FL}(R) ≠ D(R) can occur in the stationary nonergodic case. On the other hand, Shields et al. [115] and Leon-Garcia et al. [89] showed that the formula D_{P_VL}(R) = D(R) is valid if the given source is stationary and nonergodic; subsequently, Hashimoto [68] obtained a much simpler proof of this same result. Mackenthun and Pursley [93] proved the formula D_{P_FL}(R) = D_{P_VL}(R) = D(R) assuming the source to be stationary and ergodic and assuming the fidelity criterion to be subadditive; an accessible treatment of this result was published by Gray [65, Theorem 11.5.1].

Nonstationary Sources: For a nonstationary sequential source, the reference letter condition given earlier is replaced with the assumption that sup_i E[ρ¹(X_i, a)] < ∞ for some a. Gray [65, Theorem 11.4.1] and Birmiwal [25] extended the formula D_{P_FL}(R) = D_{P_VL}(R) = D(R) to the case in which the source is asymptotically mean stationary, extending an earlier result of Gray and Saadat [63]. Let P be the coding prescription in which s = {φ^n: n ∈ I} ∈ P if and only if there exists k = k(s) such that φ^n is a fixed-length kth-order block code for X^n for every n. Assuming a finite source alphabet, Ziv [136] obtained a formula for D_P(R) for the case in which the source output is an individual sequence, and Kieffer [80] obtained a formula for D_P(R) valid if the source is nonstationary.

C. Coding Theorems for Multiparameter Sources

In this subsection, we fix a positive integer s and an s-parameter random field X = (X_u: u ∈ M^s). We work with
the multiparameter source [X, {X^n: n ∈ I}] together with a single-letter fidelity criterion. Since X^n = (X_u: |u| ≤ n - 1) consists of n^s samples, we define our rate scale f_r and our distortion scale f_d by f_r(n) = f_d(n) = n^s. Let the coding prescriptions P_VL, P_FL be defined for the given source in a manner analogous to the definitions we made above for a sequential source. Then, if the source is stationary and finite-alphabet, one can deduce the formula D_{P_VL}(R) = D(R) from Hashimoto's results [68]. If one assumes that the given source is stationary, ergodic, and finite-alphabet, then one deduces the formula D_{P_FL}(R) = D(R) from the variable-length result just quoted together with the ergodic theorem for random fields [29]. One can extend these results to the case in which the source alphabet is infinite if one imposes additional assumptions to ensure the finiteness of the distortion-rate function D(R).
VI. UNIVERSAL SOURCE CODING

Let Ω be a measurable space, and for each n ∈ I let Ω_n be a measurable space. Let J be an index set, and let {S_α: α ∈ J} be a family of sources in which each S_α is a source [X_α, {X_α^n: n ∈ I}] such that the source output X_α takes its values in Ω and each representation X_α^n takes its values in Ω_n.
A. Noiseless Universal Source Coding

We suppose that each Ω_n is finite. We fix, in common for all the sources in {S_α}, a noiseless coding prescription P and a rate scale f_r: I → (0, ∞). Note that the optimum noiseless coding rate R_P will vary from source to source in {S_α}. Accordingly, the optimum noiseless coding rate R_P for the source S_α shall be denoted by R_P^α.

Definition 1: A sequence of codes {φ^n: n ∈ I} ∈ P is weakly universal for noiseless coding of {S_α} if

lim sup_{n→∞} r(X_α^n, φ^n)/f_r(n) ≤ R_P^α,  α ∈ J.

Definition 2: A sequence of codes {φ^n: n ∈ I} ∈ P is strongly universal for noiseless coding of {S_α} if for each ε > 0, there exists N ∈ I such that

r(X_α^n, φ^n) ≤ f_r(n)(R_P^α + ε),  α ∈ J, n ≥ N.

Noiseless Universal Coding of Sequential Sources: We survey some of the results that have been obtained concerning the problem of the existence of universal sequences of codes for the noiseless coding of a family of sequential sources. We suppose here that the sources in the family {S_α} are stationary sequential sources having a common finite alphabet A. The coding prescription P is taken to be the set of all sequences {φ^n: n ∈ I} in which φ^n is a noiseless variable-length code on A^n for every n; the rate scale f_r is defined by f_r(n) = n. The field of noiseless universal source coding can be said to have originated with Kolmogorov [85]. Davisson's seminal paper [41] unified the scattered universal coding results that had appeared after Kolmogorov's paper, and provided the impetus for subsequent studies in universal coding. Davisson showed that a weakly universal sequence of codes for noiseless coding of {S_α} always exists; it subsequently became known that the Lempel-Ziv codes [88], [134] yield a specific weakly universal sequence. Davisson also obtained a necessary and sufficient condition in order for there to exist a strongly universal sequence of codes for noiseless coding of {S_α}.

B. Universal Source Coding at a Fixed Distortion Level

We suppose that the sources {S_α: α ∈ J} have in common a fidelity criterion {ρ^n: n ∈ I}, a coding prescription P, a rate scale f_r: I → (0, ∞), and a distortion scale f_d: I → (0, ∞). We do not require that the spaces {Ω_n} be finite as we did in universal noiseless coding. Note that the function R_P(D) will vary from source to source within the family {S_α}. Accordingly, the function R_P(D) for the source S_α shall be denoted R_P^α(D).

Definition 1: A sequence of codes {φ^n: n ∈ I} ∈ P is weakly universal for coding of {S_α} at the distortion level D ≥ 0 if

h.1) lim sup_{n→∞} ρ^n(X_α^n, φ^n)/f_d(n) ≤ D,  α ∈ J; and

h.2) lim sup_{n→∞} r(X_α^n, φ^n)/f_r(n) ≤ R_P^α(D),  α ∈ J.

Definition 2: A sequence of codes {φ^n: n ∈ I} ∈ P is strongly universal for coding of {S_α} at the distortion level D ≥ 0 if for any ε > 0 there exists N ∈ I such that

i.1) ρ^n(X_α^n, φ^n) ≤ f_d(n)(D + ε),  α ∈ J, n ≥ N; and

i.2) r(X_α^n, φ^n) ≤ f_r(n)(R_P^α(D) + ε),  α ∈ J, n ≥ N.

Universal Coding of a Family of Sequential Sources at a Fixed Distortion Level: We survey some of the results that have been obtained concerning the problem of the existence of universal sequences of codes for the coding of a family of sequential sources at a fixed distortion level. In the following, we take {S_α} to be a family of stationary ergodic sequential sources having the same source alphabet A. We impose a single-letter fidelity criterion {ρ^n: n ∈ I}, and we take f_r(n) = f_d(n) = n. Let P be the coding prescription in which {φ^n: n ∈ I} ∈ P if and only if φ^n is a variable-length code on A^n for every n. It is known [94], [76] that there exists a weakly universal sequence of codes in P for coding of {S_α} at the distortion level D if sup_α R_P^α(D) < ∞ and ρ¹ satisfies some mild requirements. For the special case when the source alphabet A is finite, Ornstein and Shields [105] have recently exhibited a sequence of codes which is a weakly universal sequence of codes for coding the sources {S_α} at a fixed distortion level. Garcia-Munoz and Neuhoff [52] showed that there will exist a strongly universal sequence of codes for coding {S_α} at a fixed distortion level if ρ¹ is a metric on A, if {S_α} is
totally bounded in the rho-bar metric (a metric on classes of sources that had been proposed by Gray et al. [60] for universal coding applications), and if certain other natural requirements are made. For the case of a finite source alphabet, Kieffer [77] showed that the requirement of Garcia-Munoz and Neuhoff that {S_α} be totally bounded in the rho-bar metric could be weakened to total boundedness of {S_α} in a certain entropy metric.

C. Universal Source Coding at a Fixed Rate Level

As in the previous subsection, we suppose that the sources {S_α: α ∈ J} have in common a fidelity criterion {ρ^n: n ∈ I}, a coding prescription P, a rate scale f_r: I → (0, ∞), and a distortion scale f_d: I → (0, ∞). Since the function D_P(R), R > 0, may vary from source to source within the family {S_α}, the function D_P(R) for the source S_α shall be denoted by D_P^α(R).

Definition 1: A sequence of codes {φ^n: n ∈ I} ∈ P is weakly universal for coding of {S_α} at the rate level R > 0 if

j.1) lim sup_{n→∞} r(X_α^n, φ^n)/f_r(n) ≤ R,  α ∈ J; and

j.2) lim sup_{n→∞} ρ^n(X_α^n, φ^n)/f_d(n) ≤ D_P^α(R),  α ∈ J.

Definition 2: A sequence of codes {φ^n: n ∈ I} ∈ P is strongly universal for coding of {S_α} at the rate level R > 0 if for any ε > 0 there exists N ∈ I such that

k.1) r(X_α^n, φ^n) ≤ f_r(n)(R + ε),  α ∈ J, n ≥ N; and

k.2) ρ^n(X_α^n, φ^n) ≤ f_d(n)(D_P^α(R) + ε),  α ∈ J, n ≥ N.

Universal Coding of a Family of Sequential Sources at a Fixed Rate Level: We shall now survey some of the results that have been obtained concerning the problem of the existence of universal sequences of codes for the coding of a family of sequential sources at a fixed rate level. In the following, we take {S_α} to be a family of stationary ergodic sequential sources having the same source alphabet A. We impose a single-letter fidelity criterion {ρ^n: n ∈ I}, and we take f_r(n) = f_d(n) = n. Let P be the coding prescription in which {φ^n: n ∈ I} ∈ P if and only if φ^n is a fixed-length code on A^n for every n. Ziv [133] was the first to obtain a result on the universal coding of a family of sequential sources at a fixed rate level. His results indicate that if the distortion measure ρ¹ is a metric on the source alphabet A, and if A is totally bounded relative to this metric, then there exists a weakly universal sequence of codes in P for coding {S_α} at any fixed rate level. Neuhoff et al. [101] showed the existence of a weakly universal sequence of codes in P for coding of {S_α} at a fixed rate level when the source alphabet A is finite or when the distortion measure ρ¹ is a metric on A under which A is a separable metric space. They showed the existence of a strongly universal sequence of codes in P for coding of {S_α} at a fixed rate level if ρ¹ is a metric on A such that A is a complete separable metric space under ρ¹ and {S_α} is totally bounded under the rho-bar metric. Suppose in addition that the source alphabet A is finite and that each source S_α is a Markov source (meaning that the source output is a Markov process); Neuhoff and Shields [102] found a necessary and sufficient condition for there to exist a strongly universal sequence of codes for the coding of {S_α} at a given rate level.

VII. CODING FOR MULTITERMINAL SOURCES
A multiterminal source consists of a network of sources; data from these sources are separately coded for transmission to a receiver. For simplicity, we first state the basic coding problem which arises from a two-terminal source. A two-terminal source is a pair of abstract sources S_x = [X, {X^n: n ∈ I}] and S_y = [Y, {Y^n: n ∈ I}] in which the source outputs X, Y are defined on the same probability space. For a sufficiently large n, one wishes to code the representations X^n, Y^n for transmission to a receiver; the system for accomplishing this is depicted in Fig. 3. One encodes X^n, Y^n separately by means of variable-length codes φ_x^n, φ_y^n, respectively; the resulting codewords φ_x^n(X^n), φ_y^n(Y^n) are then pooled, yielding the concatenated codeword φ_x^n(X^n)φ_y^n(Y^n), which is processed by a decoder to furnish to the receiver the estimates X̂^n, Ŷ^n of X^n, Y^n, respectively. Fix a fidelity criterion {ρ_x^n: n ∈ I} for the source S_x and a fidelity criterion {ρ_y^n: n ∈ I} for the source S_y. Fix two distortion scales f_xd: I → (0, ∞), f_yd: I → (0, ∞) and two rate scales f_xr: I → (0, ∞), f_yr: I → (0, ∞). Fix coding prescriptions P_x, P_y for S_x, S_y, respectively.

Definition: Let D_x ≥ 0, D_y ≥ 0. A pair (R_x, R_y) is an admissible pair of rates for coding of the two-terminal source (S_x, S_y) at the distortion levels (D_x, D_y) if there exist {φ_x^n: n ∈ I} ∈ P_x, {φ_y^n: n ∈ I} ∈ P_y, and functions X̂^n, Ŷ^n of φ_x^n(X^n)φ_y^n(Y^n) for each n such that

lim sup_{n→∞} r(X^n, φ_x^n)/f_xr(n) ≤ R_x,  lim sup_{n→∞} r(Y^n, φ_y^n)/f_yr(n) ≤ R_y,

lim sup_{n→∞} ρ_x^n(X^n, X̂^n)/f_xd(n) ≤ D_x,  lim sup_{n→∞} ρ_y^n(Y^n, Ŷ^n)/f_yd(n) ≤ D_y.
In the two-terminal source coding problem, one wishes to determine the set of all admissible rate pairs for each pair of distortion levels. Although we shall not formally describe the general multiterminal source coding problem, we can infer the form of this problem. In k-terminal source coding, one deals with k sources S_1, S_2, ..., S_k with outputs X_1, X_2, ..., X_k, respectively. One can then try to solve the problem of determining all admissible rate k-tuples (R_1, R_2, ..., R_k) such that if the representations of each X_i are asymptotically coded at rate level R_i and the separate codewords are pooled, then the representations of each X_i can be asymptotically recovered by the decoder with distortion less than or equal to a given distortion level D_i. (The rate levels and distortion levels involved are measured according to given scales.) One can make the multiterminal source coding problem more involved by allowing information about the source S_i output to flow to the source S_j encoder for certain pairs (i, j), where this information may or may not be made available to the decoder. In this more general problem, one would also be interested in determining possible vectors (R_{i,j}: i, j = 1, ..., k), where R_{i,j} is the rate level at which information about the source S_i output is sent to the S_j encoder. One can consider even more complicated multiterminal source coding networks in which there can be more than one decoder, with each decoder receiving coded information about the representations of some subset of the {X_i}.

Coding Theorems for Sequential Multiterminal Sources: We survey some of the results that have been obtained in coding of a multiterminal source consisting of a network of sequential sources. For simplicity, we restrict ourselves to two-terminal sources; some of the results to be stated can be easily generalized to the case of a multiterminal source consisting of three or more sources.
In the following, we consider a two-terminal source consisting of the sequential sources S_x = [X, {X^n: n ∈ I}], S_y = [Y, {Y^n: n ∈ I}], which means that for each n, X^n and Y^n consist of the first n samples of the random sequences X = {X_i}, Y = {Y_i}, respectively. We employ the Hamming fidelity criterion for each of the sources S_x, S_y. The coding prescription P_x for S_x is the one in which {φ_x^n: n ∈ I} ∈ P_x if and only if φ_x^n is a fixed-length code for X^n for every n. The coding prescription P_y for S_y is defined analogously. The rate and distortion scales are defined by f_xr(n) = f_yr(n) = f_xd(n) = f_yd(n) = n.

If the pairs (X_i, Y_i) are independent and identically distributed, Slepian and Wolf [116] showed that (R_x, R_y) is an admissible rate pair for the coding of (S_x, S_y) at the distortion levels (D_x, D_y) = (0, 0), if and only if R_x ≥ H(X | Y), R_y ≥ H(Y | X), and R_x + R_y ≥ H(X, Y), where H(X | Y) and H(Y | X) denote conditional entropy rates and H(X, Y) denotes the joint entropy rate. Cover [33] extended this result to the case of jointly stationary and ergodic X, Y and also to the case of multiterminal sources with more than two terminals. The Slepian-Wolf multiterminal source coding problem is an example of a zero-error multiterminal source coding problem. In zero-error multiterminal source coding, one requires that all the distortion levels be set equal to zero. Zero-error multiterminal source coding networks can be attacked from a graph-theoretic viewpoint. The reader is referred to the papers [4], [5], [38], [39] for an account of zero-error multiterminal source coding theory via graph theory.

In the following, we survey some of the results concerning multiterminal source coding in which distortion is allowed. Suppose that the pairs (X_i, Y_i) are independent and identically distributed. Given the distortion levels (D_x, D_y), one wishes to determine the set of all admissible rate pairs (R_x, R_y) for coding of (S_x, S_y) at these distortion levels. This is a hard problem which has not yet been solved in general. However, progress has been made on special cases of the problem. First, Wyner [126] and Ahlswede and Körner [3] determined the set of all (R_x, R_y) which are admissible rate pairs for coding of (S_x, S_y) at the distortion levels (D_x, D_y) = (0, ∞). Then, for a fixed D ≥ 0, Wyner and Ziv [127] determined all R_x for which (R_x, H(Y)) is an admissible rate pair for coding of (S_x, S_y) at the distortion levels (D_x, D_y) = (D, ∞). Then, for fixed R ≥ 0, D ≥ 0, Berger et al. [20] determined a subset of the set of all R_x for which (R_x, R) is an admissible rate pair for coding of (S_x, S_y) at the distortion levels (D_x, D_y) = (D, ∞). Recently, Berger and Yeung [22] proved a coding theorem for a two-terminal source network which unifies and extends the results of [126], [3], [127].

Again, suppose that the pairs (X_i, Y_i) are independent and identically distributed. Kaspi and Berger [74] consider a generalization of the two-terminal coding problem of the previous paragraph; they allow side-information about the S_y representation to be furnished to the S_x encoder as well as the receiver. (See Fig. 4.)

Fig. 4. Kaspi-Berger coding configuration.

For a given pair of distortion levels (D_x, D_y), a rate triple (R_x, R_y, R) is admissible if the distortion levels (D_x, D_y) can be achieved asymptotically as n → ∞ when X^n is coded by the S_x encoder at a rate no greater than R_x bits per sample, Y^n is coded by the S_y encoder at a rate no greater than R_y bits per sample, and side-information about Y^n is encoded for transmission to the S_x encoder and decoder at a rate no greater than R bits per sample. A subset of the entire set of admissible rate triples (R_x, R_y, R) was determined in [74] for the general Kaspi-Berger configuration; the paper [78] extends this result to the case of a jointly stationary and ergodic pair (X, Y) using a different method of proof. The entire set of admissible rate triples was found in [74] for some special cases of the basic Kaspi-Berger configuration. The determination of the set of admissible rate triples for the general Kaspi-Berger configuration remains an open problem.

Other results on two-terminal sequential source coding may be found in the papers [69], [73], [23].
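The Slepian-Wolf conditions above are easy to check numerically for a given joint distribution of the i.i.d. pairs (X_i, Y_i). The following sketch (function names are ours) computes the joint and conditional entropies in bits and tests membership of a rate pair in the admissible region.

```python
from math import log2

def entropies(pxy):
    """Return (H(X,Y), H(X|Y), H(Y|X)) in bits from a joint pmf given as a
    dict {(x, y): probability}; conditionals via H(X|Y) = H(X,Y) - H(Y)."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    Hxy = -sum(p * log2(p) for p in pxy.values() if p > 0)
    Hx = -sum(p * log2(p) for p in px.values() if p > 0)
    Hy = -sum(p * log2(p) for p in py.values() if p > 0)
    return Hxy, Hxy - Hy, Hxy - Hx

def slepian_wolf_admissible(Rx, Ry, pxy):
    """(Rx, Ry) is admissible iff Rx >= H(X|Y), Ry >= H(Y|X), Rx+Ry >= H(X,Y)."""
    Hxy, Hx_given_y, Hy_given_x = entropies(pxy)
    return Rx >= Hx_given_y and Ry >= Hy_given_x and Rx + Ry >= Hxy

# Doubly symmetric binary pair: X ~ Bernoulli(1/2), Y = X flipped w.p. 0.1.
pxy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
Hxy, Hx_y, Hy_x = entropies(pxy)
print(round(Hxy, 3), round(Hx_y, 3), round(Hy_x, 3))
print(slepian_wolf_admissible(1.0, 0.5, pxy))
```

For this example H(X,Y) = 1 + h(0.1) ≈ 1.469 bits and H(X|Y) = H(Y|X) ≈ 0.469 bits, so the pair (1.0, 0.5) lies inside the Slepian-Wolf region.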
VIII. FUTURE RESEARCH TRENDS
A. Determination of Rate-Distortion Functions

Much needs to be done in terms of obtaining explicit formulas for rate-distortion functions or, lacking this, obtaining iterative algorithms for the computation of rate-distortion functions.

Sequential Sources: Let S = [X, {X^n: n ∈ I}] be a sequential source with single-letter fidelity criterion and the rate and distortion scales f_r(n) = f_d(n) = n. Let R(D) be the rate-distortion function for S. In this paragraph, we require the source alphabet to be finite. If S is memoryless, then explicit formulas for R(D) are known for certain distortion measures [17, ch. 2], and for general distortion measures the algorithm of Blahut [26] can be used to compute successive approximations to R(D). The next simplest case is the case in which S is a Markov source; in this case, Gray [57], [58] obtained a number D_c > 0 and an explicit formula for R(D) in the range 0 < D ≤ D_c, for the Hamming fidelity criterion. However, a description of R(D) valid for all D is not known for Markov source models, even for the simplest Markov source model, the binary symmetric Markov source. (The papers [11], [19] shed further light on the rate-distortion function for sequential finite-alphabet Markov sources.)

Suppose the source alphabet is the real line and that one employs the squared-error fidelity criterion. Then, an explicit formula for R(D) is known if X is a stationary Gaussian random sequence and for some instances in which X is a nonstationary Gaussian random sequence [17, ch. 4]. However, little is known concerning the computation of rate-distortion functions for infinite-alphabet non-Gaussian sequential source models with memory.
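As an illustration of the Blahut algorithm mentioned above, the following sketch computes one (D, R) point on the rate-distortion curve of a memoryless source by the standard alternating minimization, parameterized by a slope s ≤ 0 (the function name and the fixed iteration count are our choices; for a binary uniform source under Hamming distortion the output can be checked against the known formula R(D) = 1 - h(D)).

```python
from math import exp, log2

def blahut_rd_point(p, dist, s, iters=500):
    """One point on the R(D) curve of a memoryless source via Blahut's
    alternating minimization, parameterized by the slope s <= 0.
    p: source pmf over x; dist[x][y]: distortion matrix. Returns (D, R) bits."""
    ny = len(dist[0])
    q = [1.0 / ny] * ny                      # initial output distribution
    for _ in range(iters):
        # test channel minimizing the Lagrangian for the current output pmf q
        Q = []
        for x in range(len(p)):
            w = [q[y] * exp(s * dist[x][y]) for y in range(ny)]
            z = sum(w)
            Q.append([wi / z for wi in w])
        # output pmf induced by the test channel
        q = [sum(p[x] * Q[x][y] for x in range(len(p))) for y in range(ny)]
    D = sum(p[x] * Q[x][y] * dist[x][y]
            for x in range(len(p)) for y in range(ny))
    R = sum(p[x] * Q[x][y] * log2(Q[x][y] / q[y])
            for x in range(len(p)) for y in range(ny) if Q[x][y] > 0)
    return D, R

# Binary uniform source, Hamming distortion; theory gives R(D) = 1 - h(D).
D, R = blahut_rd_point([0.5, 0.5], [[0, 1], [1, 0]], s=-2.0)
print(round(D, 3), round(R, 3))
```

Sweeping s from 0 toward -∞ traces out the whole curve from (D_max, 0) toward (0, H(X)).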
Multiparameter Sources: Hajek and Berger [67] identified a class of multiparameter sources with Markov random field output and binary alphabet, and Bassalygo and Dobrushin [15] identified another class of finite-alphabet multiparameter sources, for which there exists D_c > 0 and an explicit formula for R(D) for 0 < D ≤ D_c. The Hajek-Berger paper also gives a lower bound on D_c; subsequent work [104], [24] has shown that this lower bound can be improved in certain cases. However, the determination of R(D) for all D > D_c for the sources considered by these authors remains an open problem.
B. Determination of Vector Quantizers

A computationally efficient algorithm needs to be found which enables one to find an optimal N-level vector quantizer for quantizing a given random vector. (At the very least, one would like such an algorithm for quantizing Gaussian random vectors.) Although the problem of finding optimal vector quantizers may be hopelessly complex for arbitrary distributions [53], one can hope that there is a reasonable computational procedure for sufficiently smooth distributions. The search for good lattice vector quantizers is currently receiving much attention. As Ziv [137] has indicated, it is possible to find lattice vector quantizers that yield codes
with close to optimal rate for a memoryless source with real alphabet and sufficiently smooth probability density, provided the desired distortion level is sufficiently small.
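The classical iterative approach to the quantizer design problem raised above is Lloyd's algorithm, shown here in its scalar form (a sketch with hypothetical function names): alternate a nearest-codepoint partition with a centroid update. Consistent with the remark that global optimality is hard, this procedure is only guaranteed to reach a local optimum.

```python
import random

def lloyd_quantizer(samples, N, iters=50):
    """Lloyd's alternating algorithm for an N-level scalar quantizer:
    nearest-codepoint partition, then centroid (conditional-mean) update.
    A local-optimum heuristic, not a guaranteed globally optimal design."""
    codebook = sorted(random.sample(samples, N))
    for _ in range(iters):
        cells = [[] for _ in range(N)]
        for v in samples:                     # nearest-neighbor partition
            i = min(range(N), key=lambda j: (v - codebook[j]) ** 2)
            cells[i].append(v)
        codebook = [sum(c) / len(c) if c else codebook[i]
                    for i, c in enumerate(cells)]
    mse = sum(min((v - c) ** 2 for c in codebook)
              for v in samples) / len(samples)
    return codebook, mse

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(5000)]
codebook, mse = lloyd_quantizer(samples, N=4)
print([round(c, 2) for c in codebook], round(mse, 3))
```

For a unit-variance Gaussian and N = 4, the resulting mean squared error lands near the optimal 4-level value of roughly 0.12.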
C. Delay Constrained Source Coding

Consider a sequential source with output X = (X_1, X_2, ...). In a real-world system in which the successive entries of X are encoded for transmission to some user, there is a random time delay T_i between the time that source symbol X_i enters the encoder and the time that the decoder generates the reproduction X̂_i of X_i. One may wish to consider only coding schemes in which the maximal expected time delay max_{i≤n} E[T_i] is constrained not to grow too rapidly with n as n → ∞. Other types of delay constraints are possible: one may want to control the growth of the average delay n^{-1} Σ_{i≤n} E[T_i].