IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-31, NO. 3, MAY 1985

Rate-Distortion Performance of DPCM Schemes for Autoregressive Sources

NARIMAN FARVARDIN, STUDENT MEMBER, IEEE, AND JAMES W. MODESTINO, SENIOR MEMBER, IEEE
Abstract: An analysis of the rate-distortion performance of differential pulse code modulation (DPCM) schemes operating on discrete-time autoregressive processes is presented. The approach uses an iterative algorithm for the design of the predictive quantizer subject to an entropy constraint on the output sequence. At each stage the iterative algorithm optimizes the quantizer structure, given the probability distribution of the prediction error, while simultaneously updating the distribution of the resulting prediction error. Different orthogonal expansions specifically matched to the source are used to express the prediction error density. A complete description of the algorithm, including convergence and uniqueness properties, is given. Results are presented for the rate-distortion performance of the optimum DPCM scheme for first-order Gauss-Markov and Laplace-Markov sources, including comparisons with the corresponding rate-distortion bounds. Furthermore, asymptotic formulas indicating the high-rate performance of these schemes are developed for both first-order Gaussian and Laplacian autoregressive sources.
I. INTRODUCTION
THE EVER-GROWING DEMAND for transmission and storage of data necessitates more efficient use of existing transmission and storage facilities. A data compression system is any scheme that operates on source data to remove redundancies so that only those values essential to reproduction are retained. Typical source signals generally contain two types of redundancies. First, there is redundancy due to the high serial correlation in source outputs. This redundancy, which is concomitant with a nonuniform power spectral density, can be reduced considerably through the use of predictive encoding schemes such as differential pulse code modulation (DPCM). Roughly speaking, the source signal can be conceived as having two parts. One part is predictable relative to the transmitted sequence and hence conveys no useful information; the other part (the prediction error) is unpredictable, and since it uniquely determines the signal, it contains the useful information. A DPCM encoding scheme attempts to discard the predictable part, because it can be reproduced at the receiver, and encodes only the unpredictable portion by means of a zero-memory quantizer.
Manuscript received October 7, 1983; revised September 13, 1984. This work was supported in part by the Office of Naval Research under Contract N00014-75-C-0281. The material in this paper was presented in part at the 1983 IEEE International Symposium on Information Theory, St. Jovite, PQ, Canada. N. Farvardin was with the Electrical, Computer, and Systems Engineering Department, Rensselaer Polytechnic Institute, Troy, NY 12180. He is now with the Electrical Engineering Department, University of Maryland, College Park, MD 20742, USA. J. W. Modestino is with the Electrical, Computer, and Systems Engineering Department, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.
The second type of redundancy is due to the nonuniform probability distribution of the encoded signal. That is, the discrete levels at the output of the DPCM quantizer do not occur with equal probabilities. This type of redundancy can be removed through the optimal design of the quantizer subject to an entropy constraint, and it generally requires entropy coding of the quantizer outputs by means of a buffer-instrumented, variable-length coding scheme.

Because the quantizer is nonlinear, exact analysis of predictive coding schemes is difficult. O'Neal [1] analyzed the mean-square performance of DPCM systems for stationary Gauss-Markov processes on the assumption that the prediction error distribution is Gaussian. Others have performed an analysis of the DPCM system based on the assumption of decomposability of the prediction error into the overload and granular error [2], [3]. In contrast to these approaches, Fine [4] performed an exact analysis of the mean-square performance for a delta modulation system operating on a sampled Wiener process. Hayashi [6] extended this result to DPCM systems with an equi-step quantizer. Masry and Cambanis [11] provided an exact analysis for delta modulation of the continuous-parameter Wiener process. To the authors' knowledge these are the only exact analyses concerning DPCM system performance. Slepian [21], Arnstein [5], Hayashi [7], and Janardhanan [8], all inspired by Davisson's idea [19], [20] of obtaining a Hermite polynomial series approximation for the distribution of the prediction error, have reported various results regarding the optimality of DPCM systems in a minimum mean-square error sense for Gauss-Markov processes. Gibson and Fischer [38], on the other hand, have studied an optimal alphabet-constrained data-compression scheme that entails the DPCM system as a special case. However, little work has been done in characterizing the optimum rate-distortion performance of DPCM schemes. While there has been some limited work in approximating the optimal rate-distortion performance for Gaussian sources at high rates [9], [33], there has been no complete analysis for a wide range of rates even for Gaussian sources.

In this paper we present a study of the rate-distortion performance of DPCM schemes operating on autoregressive discrete-time sources. A novel iterative algorithm is developed for design of the DPCM quantizer to minimize the mean-square distortion subject to a constraint on the first-order output entropy.
Fig. 1. Block diagram of generic predictive coding scheme ($N$-level quantizer $q_N(\cdot)$, entropy encoder, digital data link, entropy decoder, and optimum $M$th-order predictor).
Also, we use different orthogonal series expansions to approximate the distribution of the prediction error. These expansions are specifically matched to the distribution of the innovation sequence generating the autoregressive process. Results on the rate-distortion performance of DPCM schemes employing uniform-threshold quantizers for first-order Gauss-Markov and Laplace-Markov sources are presented. Also, asymptotic results for high rates are developed for both sources.

The organization of this work is as follows. In Section II we describe a general predictive coding scheme, which is then restricted to the special case of DPCM. Then we describe certain properties of DPCM encoding schemes in Section III. In Section III we also carefully formulate the problem of optimal encoder design and determine the necessary conditions for optimality. In Section IV two algorithms for optimally designing DPCM encoders are described. Section V is devoted to a thorough study of the prediction error distribution and its evaluation through orthogonal series expansion methods. In Section VI we present numerical results demonstrating the efficacy of optimum DPCM schemes. In Section VII asymptotic results are provided for high rates, together with techniques for bounding the rate-distortion function. These latter techniques are particularly useful for characterizing the performance of non-Gaussian sources. Finally, in Section VIII a summary and suggestions for future research are included.

II. PRELIMINARIES AND NOTATION
In this section we briefly describe a generic predictive encoding scheme operating on an $M$th-order autoregressive source. We obtain the optimum system structure for a fixed quantizer, mention the impediments in analysis and implementation of the optimum system, and then reduce the system to the suboptimum DPCM scheme for further analysis.

We assume that the signal to be encoded can be modeled as an $M$th-order time-discrete autoregressive process described by the recursion

$$X_n = \sum_{m=1}^{M} \rho_m X_{n-m} + W_n, \qquad n = 1, 2, \ldots, \tag{1}$$

where $\rho_1, \rho_2, \ldots, \rho_M$ are the autoregression constants and $\{W_n\}$ is a zero-mean sequence of independent and identically distributed random variables possessing common variance $\sigma_W^2$. Furthermore, we assume that the initial state $(X_0, X_{-1}, \ldots, X_{-M+1})$ is specified and we are only interested in the source outputs for $n \geq 1$. This model has been chosen both because it is often a good mathematical model for real data (e.g., speech and images) and because it provides a well-understood standard for comparison [4]-[9], [25].

The block diagram of a generic predictive coding scheme is illustrated in Fig. 1. Upon observing the transmitted sequence $Y_1, Y_2, \ldots, Y_{n-1}$, the predictor estimates the value of the source signal at time instant $n$. This estimate $\hat{X}_n'$, which can also be made at the receiver (in the absence of channel errors), is then subtracted from the input $X_n$ to obtain the sequence $E_n = X_n - \hat{X}_n'$, called the prediction error. The sequence $\{E_n\}$, which contains (almost) only "new" information about $\{X_n\}$, is then coded and transmitted as the sequence $\{Y_n\}$.

Let us assume that $\mathscr{A}_k$ is the smallest $\sigma$-algebra generated by $Y_1, Y_2, \ldots, Y_k$. Then the causal least mean-square estimate,¹ called the predicted estimate, of $X_n$ upon observing $Y_1, Y_2, \ldots, Y_{n-1}$ is given by

$$\hat{X}_1' = 0, \tag{2a}$$

while

$$\hat{X}_n' = E\{X_n \mid \mathscr{A}_{n-1}\}, \qquad n = 2, 3, \ldots, \tag{2b}$$

where $E\{\cdot \mid \cdot\}$ denotes conditional expectation given a $\sigma$-algebra [10]. Let us denote by $\hat{X}_n$ the instantaneous estimate of $X_n$ derived by observing $Y_1, Y_2, \ldots, Y_n$, given by

$$\hat{X}_n = E\{X_n \mid \mathscr{A}_n\}, \qquad n = 1, 2, \ldots. \tag{3}$$

Combining (1), (2), and (3) yields

$$\hat{X}_n' = \rho_1 \hat{X}_{n-1} + \sum_{m=2}^{M} \rho_m E\{X_{n-m} \mid \mathscr{A}_{n-1}\}, \qquad n = 2, 3, \ldots. \tag{4}$$

The last equation is a direct consequence of the fact that $\{W_n\}$ is a zero-mean and independent sequence. Furthermore, we can write

$$\hat{X}_n = E\{\hat{X}_n' + E_n \mid \mathscr{A}_n\} = E\{\hat{X}_n' \mid \mathscr{A}_n\} + E\{E_n \mid \mathscr{A}_n\}, \qquad n = 1, 2, \ldots. \tag{5}$$

¹Throughout this work, we confine ourselves to the squared-error distortion criterion.
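As an illustration (not part of the original paper), the recursion (1) is straightforward to simulate; the Gaussian innovations, zero initial state, and parameter values below are assumptions made for the sketch:

```python
import numpy as np

def ar_source(rho, n_samples, rng=np.random.default_rng(0)):
    """Generate a first-order autoregressive source X_n = rho*X_{n-1} + W_n
    with zero-mean, unit-variance i.i.d. Gaussian innovations (assumed)."""
    w = rng.standard_normal(n_samples)      # innovation sequence {W_n}
    x = np.empty(n_samples)
    prev = 0.0                              # initial state X_0 = 0 (assumed)
    for n in range(n_samples):
        prev = rho * prev + w[n]            # recursion (1) with M = 1
        x[n] = prev
    return x

x = ar_source(rho=0.9, n_samples=10_000)
# For |rho| < 1 the stationary variance is 1/(1 - rho^2), cf. Section V-A.
print(x.var(), 1 / (1 - 0.9**2))
```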
Fig. 2. Block diagram of DPCM coding system for $M$th-order autoregressive source ($N$-level quantizer, entropy encoder, digital data link, entropy decoder, and predictor).
But $\{\mathscr{A}_n\}$ is a nondecreasing sequence of $\sigma$-algebras (i.e., $\mathscr{A}_{n-1} \subset \mathscr{A}_n$), and $\hat{X}_n'$, by definition, is an $\mathscr{A}_{n-1}$-measurable random variable [10]. Therefore, $\hat{X}_n'$ is $\mathscr{A}_n$-measurable as well, and (5) reduces to

$$\hat{X}_n = \hat{X}_n' + E\{E_n \mid \mathscr{A}_n\}, \qquad n = 1, 2, \ldots. \tag{6}$$

Unfortunately, the system described by (4) and (6) is extremely difficult, if not impossible, to implement. The conditional expectations in (4) and (6) are the main drawbacks here because they cannot be easily computed. Our main objective is the study of the quantizer structure and its resulting effect on overall system performance. Thus, to make the problem somewhat more tractable, we force the predictor to have a simple structure.

Let us suppose that the prediction error sequence $\{E_n\}$ is "nearly" independent. This statement, although somewhat heuristic, is meaningful from a practical point of view, especially when the system is optimized (as we shall describe later) for operation at high rates. If this assumption holds, then the transmitted sequence $\{Y_n\}$ will also be nearly independent. This assumption implies that [12]

$$E\{X_{n-m} \mid \mathscr{A}_{n-1}\} \approx \hat{X}_{n-m}, \qquad n \geq m. \tag{7}$$

Furthermore, we take the encoder to be an $N$-level zero-memory quantizer, so that

$$Y_n = q_N(E_n), \tag{8}$$

with

$$q_N(x) = Q_l, \qquad x \in (T_{l-1}, T_l], \quad l = 1, 2, \ldots, N, \tag{9}$$

where $T_0, T_1, \ldots, T_N$ and $Q_1, Q_2, \ldots, Q_N$ are the threshold levels and the quantization levels, respectively. (To the extent possible, we use notation identical to that of [13].) Then we can show, in Theorem 1, that

$$E\{E_n \mid \mathscr{A}_n\} = Y_n, \qquad n = 1, 2, \ldots, \tag{10}$$

provided only that the quantization levels are chosen optimally. With these simplifications, (4) and (6) can be written as

$$\hat{X}_n' = \sum_{m=1}^{M} \rho_m \hat{X}_{n-m}, \qquad n = 2, 3, \ldots, \tag{11a}$$

and

$$\hat{X}_n = \hat{X}_n' + Y_n, \qquad n = 1, 2, \ldots. \tag{11b}$$

Equation (11) describes the simplified version of our predictive coding scheme. This scheme, which is well known as differential pulse code modulation, has been widely discussed in the literature [1]-[9]. The block diagram of this system is illustrated in Fig. 2. In the rest of this work we will restrict attention to this scheme.

Our goal is to design the quantizer in such a way that the DPCM coding scheme is optimized in a rate-distortion theoretic sense. More specifically, we wish to minimize the overall average distortion, while the transmission rate is kept below a prescribed level. In the following section, we will establish certain properties of the DPCM scheme under consideration and formulate the optimum design problem.

III. PROPERTIES OF THE DPCM SCHEME AND PROBLEM FORMULATION

We begin this section by substantiating (10) through the following theorem.

Theorem 1: Let us assume that $q_N$ is an $N$-level zero-memory quantizer with input thresholds $T_0 < T_1 < \cdots < T_N$ and quantization levels chosen as $Q_l = E\{E_n \mid T_{l-1} < E_n \leq T_l\}$, $l = 1, 2, \ldots, N$. Then (10) holds.

Proof: First note that

$$E\{E_n \mid Y_n = Q_l\} = E\{E_n \mid T_{l-1} < E_n \leq T_l\}, \qquad l = 1, 2, \ldots, N. \tag{12}$$

But, by assumption,

$$Q_l = E\{E_n \mid T_{l-1} < E_n \leq T_l\}, \qquad l = 1, 2, \ldots, N. \tag{13}$$

Therefore

$$E\{E_n \mid Y_n = Q_l\} = Q_l, \qquad l = 1, 2, \ldots, N, \tag{14}$$

which proves the theorem.
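As an illustration of the loop defined by (8), (11a), and (11b) (a sketch, not the authors' implementation; the uniform-threshold quantizer and all parameter values are assumptions):

```python
import numpy as np

def quantize(e, thresholds, levels):
    """Zero-memory quantizer (9): map e to Q_l for e in (T_{l-1}, T_l]."""
    l = np.searchsorted(thresholds, e)           # index of the cell containing e
    return levels[l]

def dpcm(x, rho, thresholds, levels):
    """First-order DPCM loop implementing (8), (11a), (11b)."""
    x_hat_prev = 0.0                             # decoder state \hat{X}_{n-1}
    y = np.empty_like(x)
    for n, xn in enumerate(x):
        x_pred = rho * x_hat_prev                # (11a): predicted estimate
        e = xn - x_pred                          # prediction error E_n
        y[n] = quantize(e, thresholds, levels)   # (8): Y_n = q_N(E_n)
        x_hat_prev = x_pred + y[n]               # (11b): instantaneous estimate
    return y

# Illustrative 4-level uniform-threshold quantizer with step 1.0 (assumed).
T = np.array([-1.0, 0.0, 1.0])                   # interior thresholds T_1..T_{N-1}
Q = np.array([-1.5, -0.5, 0.5, 1.5])             # quantization levels Q_1..Q_N
x = ar_source(rho=0.9, n_samples=10_000)         # source from the earlier sketch
y = dpcm(x, rho=0.9, thresholds=T, levels=Q)
```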
Recall that for the case of a zero-memory quantizer driven by a memoryless source, the amount of information delivered by the output process about the input process equals the entropy of the output process [26]. The following theorem establishes a similar relationship in predictive quantization schemes.

Theorem 2: In the predictive coding scheme described by (11), if $X^N \triangleq (X_1, X_2, \ldots, X_N)$ and $Y^N \triangleq (Y_1, Y_2, \ldots, Y_N)$, then

$$\lim_{N \to \infty} \frac{1}{N} I(X^N; Y^N) = H_\infty(Y), \tag{15}$$

where $H_\infty(Y)$ is the entropy rate of the $\{Y_n\}$ sequence [27].

Proof: We have

$$I(X^N; Y^N) = H(Y^N) - H(Y^N \mid X^N). \tag{16}$$

(Here, we are using the same notation as in [27].) But there is a one-to-one relationship between $X^N$ and $E^N \triangleq (E_1, E_2, \ldots, E_N)$. Furthermore, $E^N$ uniquely specifies $Y^N$. Therefore, given $X^N$, there is no uncertainty concerning $Y^N$, and thus

$$I(X^N; Y^N) = H(Y^N). \tag{17}$$

Dividing through by $N$ and passing to the limit on $N$ yields the desired result.

In regard to the quantization error, (11b) can be used to show that

$$\xi_n \triangleq E_n - Y_n = X_n - \hat{X}_n, \tag{18}$$

where $\xi_n$ is the error incurred solely by the quantization process.² This is interesting, since it implies that to minimize the overall reconstruction error, it suffices to minimize the quantization error.

At this point we can state the problem more precisely. We wish to design an $N$-level quantizer for the DPCM scheme such that the overall average squared-error distortion, or, equivalently, the average quantization error, is minimized, while the entropy rate at the quantizer output is held below a prescribed value, say $H_0$. Farvardin and Modestino [13] have studied a similar problem for a zero-memory quantizer driven by a memoryless stationary source. In the present situation, unfortunately, the probability density function (pdf) of the quantizer input (i.e., the prediction error) is not known. The following discussion will provide some insight as to how this pdf can be calculated and will point out the associated difficulties.

Let us first note that

$$E_n = \sum_{m=1}^{M} \rho_m X_{n-m} + W_n - \sum_{m=1}^{M} \rho_m \hat{X}_{n-m} = \sum_{m=1}^{M} \rho_m \left[ E_{n-m} - q_N(E_{n-m}) \right] + W_n. \tag{19}$$

Equation (19) implies that the prediction error sequence $\{E_n\}$ is also a Markov process of the same order as the input process $\{X_n\}$. We shall confine ourselves, hereafter, to first-order processes. Those conversant with the theory of Markov processes will recognize that any $M$th-order Markov process can be represented by a first-order $M$-dimensional Markov process.

Using the Chapman-Kolmogorov equation for the process defined by (19) with $M = 1$ and $\rho \triangleq \rho_1$, we obtain

$$p_{E_{n+1}}(x) = \int_{-\infty}^{\infty} p_W\left[x - \rho\big(y - q_N(y)\big)\right] p_{E_n}(y) \, dy, \tag{20}$$

in which $p_{E_n}(\cdot)$ and $p_W(\cdot)$ designate the pdf's of $E_n$ and $W_n$, respectively.

To be able to design the quantizer optimally, we require knowledge of the steady-state pdf of the sequence $\{E_n\}$, which will be denoted by $p_E(\cdot)$. A legitimate question, obviously, is whether such a steady-state pdf exists. Gersho [14] was the first to study this issue in delta-modulation systems. Based on a theorem of Doob [15], he proved that under certain conditions on the source distribution, the joint distribution function of the $(X_n, \hat{X}_n)$ process converges to a unique stationary distribution, regardless of the initial condition $X_1$ and the quantizer stepsize. This, in turn, implies the convergence of $\{E_n\}$ in distribution. Goldstein and Liu [16] subsequently extended Gersho's results to adaptive DPCM (ADPCM) systems in which the quantizer was taken to be an $N$-level uniform quantizer whose stepsize varies in time according to a well-described rule. Their results, of course, include nonadaptive DPCM systems as a special case.

More recently, Kieffer [17] has established a stronger type of convergence for predictive quantization schemes. Specifically, he has shown that under certain conditions the triple process $\{X_n, Y_n, \hat{X}_n\}$ is stochastically stable in the sense that the sequence $\{(1/N)\sum_{n=1}^{N} f(X_n^\infty, Y_n^\infty, \hat{X}_n^\infty)\}$ converges almost surely for every bounded function $f$ that is continuous in the first two variables and measurable in the third. Here $X_n^\infty \triangleq (X_n, X_{n+1}, \ldots)$.

These convergence arguments are useful in establishing the desired result that the sequence of functions $\{p_{E_n}(\cdot)\}$ defined by (20) converges to a unique function $p_E(\cdot)$. In fact, if we write (20) in operator notation as

$$p_{E_{n+1}} = T p_{E_n}, \tag{21}$$

then Goldstein and Liu show that 1) assuming that probability density functions exist, there exists a unique function $p_E(\cdot)$ such that [16, Theorems 5 and 6]

$$p_E = T p_E, \tag{22}$$

and 2) $p_{E_n}(\cdot)$ converges to $p_E(\cdot)$ [16, Theorem 9]. Equation (22) is equivalent to the following integral equation:

$$p_E(x) = \int_{-\infty}^{\infty} p_W\left[x - \rho\big(y - q_N(y)\big)\right] p_E(y) \, dy. \tag{23}$$

Therefore, for a fixed quantizer $q_N(\cdot)$, the limiting marginal pdf of the prediction error can be obtained from the above integral equation. Equation (23) reveals the explicit dependence of the steady-state pdf $p_E(\cdot)$ on the quantizer structure $q_N(\cdot)$. On the other hand, to determine the optimum quantizer, the marginal pdf of the prediction error is required.

²Hereafter, for economy of notation, we do not indicate the time instants for which the equations hold, with the presumption that they can be understood from the context.
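Numerically, the fixed point of (21)-(23) can be approximated by discretizing the operator $T$ on a grid and iterating it; the following is an illustrative sketch only (the paper instead uses the orthogonal expansions of Section V; the Gaussian innovation pdf, grid, and quantizer are assumptions):

```python
import numpy as np

def iterate_T(p_w, rho, thresholds, levels, x, n_iter=200):
    """Approximate the steady-state prediction-error pdf p_E of (23)
    by repeatedly applying the discretized Chapman-Kolmogorov operator T."""
    dx = x[1] - x[0]
    q = levels[np.searchsorted(thresholds, x)]     # q_N(y) on the grid
    # Kernel K[i, j] = p_W(x_i - rho*(y_j - q_N(y_j))); (20) becomes p <- K p dx.
    K = p_w(x[:, None] - rho * (x[None, :] - q[None, :]))
    p = np.full_like(x, 1.0 / (x[-1] - x[0]))      # arbitrary initial pdf
    for _ in range(n_iter):
        p = K @ p * dx                             # one application of T, cf. (21)
        p /= p.sum() * dx                          # renormalize against drift
    return p

gauss = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)  # unit-variance p_W
x = np.linspace(-8, 8, 1601)
T_thr = np.array([-1.0, 0.0, 1.0])
Q_lev = np.array([-1.5, -0.5, 0.5, 1.5])
p_E = iterate_T(gauss, rho=0.9, thresholds=T_thr, levels=Q_lev, x=x)
```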
This results in a complex interdependence between the pdf of the prediction error and the quantizer structure. Indeed, this is the main difficulty in obtaining the optimal quantizer.

Recall that we intend to design the quantizer subject to a constraint on its output entropy rate. To calculate the entropy rate at the quantizer output, however, we need to have the $N$-fold probability density function of the sequence $\{E_n\}$. Due to the myriad of difficulties associated with this, we restrict ourselves to the first-order entropy at the quantizer output. This clearly results in a suboptimum system. However, at high rates, where the prediction error, and hence the quantizer output, are highly uncorrelated, the entropy rate is approximately equal to the first-order entropy.

The problem of quantizer design can now be stated as follows. We wish to find an $N$-level quantizer $q_N^*(\cdot)$ that minimizes the average squared-error distortion incurred in the quantization operation while the first-order entropy at the quantizer output is held below a certain value. The average distortion and output entropy are given by

$$D = \sum_{l=1}^{N} \int_{T_{l-1}}^{T_l} (x - Q_l)^2 p_E(x) \, dx \tag{24}$$

and

$$H = -\sum_{l=1}^{N} P_l \log_2 P_l \quad \text{b/sample}, \tag{25}$$

respectively, where $p_E(\cdot)$ is the solution to the integral equation given by (23) and $P_l$ is defined by

$$P_l = \int_{T_{l-1}}^{T_l} p_E(x) \, dx, \qquad l = 1, 2, \ldots, N. \tag{26}$$

Unfortunately, because of the particular type of dependence of $p_E(\cdot)$ on $q_N(\cdot)$, necessary conditions for the optimality of the quantizer cannot be expressed in a concise closed form. However, by looking at the problem from a slightly different perspective, we can obtain a more appealing formulation of the problem, which leads to an algorithmic approach for quantizer design. More precisely, we seek an optimum $N$-level quantizer $q_N^*(\cdot)$ and a probability density function $p_E^*(\cdot)$ or, equivalently, a couple $(q_N^*; p_E^*)$ such that the following two conditions are simultaneously satisfied.

Condition 1: $q_N^*(\cdot)$ is an optimum $N$-level quantizer with entropy constraint $H_0$ for a memoryless source with stationary marginal pdf $p_E^*(\cdot)$.

Condition 2: $p_E^*(\cdot)$ is the solution to integral equation (23), with $q_N = q_N^*$.

Let us denote by $D(q_N; p_E)$ the average distortion incurred in a DPCM coding scheme possessing an $N$-level quantizer $q_N(\cdot)$ and a steady-state prediction error pdf $p_E(\cdot)$. Then the optimum (rate-distortion theoretic) performance attainable by an $N$-level DPCM coding scheme is given by

$$D_N(H_0) = \min_{(q_N; p_E) \in \mathscr{C}} D(q_N; p_E), \tag{27}$$

where $\mathscr{C}$ is the collection of all couples $(q_N; p_E)$ satisfying Conditions 1 and 2 simultaneously.

In the following section, we describe two algorithmic methods for obtaining $D_N(H_0)$. Essentially, these algorithms work by iteratively applying Conditions 1 and 2. In Condition 1 we have to design an optimum entropy-constrained quantizer for a memoryless source. We do not elaborate on this; details, including specific algorithms, can be found in [13]. In Condition 2 we need to solve the integral equation (23). Section V is devoted to this issue.

IV. ALGORITHMS

Recall that in obtaining the optimum quantizer, the quantities of interest are the average distortion and the quantizer output entropy, both of which are functionals of the prediction error pdf $p_E(\cdot)$ and the quantization mapping $q_N(\cdot)$, with a specific interdependence between $p_E(\cdot)$ and $q_N(\cdot)$. The algorithm used to solve for the optimum quantizer is an iterative method in which the quantizer is optimized at each iteration and then the pdf of the prediction error is updated. This algorithm is described in the following steps.

Algorithm 1

1) Choose an initial $N$-level quantizer $q_N^{(0)}(\cdot)$ with entropy constraint $H_0$ and set $i = 0$.
2) Set $i = i + 1$; for the fixed quantizer $q_N^{(i-1)}(\cdot)$ solve (23) to find the steady-state pdf $p_E^{(i)}(\cdot)$.
3) For the fixed pdf $p_E^{(i)}(\cdot)$ use the quantizer design algorithm to obtain a quantizer $q_N^{(i)}(\cdot)$ subject to entropy constraint $H_0$.
4) If the difference in the quantizer structure³ in two consecutive iterations is greater than some prescribed small $\epsilon > 0$, go to 2). Otherwise, halt with $p_E(\cdot)$ and $q_N(\cdot)$ equal to $p_E^*(\cdot)$ and $q_N^*(\cdot)$, respectively.

In step 3), either of the algorithmic methods described in [13] can be used to obtain the optimum quantizer. Let us note that in solving (23) at each step of Algorithm 1, we calculate the steady-state pdf of the prediction error. Another approach, which is a slight modification of Algorithm 1, develops a sequence of quantizers evolving in time, ultimately converging to the optimum quantizer. This approach is presented in the following.

³By the difference in the quantizer structure in two consecutive iterations we mean

$$\sum_{l=1}^{N-1} \left| T_l^{(i)} - T_l^{(i-1)} \right| + \sum_{l=1}^{N} \left| Q_l^{(i)} - Q_l^{(i-1)} \right|,$$

where $T_l^{(i)}$ and $Q_l^{(i)}$ are the $l$th threshold and quantization levels, respectively, at the $i$th iteration.
p$’ = Tp$-‘)
to find p$)( a). Here T is the operator defined in (21).4 3) For the fixed pdf p’$( .), use the quantizer design algorithm to obtain a quantizer q$+l)( *) subject to entropy constraint HO. 4) If the difference in the quantizer structure in two consecutiveiterations is greater than someprescribed small e > 0, go to 2). O therwise,halt with pE( .) and qN( .) equal to pg( a) and qg( e), respectively. The m a in drawback in implementing these algorithms is the inherent difficulty in calculating the pdf of the prediction error-that is, step 2) of either algorithm. In the following section, we describea method for calculating the pdf of the prediction error and point out the associated difficulties. V.
DISTRIBUTION
OF THE PREDICTION ERROR
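Read as alternating minimization, Algorithms 1 and 2 share the skeleton below (an illustrative sketch, not the authors' code; solve_steady_state stands in for step 2 and design_entropy_constrained_quantizer for the entropy-constrained design procedures of [13], both hypothetical names):

```python
import numpy as np

def quantizer_distance(q_old, q_new):
    """Stopping criterion of footnote 3: sum of absolute changes in
    thresholds and levels between consecutive iterations."""
    (T0, Q0), (T1, Q1) = q_old, q_new
    return np.abs(T1 - T0).sum() + np.abs(Q1 - Q0).sum()

def algorithm_1(q_init, H0, solve_steady_state,
                design_entropy_constrained_quantizer, eps=1e-4, max_iter=100):
    """Alternate between Condition 2 (pdf update) and Condition 1
    (quantizer redesign) until the quantizer stops moving."""
    q = q_init
    for _ in range(max_iter):
        p_E = solve_steady_state(q)                            # step 2: solve (23)
        q_new = design_entropy_constrained_quantizer(p_E, H0)  # step 3: cf. [13]
        if quantizer_distance(q, q_new) <= eps:                # step 4: converged
            return q_new, p_E
        q = q_new
    return q, p_E
```

For Algorithm 2, solve_steady_state would apply the operator $T$ of (21) once per iteration instead of solving (23) to convergence.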
V. DISTRIBUTION OF THE PREDICTION ERROR

We now proceed to calculate the marginal pdf of the prediction error assuming that the quantizer is fixed. Equation (19), with $M = 1$ and $\rho_1 = \rho$, can be written as

$$E_n = W_n + \rho \xi_{n-1}, \tag{28}$$

where $\xi_{n-1}$ is independent of $W_m$, $m \geq n$. Denoting the characteristic function (chf) of a random variable $X$ by $\Phi_X(\cdot)$, we can write

$$\Phi_{E_n}(z) = \Phi_W(z) \, \Phi_{\xi_{n-1}}(\rho z), \tag{29}$$

and for the steady-state situation

$$\Phi_E(z) = \Phi_W(z) \, \Phi_\xi(\rho z). \tag{30}$$

The chf $\Phi_\xi(z)$ of the quantization error is given by

$$\Phi_\xi(z) = \sum_{l=1}^{N} e^{-jzQ_l} \int_{-\infty}^{\infty} e^{jzx} p_E(x) I_l(x) \, dx, \tag{31}$$

where $I_l(\cdot)$ is the indicator function of the interval $(T_{l-1}, T_l]$. Upon defining the Fourier transform of $I_l(\cdot)$ by

$$\psi_l(z) \triangleq \int_{-\infty}^{\infty} e^{jzx} I_l(x) \, dx, \qquad l = 1, 2, \ldots, N, \tag{32}$$

and using (31), we can write (30) as

$$\Phi_E(z) = \Phi_W(z) \sum_{l=1}^{N} e^{-j\rho z Q_l} \frac{1}{2\pi} \left[ \Phi_E * \psi_l \right](\rho z), \tag{33}$$

in which $*$ designates the convolution operation.

Equation (33) is the generic form of the functional equation describing $\Phi_E(\cdot)$. The integral equation (23), or its frequency domain equivalent (33), are both extremely difficult to solve, except in special cases. For $N = 2$ with $Q_2 = -Q_1 = \alpha$ and $T_1 = 0$ (i.e., delta modulation), (33) reduces to an equation, (34a), involving $\hat{\Phi}_E(\cdot)$, the Hilbert transform [18] of $\Phi_E(\cdot)$, described by

$$\hat{\Phi}_E(z) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\Phi_E(v)}{z - v} \, dv. \tag{34b}$$

For the special case of $\rho = 1$ and $N = 2$, Fine [4] has solved (33) for $\Phi_E(\cdot)$ using Wiener-Hopf factorization techniques. Later, Hayashi [6] proved a very useful theorem that relates the pdf of the prediction error in a DPCM scheme with an $N$-level, symmetric, and uniform quantizer to that of a two-level scheme. Therefore, if attention is restricted to uniform quantization, it suffices to find the solution for a two-level quantizer. Unfortunately, for $\rho \neq 1$, which is most interesting in many practical situations, techniques for explicit evaluation of the prediction-error distribution have not yet been developed. Therefore, one must resort to efficient numerical methods to compute this quantity.

The idea of using orthogonal series expansions for computing the prediction error distribution was apparently first suggested by Davisson [19], [20]. His suggestion was then followed up by Slepian [21], Arnstein [5], Hayashi [7], and Janardhanan [8]. Each of these works considered a Gaussian input and a Hermite polynomial expansion for the pdf of the prediction error. Arnstein's work [5] is most relevant to the present study. Arnstein, however, does not attempt to optimize the system in a rate-distortion theoretic sense. Instead, he devises an algorithm (similar to Algorithm 2 of this paper) by which the quantizer is designed only to minimize the average squared-error distortion for a fixed number of quantization levels.

In what follows we consider Gauss-Markov and Laplace-Markov sources. For each source we use a "suitable" orthogonal expansion that is matched to the distribution of the innovation sequence generating the source and that therefore approximates the prediction error density with a small number of coefficients in the expansion.

A. Gauss-Markov Process

The Gauss-Markov process is defined according to (1) with $M = 1$ and $\rho_1 = \rho$, while we take $\{W_n\}$ to be a zero-mean sequence of Gaussian random variables with variance $\sigma_W^2 = 1$. It turns out that, under appropriate initial conditions, $\{X_n\}$ is a stationary zero-mean Gaussian random process with variance $\sigma_X^2 = 1/(1 - \rho^2)$.

Now consider the Gram-Charlier series expressing the prediction error pdf $p_{E_n}(x)$ in terms of Hermite polynomials [22].
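As an illustration (not part of the original paper), a truncated Gram-Charlier series of this kind can be evaluated directly; the coefficients below are placeholders, and probabilists' Hermite polynomials are assumed:

```python
import numpy as np
from numpy.polynomial.hermite_e import HermiteE  # probabilists' Hermite He_k

def gram_charlier_pdf(a, x):
    """Evaluate p(x) = g(x) * sum_k a_k He_k(x), where g is the standard
    normal pdf and a = (a_0, a_1, ...) are expansion coefficients."""
    g = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    return g * HermiteE(a)(x)

# With a = (1, 0, 0, ...) the series collapses to the Gaussian itself;
# higher-order coefficients capture the non-Gaussian shape of p_E.
x = np.linspace(-5, 5, 1001)
p = gram_charlier_pdf([1.0, 0.0, 0.05], x)       # illustrative coefficients
```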
A. Asymptotic Results

For the Gauss-Markov case, at high rates the prediction error possesses a normal density for which the Gish-Pierce asymptote is valid, as given by (47), with

$$h_E = \tfrac{1}{2} \log_2 2\pi e \sigma_E^2. \tag{59}$$

But from (28) we have

$$\sigma_E^2 = \sigma_W^2 + \rho^2 D. \tag{60}$$

Thus, (47) can be written as

$$D = \frac{\pi e}{6} \left( \sigma_W^2 + \rho^2 D \right) 2^{-2H_0}, \tag{61}$$

or, solving for $D$,

$$\bar{D}(H_0) = \frac{(\pi e/6) \, \sigma_W^2 \, 2^{-2H_0}}{1 - (\pi e/6) \, \rho^2 \, 2^{-2H_0}}, \tag{62}$$

which determines the asymptotic behavior of the system at high rates. Notice that for $H_0$ sufficiently large we have

$$\bar{D}(H_0) \approx (\pi e/6) \, \sigma_W^2 \, 2^{-2H_0}, \tag{63a}$$

or, equivalently,

$$H_0 = \tfrac{1}{2} \log_2 (\pi e/6) + \tfrac{1}{2} \log_2 (\sigma_W^2 / D). \tag{63b}$$

This asserts that at high rates there is a $(1/2) \log_2 (\pi e/6) = 0.255$ b/sample difference between $R_X(D) = R_W(D)$ and the asymptotic result. This, of course, conforms with the previously reported results in [9] and [33].

For the Laplace-Markov source the situation is totally different. This is mainly due to the fact that the Gish-Pierce result [29] requires a certain degree of smoothness in the source density. This, unfortunately, is not the case here due to the impulse component in (41b). In the following we provide some discussion of this subtle issue.

Let us consider a memoryless source whose pdf is described by (41b). We want to obtain asymptotic results for the output entropy and average distortion of a quantizer driven by this source. Suppose the quantizer possesses an output level at the origin, so that the impulse component falls entirely in the $i^*$th cell. Then the output probabilities are

$$P_{i^*} = \rho^2 + (1 - \rho^2) P_{i^*}', \qquad P_i = (1 - \rho^2) P_i', \quad i \neq i^*, \tag{64}$$

where $P_i'$ denotes the output probabilities when $\rho = 0$. The quantizer output entropy is therefore

$$H_0 = -\left[\rho^2 + (1 - \rho^2) P_{i^*}'\right] \log_2 \left[\rho^2 + (1 - \rho^2) P_{i^*}'\right] - (1 - \rho^2) \sum_{i \neq i^*} P_i' \log_2 \left[(1 - \rho^2) P_i'\right], \tag{65}$$

which can be simplified to

$$H_0 = (1 - \rho^2) H_0' - \rho^2 \log_2 \left[\rho^2 + (1 - \rho^2) P_{i^*}'\right] - (1 - \rho^2) \log_2 (1 - \rho^2) + (1 - \rho^2) P_{i^*}' \log_2 \frac{(1 - \rho^2) P_{i^*}'}{\rho^2 + (1 - \rho^2) P_{i^*}'}, \tag{66}$$

where $H_0'$ is the quantizer output entropy when $\rho = 0$. Furthermore, the average distortion can easily be shown to be

$$D = (1 - \rho^2) D', \tag{67}$$

with $D'$ denoting the average distortion when $\rho = 0$. In the limiting case of fine quantization (where the Gish-Pierce result holds) we have $P_{i^*}' = 0$, and therefore

$$H_0 = (1 - \rho^2) H_0' + \mathscr{H}(\rho^2), \tag{68}$$

where $\mathscr{H}(\cdot)$ is the binary entropy function given by

$$\mathscr{H}(\alpha) = -\alpha \log_2 \alpha - (1 - \alpha) \log_2 (1 - \alpha), \qquad 0 \leq \alpha \leq 1. \tag{69}$$

Applying the Gish-Pierce asymptote to the smooth component,

$$D' = \tfrac{1}{12} \, 2^{2h_2} \, 2^{-2H_0'}, \tag{70}$$

and combining (67), (68), and (70) yields

$$D(H_0) = \frac{1 - \rho^2}{12} \, 2^{2h_2} \, 2^{-2(H_0 - \mathscr{H}(\rho^2))/(1 - \rho^2)}, \tag{71}$$

which is the desired result. Note that a similar result holds for the general case in which the source density is a mixture of any smooth density and an impulse, provided that $h_2$ in (71) is replaced by the differential entropy of the smooth component.
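As a quick numerical check (illustrative, not part of the paper), the constants above follow directly:

```python
import math

# Gaussian high-rate gap of (63b): (1/2) log2(pi*e/6) ~ 0.255 b/sample.
gap = 0.5 * math.log2(math.pi * math.e / 6)
print(f"{gap:.3f}")        # 0.255

# Mixture penalty H(rho^2) + (1 - rho^2)*0.255, derived below as (77);
# it peaks near rho = 0.675 at about 1.133 b/sample.
H = lambda a: -a * math.log2(a) - (1 - a) * math.log2(1 - a)
dR = lambda rho: H(rho**2) + (1 - rho**2) * gap
print(f"{dR(0.675):.3f}")  # ~1.133
```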
Carter and Neuhoff [34] have developed bounding techniques for the rate-distortion function of regenerative composite sources similar to that in (41b). Specifically, if the source density $p(\cdot)$ is described by

$$p(x) = \rho^2 p_1(x) + (1 - \rho^2) p_2(x), \tag{72}$$

then the rate-distortion function is lower-bounded according to

$$R(D) \geq \inf_{\mathscr{D}} \left\{ \rho^2 R_1(D_1) + (1 - \rho^2) R_2(D_2) \right\}, \tag{73a}$$

where $R_1(\cdot)$ and $R_2(\cdot)$ are the rate-distortion functions of $p_1(\cdot)$ and $p_2(\cdot)$, respectively, and

$$\mathscr{D} = \left\{ (D_1, D_2) : \rho^2 D_1 + (1 - \rho^2) D_2 \leq D \right\}. \tag{73b}$$

When $p_1(x) = \delta(x)$, which is the case in (41b), we have $R_1(D_1) = 0$, and hence the infimum in (73a) occurs at $D_2 = D/(1 - \rho^2)$. Thus

$$R(D) \geq (1 - \rho^2) \, R_2\!\left(\frac{D}{1 - \rho^2}\right) \triangleq \bar{R}(D). \tag{74}$$

Using the Shannon lower bound [28], which is tight at high rates, for $R_2(\cdot)$ in (74) we can write

$$R(D) \geq (1 - \rho^2) \left[ h_2 - \frac{1}{2} \log_2 \frac{2\pi e D}{1 - \rho^2} \right]. \tag{75}$$

From (71) we have

$$H_0 = \mathscr{H}(\rho^2) + (1 - \rho^2) \left[ h_2 - \frac{1}{2} \log_2 \frac{12 D}{1 - \rho^2} \right], \tag{76}$$

which enables us to determine the difference between the rate-distortion function lower bound $\bar{R}(D)$ and the asymptotic performance expressed by (76). Unlike the Gish-Pierce result we get⁷

$$\Delta R \triangleq H_0 - \bar{R}(D) = \mathscr{H}(\rho^2) + \frac{1 - \rho^2}{2} \log_2 \frac{2\pi e}{12} = \mathscr{H}(\rho^2) + (1 - \rho^2) \cdot 0.255 \quad \text{b/sample}. \tag{77}$$

Envision the memoryless source of (41b) as the output of a switch that randomly and independently moves between two positions: up and down. The switch is up with probability $\rho^2$ and down with probability $(1 - \rho^2)$. When the switch is up its output is zero with probability one. When the switch is down its output is a real-valued variable distributed according to (40a). With this in mind an interesting interpretation for (77) can be provided. The quantization performance penalty is a factor $(1 - \rho^2)$ of the ordinary penalty 0.255 b/sample plus an additional term equal to the uncertainty in the switch position.

Note that, as one would expect, for $\rho = 0$, $\mathscr{H}(0) = 0$, and $\Delta R = 0.255$ b/sample. On the other hand, for $\rho = 1$, $\Delta R = 0$. This makes sense, for at $\rho = 1$ the source output is zero with probability one and there is no uncertainty in the switch position. Straightforward differentiation implies that $\Delta R$ is maximized at

$$\rho = \left[ \frac{1}{1 + \sqrt{\pi e/6}} \right]^{1/2} = 0.675 \tag{78}$$

and that $\Delta R_{\max} = \log_2\!\left(1 + \sqrt{\pi e/6}\right) = 1.133$ b/sample.

⁷We conjecture that $\bar{R}(D)$ in (74) is tight. If this conjecture is true, the quantity $\Delta R$ is truly representative of the penalty due to zero-memory quantization of the source given by (72).

Now we are in a position to determine the asymptotic performance of the DPCM coding scheme driven by Laplace-Markov sources. We assume that at high rates the pdf of the error sequence is close to that of the innovation sequence. Thus, using (71), we can write

$$D = \frac{1}{12} \, 2^{2h_E'} \, 2^{-2(H_0 - \mathscr{H}(\rho^2))/(1 - \rho^2)}, \tag{79}$$

where $h_E'$ is the differential entropy of the smooth component in the pdf of the prediction error. This quantity can easily be shown to be

$$h_E' = 1 + \log_2 e \quad \text{b/sample}. \tag{80}$$

Again, using (61), we can write the asymptotic performance as

$$D = \frac{\left( (1 - \rho^2) e^2 / 3 \right) 2^{-2(H_0 - \mathscr{H}(\rho^2))/(1 - \rho^2)}}{1 - \left( (1 - \rho^2) e^2 \rho^2 / 3\sigma_W^2 \right) 2^{-2(H_0 - \mathscr{H}(\rho^2))/(1 - \rho^2)}} \triangleq \bar{D}(H_0). \tag{81}$$

For sufficiently large $H_0$, (81) can be approximated by

$$\bar{D}(H_0) = \left( (1 - \rho^2) e^2 / 3 \right) 2^{-2(H_0 - \mathscr{H}(\rho^2))/(1 - \rho^2)}, \tag{82}$$

or, equivalently,

$$H_0 = \mathscr{H}(\rho^2) + \frac{1 - \rho^2}{2} \log_2 \frac{(1 - \rho^2) e^2}{3D}. \tag{83}$$

Comparing (83) to $\bar{R}(D)$ in (75), which is also a tight lower bound to $R_X(D)$ at high rates, results in a difference equal to that given in (77), as one would expect. Asymptotic results illustrated in Figs. 3-5 and 6-8 are obtained by means of (62) and (81), respectively.

B. Rate-Distortion Function Bounds

For the Gaussian case the rate-distortion function can be calculated exactly [28], [32]. For the Laplacian case, for which the rate-distortion function is not known exactly, we have used upper and lower bounds. The upper bound is the well-known Gaussian upper bound [28] and the lower bound is the autoregressive lower bound [32]. This lower bound is determined in terms of the rate-distortion function of the innovation sequence described by (41b). This rate-distortion function, in turn, has been lower-bounded by the Carter-Neuhoff composite lower bound described in (74). (In Figs. 3-8 the Gaussian upper bound and the combined autoregressive composite lower bound are designated by $R_U(D)$ and $R_L(D)$, respectively.) We have used the Blahut algorithm [35] for computing $R_2(\cdot)$ in (74).
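The Blahut algorithm used for $R_2(\cdot)$ admits a compact implementation; the sketch below is illustrative only (the discretized source, squared-error distortion matrix, and slope parameter are assumptions, not the authors' code):

```python
import numpy as np

def blahut_rd(p_x, d, s, n_iter=500):
    """One point of the rate-distortion curve by the Blahut algorithm [35].
    p_x: source probabilities; d[i, j]: distortion matrix; s < 0: slope."""
    q = np.full(d.shape[1], 1.0 / d.shape[1])    # initial output distribution
    A = np.exp(s * d)                            # exp(s * d(x, y))
    for _ in range(n_iter):
        c = A @ q                                # per-source normalizers
        q = q * (A.T @ (p_x / c))                # Blahut update of q(y)
    P = A * q[None, :] / (A @ q)[:, None]        # optimal test channel P(y|x)
    D = float(np.sum(p_x[:, None] * P * d))
    R = float(np.sum(p_x[:, None] * P *
                     np.log2(P / q[None, :], where=P > 0, out=np.zeros_like(P))))
    return R, D

# Illustrative use: discretized unit-variance Laplacian source, squared error.
x = np.linspace(-6, 6, 121)
p_x = np.exp(-np.sqrt(2) * np.abs(x)); p_x /= p_x.sum()
d = (x[:, None] - x[None, :])**2
R, D = blahut_rd(p_x, d, s=-4.0)
```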
VIII. SUMMARY AND CONCLUSIONS

We have studied the structural properties of optimum DPCM schemes for $M$th-order autoregressive inputs. Necessary conditions for optimality of the quantizer in these schemes are developed and algorithmic approaches for designing such quantizers are proposed. Also, difficulties inherent in computing the distribution of the prediction error are examined carefully. To overcome these problems, we have developed series expansion methods matched to the source distribution. Our algorithmic procedures for optimum quantizer design are used in conjunction with the series expansion techniques to design optimum uniform-threshold entropy-constrained DPCM encoding schemes for first-order Gaussian and Laplacian autoregressive sources. It is shown that for Gaussian sources there could be a wide gap between the optimum performance and the rate-distortion function at low rates, when the source is highly correlated.

Asymptotic results, similar to those developed by Gish and Pierce [29] in the memoryless case, are developed in some detail. These results, which agree favorably with our numerical results in the Gaussian case, imply that at high rates there is only a 0.255-b/sample performance penalty. Corresponding asymptotic results for Laplacian sources demonstrate a wider gap between the optimum quantizer performance and the rate-distortion function lower bound. Unfortunately, at high correlation values our numerical results in the Laplacian case are not sufficient to demonstrate the validity of the predicted asymptotic performance. Because of the complexity of the quantizer design procedure at high rates when the number of levels is large, we were not able to provide performance results beyond $N = 17$.

In all our numerical results we have restricted attention to uniform-threshold quantizers. Obviously, removal of this constraint can only improve the performance. We have decided to forego studying this issue, however, because of the complexity of more general optimum entropy-constrained quantizer design procedures.

A logical extension of this research is to consider the case $\rho = 1$. Indeed, (1) with $M = 1$, $\rho = 1$ and Gaussian innovations represents the discrete-time version of the well-known Wiener process [6]. In this case the prediction-error density can be calculated explicitly, and hence there is no need for the series expansions. Another issue that remains to be thoroughly studied is a proof for the convergence of the algorithm as well as sufficient conditions for the quantizer optimality. Finally, we should note that we have considered only the first-order entropy of the quantizer output. At low rates, the prediction error is highly correlated and hence the entropy rate at the quantizer output might be noticeably lower than the first-order entropy. Computing or bounding this entropy rate is another issue that deserves study.
ACKNOWLEDGMENT

The authors would like to express their appreciation to Professor E. Masry of the University of California, San Diego, for helpful comments in the early stages of this work. They would also like to acknowledge the comments of the reviewers, which helped to improve an earlier version of this paper.

APPENDIX A
RECURSIVE EQUATION FOR UPDATING THE HERMITE EXPANSION COEFFICIENTS

In this appendix we present a simplified version of (38b) in the text. Also, formulas describing the quantizer output entropy and average distortion in terms of the Hermite expansion coefficients will be given. Let us define

$$F_l(y) \triangleq \int_{-\infty}^{\infty} p_W\!\left[x - \rho\big(y - q_N(y)\big)\right] H_l(x) \, dx, \qquad l = 0, 1, \ldots. \tag{A.1}$$

Then, replacing $p_W(x)$ by $g(x)$ and using (35c), it is easy to show that

$$F_l(y) = \rho^l \left[ y - q_N(y) \right]^l, \qquad l = 0, 1, \ldots. \tag{A.2}$$

Thus (38b) of the text can be written as

$$a_{k,l} = \frac{\rho^l}{l!} \int_{-\infty}^{\infty} g(y) H_k(y) \left[ y - q_N(y) \right]^l \, dy, \qquad k, l = 0, 1, \ldots. \tag{A.3}$$

Again, using (35c) and integrating by parts yields a piecewise recursion, (A.4a), that expresses these integrals through the quantizer jumps $(Q_{i+1} - Q_i)$ at the thresholds $T_i$, $i = 1, 2, \ldots, N - 1$, with

$$A_{0,0} = 1 \tag{A.4b}$$

and

$$A_{0,k} = 0, \qquad k \geq 1. \tag{A.4c}$$

Here, as in [5], we have assumed that $q_N'(x)$ is zero everywhere except where $x - q_N(x)$ is zero, in which case $q_N'(x)$ is an impulse whose weight is equal to the quantizer jump at that point. Using (60) and the facts that $\sigma_W^2 = 1$ and $\sigma_E^2 = 2a_2 + 1$ (cf. [5]) implies

$$D = \frac{\sigma_E^2 - \sigma_W^2}{\rho^2} = \frac{2a_2}{\rho^2}. \tag{A.5}$$

The quantizer output entropy $H$ is given by (25), where $P_l$ can be expressed as

$$P_l = \int_{T_{l-1}}^{T_l} p_E(x) \, dx = \sum_{k=0}^{\infty} a_k \int_{T_{l-1}}^{T_l} g(x) H_k(x) \, dx. \tag{A.6}$$

But

$$\int_{T_{l-1}}^{T_l} g(x) H_k(x) \, dx = g(T_{l-1}) H_{k-1}(T_{l-1}) - g(T_l) H_{k-1}(T_l), \qquad k \geq 1. \tag{A.7}$$

Therefore

$$P_l = a_0 \int_{T_{l-1}}^{T_l} g(x) \, dx + \sum_{k=1}^{\infty} a_k \left[ g(T_{l-1}) H_{k-1}(T_{l-1}) - g(T_l) H_{k-1}(T_l) \right], \qquad l = 1, 2, \ldots, \tag{A.8}$$

which, together with (25), leads to a formula for $H_0$ in terms of the expansion coefficients.
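As an illustration (not from the paper), the following sketch evaluates the cell probabilities (A.6)-(A.8) for a truncated Hermite expansion; finite thresholds and probabilists' Hermite polynomials are assumed:

```python
from math import erf, exp, pi, sqrt

def hermite_He(k, t):
    """Probabilists' Hermite polynomial He_k(t) via He_{n+1} = t*He_n - n*He_{n-1}."""
    h0, h1 = 1.0, t
    if k == 0:
        return h0
    for n in range(1, k):
        h0, h1 = h1, t * h1 - n * h0
    return h1

def cell_probability(a, t_lo, t_hi):
    """P_l of (A.6)-(A.8): integral of g(x)*sum_k a_k He_k(x) over (t_lo, t_hi].
    Finite thresholds are assumed in this sketch."""
    g = lambda t: exp(-t * t / 2) / sqrt(2 * pi)
    Phi = lambda t: 0.5 * (1 + erf(t / sqrt(2)))
    p = a[0] * (Phi(t_hi) - Phi(t_lo))                 # k = 0 term of (A.6)
    for k in range(1, len(a)):                         # boundary terms of (A.7)
        p += a[k] * (g(t_lo) * hermite_He(k - 1, t_lo)
                     - g(t_hi) * hermite_He(k - 1, t_hi))
    return p

# With a = (1,), P_l reduces to the Gaussian cell probability.
print(cell_probability([1.0], -1.0, 1.0))              # ~0.683
```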
APPENDIX B
RECURSIVE EQUATION FOR UPDATING THE LAGUERRE EXPANSION COEFFICIENTS

In this appendix we present a simplification of (46b). Corresponding formulas for the output entropy and average distortion are also derived in terms of the Laguerre expansion coefficients. Upon defining

$$G_l(y) \triangleq \int_0^{\infty} \left\{ p_W\!\left[x - \rho\big(y - q_N(y)\big)\right] + p_W\!\left[x + \rho\big(y - q_N(y)\big)\right] \right\} L_l(x) \, dx, \qquad l \geq 0, \; y \geq 0, \tag{B.1}$$

we can write (46b) as

$$B_{l,k} = \int_0^{\infty} G_l(y) \, e^{-y} L_k(y) \, dy, \qquad k, l = 0, 1, \ldots. \tag{B.2}$$

Noting that $p_W(\cdot)$ in (B.1) is given by (41b) and using (42c), a simplified closed-form version of (B.1) can be obtained, (B.3a), in which $G_l(y)$, $l \geq 1$, $y \geq 0$, consists of the constant $\rho^2$, a term proportional to $(1 - \rho^2) 2^{l-1}$, and a finite sum of terms $c_{i,l}\, \rho^i \, |y - q_N(y)|^i$, $i = 1, 2, \ldots, l$, with exponentially decaying weights in $|y - q_N(y)|$; the coefficients $c_{i,l}$ are given by (B.3b). Also

$$G_0(y) = 1. \tag{B.4}$$

Equations (B.3) and (B.4) are used to facilitate the computation of $B_{l,k}$ expressed by the double integration in (46b). Note that

$$L_0(x) = 1, \tag{B.5a}$$

$$L_1(x) = 1 - x, \tag{B.5b}$$

and

$$L_2(x) = 1 - 2x + \frac{x^2}{2}. \tag{B.5c}$$

Therefore

$$\sigma_E^2 = 4(b_2 - 2b_1 + 1). \tag{B.6}$$

On the other hand,

$$\sigma_W^2 = 2(1 - \rho^2), \tag{B.7}$$

and thus using (A.5) implies

$$D = \frac{4(b_2 - 2b_1 + 1) - 2(1 - \rho^2)}{\rho^2}. \tag{B.8}$$

Similar to (A.6), we can express $P_l$ in terms of the Laguerre expansion coefficients, (B.9a), while for the middle cell, $l = (N + 1)/2$,

$$P_{(N+1)/2} = 2 \sum_{k=0}^{\infty} b_k \int_0^{T_{(N+1)/2}} e^{-x} L_k(x) \, dx. \tag{B.9b}$$

The integrals required in (B.9) reduce, by repeated integration by parts, to closed-form boundary expressions in the thresholds, (B.10a) and (B.10b). Equation (B.10b), together with (B.9) and (25), describes the quantizer output entropy in terms of the expansion coefficients.
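As a side note (illustrative, not from the paper), the Laguerre polynomials in (B.5) are conveniently evaluated with their three-term recurrence; the truncation order and coefficients below are assumptions:

```python
def laguerre(k, x):
    """Laguerre polynomial L_k(x) via (n+1)L_{n+1} = (2n+1-x)L_n - nL_{n-1}."""
    l0, l1 = 1.0, 1.0 - x
    if k == 0:
        return l0
    for n in range(1, k):
        l0, l1 = l1, ((2 * n + 1 - x) * l1 - n * l0) / (n + 1)
    return l1

def laguerre_series(b, x):
    """Evaluate a truncated expansion sum_k b_k L_k(x), cf. (B.5)-(B.9)."""
    return sum(bk * laguerre(k, x) for k, bk in enumerate(b))

# L_2(1.0) = 1 - 2 + 0.5 = -0.5, matching (B.5c).
assert abs(laguerre(2, 1.0) + 0.5) < 1e-12
```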
REFERENCES

[1] J. B. O'Neal, Jr., "Signal to quantization noise ratio for differential PCM," IEEE Trans. Commun. Technol., vol. COM-19, pp. 568-569, Aug. 1971.
[2] --, "Delta-modulation quantizing noise: analytic and computer simulation results for Gaussian and television input signals," Bell Syst. Tech. J., vol. 45, pp. 117-141, Jan. 1966.
[3] E. N. Protonotarios, "Slope overload noise in differential pulse code modulation systems," Bell Syst. Tech. J., vol. 46, pp. 2119-2161, Nov. 1967.
[4] T. L. Fine, "The response of a particular nonlinear system with feedback to each of two random processes," IEEE Trans. Inform. Theory, vol. IT-14, pp. 255-264, Mar. 1968.
[5] D. S. Arnstein, "Quantization errors in predictive coders," IEEE Trans. Commun., vol. COM-23, pp. 423-429, Apr. 1975.
[6] A. Hayashi, "Differential pulse code modulation of the Wiener process," IEEE Trans. Commun., vol. COM-26, pp. 881-887, June 1978.
[7] --, "Differential pulse code modulation of stationary Gaussian inputs," IEEE Trans. Commun., vol. COM-26, pp. 1137-1147, Aug. 1978.
[8] E. Janardhanan, "Differential PCM systems," IEEE Trans. Commun., vol. COM-27, pp. 82-93, Jan. 1979.
[9] V. R. Algazi and J. T. DeWitte, Jr., "Theoretical performance of entropy-encoded DPCM," IEEE Trans. Commun., vol. COM-30, pp. 1088-1095, May 1982.
[10] R. B. Ash, Real Analysis and Probability. New York: Academic, 1972.
[11] E. Masry and S. Cambanis, "Delta modulation of the Wiener process," IEEE Trans. Commun., vol. COM-23, pp. 1297-1300, Nov. 1975.
[12] N. Farvardin, "Rate-distortion performance of optimum entropy-constrained quantizers," Ph.D. thesis, Elec., Comput., and Syst. Eng. Dept., Rensselaer Polytechnic Inst., Troy, NY, Dec. 1983.
[13] N. Farvardin and J. W. Modestino, "Optimum quantizer performance for a class of non-Gaussian memoryless sources," IEEE Trans. Inform. Theory, vol. IT-30, pp. 485-497, May 1984.
[14] A. Gersho, "Stochastic stability of delta modulation," Bell Syst. Tech. J., vol. 51, pp. 821-842, Apr. 1972.
[15] J. L. Doob, "Asymptotic properties of Markov transition probabilities," Trans. Amer. Math. Soc., vol. 63, pp. 393-421, May 1948.
[16] L. H. Goldstein and B. Liu, "Deterministic and stochastic stability of adaptive differential pulse code modulation," IEEE Trans. Inform. Theory, vol. IT-23, pp. 445-453, July 1977.
[17] J. C. Kieffer, "Stochastic stability for feedback quantization schemes," IEEE Trans. Inform. Theory, vol. IT-28, pp. 248-254, Mar. 1982.
[18] H. Hochstadt, Integral Equations. New York: Wiley, 1973.
[19] L. D. Davisson, "The theoretical analysis of data compression systems," Proc. IEEE, vol. 56, pp. 176-186, Feb. 1968.
[20] --, "Information rates for data compression," IEEE WESCON Tech. Papers, Session 8, Aug. 1968.
[21] D. Slepian, "On delta modulation," Bell Syst. Tech. J., vol. 51, pp. 2101-2137, Dec. 1972.
[22] J. B. Thomas, An Introduction to Statistical Communication Theory. New York: Wiley, 1969.
[23] M. D. Paez and T. H. Glisson, "Minimum mean-squared-error quantization in speech PCM and DPCM systems," IEEE Trans. Commun., vol. COM-20, pp. 225-230, Apr. 1972.
[24] N. N. Lebedev, Special Functions and Their Applications. New York: Dover, 1972.
[25] R. M. Gray and Y. Linde, "Vector quantizers and predictive quantizers for Gauss-Markov sources," IEEE Trans. Commun., vol. COM-30, pp. 381-389, Feb. 1982.
[26] D. G. Messerschmitt, "Quantizing for maximum output entropy," IEEE Trans. Inform. Theory, vol. IT-17, p. 612, Sept. 1971.
[27] R. G. Gallager, Information Theory and Reliable Communication. New York: Wiley, 1968.
[28] T. Berger, Rate Distortion Theory. Englewood Cliffs, NJ: Prentice-Hall, 1971.
[29] H. Gish and J. N. Pierce, "Asymptotically efficient quantizing," IEEE Trans. Inform. Theory, vol. IT-14, pp. 676-683, Sept. 1968.
[30] D. L. Neuhoff and R. K. Gilbert, "Causal source codes," IEEE Trans. Inform. Theory, vol. IT-28, pp. 701-713, Sept. 1982.
[31] W. D. Hopkins, "Structural properties of rate distortion functions," Ph.D. thesis, Syst. Sci. Dept., Univ. of California, Los Angeles, 1972.
[32] R. M. Gray, "Information rates of autoregressive processes," IEEE Trans. Inform. Theory, vol. IT-16, pp. 412-421, July 1970.
[33] J. B. O'Neal, Jr., "Differential pulse-code modulation (PCM) with entropy coding," IEEE Trans. Inform. Theory, vol. IT-22, pp. 169-174, Mar. 1976.
[34] M. J. Carter and D. L. Neuhoff, "Bounds to the rate-distortion function for regenerative composite sources," in Proc. 20th Allerton Conf. Commun., Control, and Comput., Monticello, IL, Oct. 1982, pp. 680-681.
[35] R. E. Blahut, "Computation of channel capacity and rate-distortion functions," IEEE Trans. Inform. Theory, vol. IT-18, pp. 460-473, July 1972.
[36] J. W. Modestino, "Nonparametric and adaptive detection of dependent data," Ph.D. thesis, Elect. Eng. Dept., Princeton Univ., Princeton, NJ, Apr. 1969.
[37] E. Abaya and G. L. Wise, "Some notes on optimal quantization," in Proc. IEEE Int. Conf. Commun., Denver, CO, June 1981, pp. 30.7.1-30.7.5.
[38] J. D. Gibson and T. R. Fischer, "Alphabet-constrained data compression," IEEE Trans. Inform. Theory, vol. IT-28, pp. 443-456, May 1982.