2011 IEEE International Symposium on Information Theory Proceedings

Fixed-length lossy compression in the finite blocklength regime: discrete memoryless sources

Victoria Kostina
Dept. of Electrical Engineering, Princeton University, NJ 08544, USA
[email protected]

Sergio Verdú
Dept. of Electrical Engineering, Princeton University, NJ 08544, USA
[email protected]

Abstract—This paper studies the minimum achievable source coding rate as a function of blocklength n and tolerable distortion level d. Tight general achievability and converse bounds are derived that hold at arbitrary fixed blocklength. For stationary memoryless sources with separable distortion, the minimum rate achievable is shown to be closely approximated by $R(d) + \sqrt{\frac{V(d)}{n}}\, Q^{-1}(\epsilon)$, where R(d) is the rate-distortion function, V(d) is the rate dispersion, a characteristic of the source which measures its stochastic variability, $Q^{-1}(\cdot)$ is the inverse of the standard Gaussian complementary cdf, and ε is the probability that the distortion exceeds d. The new bounds and the second-order approximation of the minimum achievable rate are evaluated for the discrete memoryless source with symbol error rate distortion. In this case, the second-order approximation reduces to $R(d) + \frac{1}{2}\frac{\log n}{n}$ if the source is non-redundant.

Index Terms—Shannon theory, lossy source coding, rate distortion, memoryless sources, finite blocklength regime, achievability, converse.

I. INTRODUCTION

The rate-distortion function characterizes the minimal source coding rate compatible with a given distortion level, either in average or excess distortion sense, provided that the blocklength is permitted to grow without limit. However, in some applications relatively short blocklengths are common, due both to delay constraints and to coding complexity, which increases with blocklength. It is therefore of critical practical interest to assess the unavoidable backoff from the rate-distortion function required to sustain the desired fidelity at a given fixed blocklength. Neither the coding theorem nor the reliability function, which gives the asymptotic exponential decay of the probability of exceeding a given distortion level when encoding at a fixed rate, provides an answer to that question.

(This work was partially supported by NSF under grant CCF-1016625 and by the Center for Science of Information, an NSF Science and Technology Center, under grant agreement CCF-0939370. The first author was supported in part by the Natural Sciences and Engineering Research Council of Canada.)

This paper presents a new achievability upper bound and a new converse lower bound to the minimum rate sustainable as a function of blocklength and excess probability, valid for general sources and general distortion measures. In addition, for stationary memoryless sources with separable distortion,


we show that the finite blocklength coding rate is well approximated by
\[
R(n, d, \epsilon) \approx R(d) + \sqrt{\frac{V(d)}{n}}\, Q^{-1}(\epsilon), \tag{1}
\]
where n is the blocklength, d is the distortion threshold, ε is the probability of distortion exceeding d, and V(d) is the rate dispersion. The new bounds are particularized for the stationary discrete memoryless source (DMS) with symbol error rate distortion. In the equiprobable DMS case, the rate-dispersion function vanishes, and the finite blocklength coding rate is well approximated by
\[
R(n, d, \epsilon) \approx R(d) + \frac{1}{2}\frac{\log n}{n}. \tag{2}
\]
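To make (2) concrete, the following minimal sketch evaluates the equiprobable approximation, using the expression (29) for R(d) given in Section VI (rates in bits; the function names and sample parameters are our own, not the paper's):

```python
import math

def h(x):
    """Binary entropy function in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def R_equiprobable(d, m):
    """Rate-distortion function of the m-ary equiprobable source under
    symbol error rate distortion, eq. (29): log m - h(d) - d log(m-1)."""
    return math.log2(m) - h(d) - d * math.log2(m - 1)

def rate_approx(n, d, m):
    """Approximation (2): for the equiprobable source the dispersion term
    of (1) vanishes, leaving the (log n)/(2n) correction."""
    return R_equiprobable(d, m) + math.log2(n) / (2 * n)

for n in (100, 1000, 10000):
    print(n, round(rate_approx(n, d=0.11, m=2), 4))
```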

Section II sets up the problem and introduces a few definitions. Section III reviews the few existing finite blocklength achievability and converse bounds for lossy compression. Section IV shows the new general upper and lower bounds to the minimum rate at a given blocklength. Second-order asymptotic analysis is given in Section V. Section VI focuses on the DMS.

II. NONASYMPTOTIC RATE-DISTORTION THEORY

In the standard model of fixed-to-fixed (block) compression, the output of a general source with alphabet A and source distribution $P_X$ is mapped to one of the M codewords from the reproduction alphabet B. A lossy code is a pair of mappings f : A → {1, ..., M} and c : {1, ..., M} → B. A distortion measure d : A × B → ℝ⁺ is used to quantify the performance of a lossy code. Given the decoder c, the best encoder simply maps the source output to the closest (in the sense of the distortion measure) codeword, i.e. f(x) = arg min_m d(x, c(m)). The average distortion over the source statistics is a popular performance criterion. A stronger criterion is also used, namely, the probability of exceeding a given distortion level (called excess-distortion probability). The following definitions abide by the excess distortion criterion.

Definition 1. An (M, d, ε) code for {A, B, P_X, d : A × B → ℝ⁺} is a code with |f| = M such that P[d(X, c(f(X))) > d] ≤ ε.


If the alphabets are the n-fold Cartesian products Aⁿ and Bⁿ, an (M, d, ε) code for {Aⁿ, Bⁿ, P_{Xⁿ}, dₙ : Aⁿ × Bⁿ → ℝ⁺} is called an (n, M, d, ε) code.

Definition 2. Fix ε, d and blocklength n. The minimum achievable code size and the finite blocklength rate-distortion function (excess distortion) are defined by, respectively,
\[
M^\star(n, d, \epsilon) = \min \{ M : \exists\, (n, M, d, \epsilon) \text{ code} \}, \tag{3}
\]
\[
R(n, d, \epsilon) = \frac{1}{n} \log M^\star(n, d, \epsilon). \tag{4}
\]
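For tiny parameters, M*(n, d, ε) of (3) can be found by exhaustive search over all codebooks, which makes Definition 2 concrete. A minimal sketch under assumed toy parameters (Bernoulli source, normalized Hamming distortion; exponential-time illustration only, all names are ours):

```python
import itertools

def excess_prob(codebook, pmf, d_level, n):
    """P[d(X, c(f(X))) > d] for the optimal encoder, which maps each source
    string to its closest codeword in normalized Hamming distortion."""
    return sum(px for x, px in pmf.items()
               if min(sum(a != b for a, b in zip(x, c)) / n for c in codebook)
               > d_level)

def M_star(n, d_level, eps, p=0.4):
    """Smallest code size M such that some size-M codebook meets the
    excess-distortion target, i.e. M*(n, d, eps) of (3)."""
    strings = list(itertools.product((0, 1), repeat=n))
    pmf = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in strings}
    for M in range(1, len(strings) + 1):
        if any(excess_prob(cb, pmf, d_level, n) <= eps
               for cb in itertools.combinations(strings, M)):
            return M

print(M_star(n=3, d_level=1/3, eps=0.1))
```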

III. PRIOR WORK

Returning to the general setup of Definition 1, the basic general achievability result can be distilled from Shannon's coding theorem for memoryless sources:

Theorem 1 (Achievability, [1], [2]). Fix $P_X$, a positive integer M and d ≥ 0. There exists an (M, d, ε) code such that
\[
\epsilon \le \inf_{P_{Y|X}} \left\{ P\left[ d(X, Y) > d \right] + \inf_{\gamma > 0} \left( P\left[ \imath_{X;Y}(X; Y) > \log M - \gamma \right] + e^{-\exp(\gamma)} \right) \right\} \tag{5}
\]

where
\[
\imath_{X;Y}(x; y) = \log \frac{dP_{XY}(x, y)}{d(P_X \times P_Y)(x, y)} \tag{6}
\]
denotes the information density of the joint distribution $P_{XY}$ at (x, y) ∈ A × B.
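The two terms of (5) can be estimated by Monte Carlo once a single-letter test channel P_{Y|X} is fixed (the theorem's infimum over P_{Y|X} is not attempted here). A sketch in natural logarithms, with a toy binary source and symmetric test channel of our choosing:

```python
import math, random

def shannon_bound_mc(px, py_x, dist, n, M, d_level, gamma, trials=20000):
    """Estimates P[d(X,Y) > d] + P[i(X;Y) > log M - gamma] + e^{-e^gamma}
    for product distributions P_X^n, P_{Y|X}^n and separable distortion."""
    A = list(px)
    wA = [px[a] for a in A]
    B = sorted({b for a in A for b in py_x[a]})
    pY = {b: sum(px[a] * py_x[a].get(b, 0.0) for a in A) for b in B}  # marginal P_Y
    hits_d = hits_i = 0
    for _ in range(trials):
        info = d_tot = 0.0
        for _ in range(n):
            a = random.choices(A, weights=wA)[0]
            bs = list(py_x[a])
            b = random.choices(bs, weights=[py_x[a][v] for v in bs])[0]
            info += math.log(py_x[a][b] / pY[b])   # information density, eq. (6)
            d_tot += dist[a][b]
        hits_d += d_tot / n > d_level
        hits_i += info > math.log(M) - gamma
    return hits_d / trials + hits_i / trials + math.exp(-math.exp(gamma))

px = {0: 0.6, 1: 0.4}
py_x = {0: {0: 0.89, 1: 0.11}, 1: {0: 0.11, 1: 0.89}}
dist = {0: {0: 0, 1: 1}, 1: {0: 1, 1: 0}}
print(shannon_bound_mc(px, py_x, dist, n=100, M=2**55, d_level=0.12, gamma=4.0))
```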

Theorem 1 is the most general achievability result known. For three related scenarios, we can cite the achievability bounds of Goblick [3], Pinkston [4] and Sakrison [5]: DMS with finite alphabet and separable distortion measure [3]; variable-rate compression of DMS with finite alphabet and separable distortion measure [4]; variable-rate quantization of a Gaussian i.i.d. source with mean-square error distortion [5].

As for the existing finite blocklength converse results, for a DMS with finite alphabet and bounded separable distortion measure, a finite blocklength converse can be distilled from Marton's paper on the error exponent [6]. However, it turns out that it results in rather loose lower bounds on R(n, d, ε) unless n is very large, in which case the rate-distortion function already gives a tight bound. For variable-rate quantization, strong achievability and converse bounds can be obtained from the “lossy AEP” [7]. Second-order refinements of the lossy AEP were studied in [8], which also presents a nonasymptotic converse for variable-rate lossy compression that parallels Barron's converse for lossless compression.

Considerable attention has been paid to the asymptotic behavior of the redundancy, i.e. the difference between the average distortion D(n, R) of the best n-dimensional quantizer and the distortion-rate function D(R). For finite-alphabet i.i.d. sources, Pilc [9] showed that $D(n, R) - D(R) \le O\left(\frac{\log n}{n}\right)$. Zhang, Yang and Wei [10] refined the work of Pilc and showed that for memoryless sources with finite alphabet, $D(n, R) - D(R) = -\frac{\partial D(R)}{\partial R} \frac{\log n}{2n} + o\left(\frac{\log n}{n}\right)$. For stationary Gaussian sources with mean-square error distortion, using a variant of the classical achievability result in (5), Wyner [11] showed that $D(n, R) - D(R) \le O\left(\sqrt{\frac{\log n}{n}}\right)$. Linder, Lugosi and Zeger [12] analyzed (5) using Hoeffding's inequality and established the $O\left(\sqrt{\frac{\log n}{n}}\right)$ convergence rate for bounded i.i.d. sources with mean-square error distortion, thereby extending Wyner's result to non-Gaussian sources. In [13], Linder and Zeger were able to drop the bounded support requirement and showed the $O\left(\sqrt{\frac{\log n}{n}}\right)$ upper bound for memoryless real sources with finite sixth moment. The main step of the proof in [13] is the application of the Berry-Esseen central limit theorem (CLT) to (5). Note that the results of [11]–[13] rely on (5), which, as we will show, is far from tight in the finite blocklength regime. On the other hand, as the average overhead over the distortion-rate function is dwarfed by its standard deviation, the more accurate analyses of [9], [10] are likely to be overly optimistic since they neglect the stochastic variability of the distortion.
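The Berry-Esseen step mentioned above is easy to visualize numerically: the normalized sum of n i.i.d. information random variables is close to standard Gaussian, with a gap that the Berry-Esseen theorem bounds by O(1/√n). A small self-contained check (the source parameters are our choice):

```python
import math, random

def clt_check(px, n, trials=5000):
    """Empirical CDF of the normalized sum of log2(1/P_X(X_i)) versus the
    standard normal CDF -- the mechanism behind the second-order terms."""
    A, w = list(px), list(px.values())
    mu = -sum(p * math.log2(p) for p in w)                 # H(X) in bits
    var = sum(p * math.log2(p) ** 2 for p in w) - mu ** 2
    def z():
        s = sum(-math.log2(px[random.choices(A, weights=w)[0]]) for _ in range(n))
        return (s - n * mu) / math.sqrt(n * var)
    zs = sorted(z() for _ in range(trials))
    for t in (-1.0, 0.0, 1.0):
        emp = sum(v <= t for v in zs) / trials
        phi = 0.5 * math.erfc(-t / math.sqrt(2))           # standard normal CDF
        print(f"t={t:+.1f}  empirical={emp:.3f}  Phi(t)={phi:.3f}")

clt_check({0: 0.6, 1: 0.4}, n=200)
```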

IV. NEW BOUNDS

In this section we give an achievability result and a converse result for any source and any distortion measure. When we apply these results in Sections V and VI, the source becomes an n-tuple (X₁, ..., Xₙ).

Theorem 2 (Achievability). There exists an (M, d, ε) code with
\[
\epsilon \le \inf_{P_Y} E\left[ \left( E\left[ 1\{d(X, Y) > d\} \mid X \right] \right)^M \right] \tag{7}
\]
where the infimization is over all random variables defined on B, independent of X.

Proof: Let the codebook be (c₁, ..., c_M). Upon seeing the source output x, the optimum encoder chooses arbitrarily among the members of the set
\[
\arg\min_{i=1,\ldots,M} d(x, c_i). \tag{8}
\]
The indicator function of the event that the distortion exceeds d is
\[
1\left\{ \min_{i=1,\ldots,M} d(x, c_i) > d \right\} = \prod_{i=1}^{M} 1\{d(x, c_i) > d\}. \tag{9}
\]

Suppose that (c₁, ..., c_M) are drawn independently from $P_Y$. Averaging over both the input and the choice of codewords, we get
\[
E\left[ \prod_{i=1}^{M} 1\{d(X, Y_i) > d\} \right]
= E\left[ E\left[ \prod_{i=1}^{M} 1\{d(X, Y_i) > d\} \,\middle|\, X \right] \right]
= E\left[ \prod_{i=1}^{M} E\left[ 1\{d(X, Y_i) > d\} \mid X \right] \right]
= E\left[ \left( E\left[ 1\{d(X, Y) > d\} \mid X \right] \right)^M \right]
\]

where we have used the fact that Y₁, ..., Y_M are independent even when conditioned on X. Since there must exist at least one code whose excess-distortion probability is no larger than the average over the code ensemble, the existence of a code satisfying (7) follows.
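For finite alphabets the right side of (7) can be computed exactly by enumeration, once a candidate P_Y is fixed (Theorem 2 then infimizes over P_Y). A minimal sketch with an assumed toy source:

```python
import itertools

def theorem2_bound(px, py, dist, M, d_level):
    """Right side of (7) for finite alphabets and one candidate P_Y:
    E[(P[d(X,Y) > d | X])^M]."""
    return sum(pa * sum(pb for b, pb in py.items() if dist(a, b) > d_level) ** M
               for a, pa in px.items())

# usage: blocklength-2 binary source, normalized Hamming distortion
n, p = 2, 0.4
strings = list(itertools.product((0, 1), repeat=n))
px = {x: p ** sum(x) * (1 - p) ** (n - sum(x)) for x in strings}
py = {y: 1 / len(strings) for y in strings}            # equiprobable candidate P_Y
ham = lambda a, b: sum(u != v for u, v in zip(a, b)) / n
print(theorem2_bound(px, py, ham, M=2, d_level=0.5))
```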


While the right side of (7) gives the exact performance of random coding, Shannon's random coding bound (Theorem 1) was obtained by upper bounding the performance of random coding. As a consequence, the result in Theorem 2 is tighter than Shannon's random coding bound, but it is also harder to compute.

Our converse result is based on binary hypothesis testing. The optimal performance achievable among all randomized tests P_{Z|X} : A → {0, 1} between probability distributions P and Q on A is denoted by (1 indicates that the test chooses P)¹
\[
\beta_\alpha(P, Q) = \min_{P_{Z|X}:\, P[Z=1] \ge \alpha} Q[Z = 1]. \tag{10}
\]
¹Throughout, P, Q denote distributions, whereas P, Q are used for probabilities of events on the underlying probability space.
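For finite alphabets, the minimum in (10) is attained by the Neyman-Pearson test: admit atoms in decreasing order of the likelihood ratio P/Q, randomizing on the boundary atom. A minimal sketch (assumes Q puts positive mass wherever P does):

```python
def beta(alpha, P, Q):
    """beta_alpha(P, Q) of (10): smallest Q[Z=1] over tests with P[Z=1] >= alpha."""
    need, b = alpha, 0.0
    for a in sorted(P, key=lambda a: P[a] / Q[a], reverse=True):
        if P[a] >= need:           # randomize on the boundary atom
            return b + need * Q[a] / P[a]
        need -= P[a]
        b += Q[a]
    return b

# usage: optimal test between two Bernoulli distributions
print(beta(0.9, {0: 0.6, 1: 0.4}, {0: 0.5, 1: 0.5}))
```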

Theorem 3 (Converse). Let $P_X$ be the source distribution. Any (M, d, ε) code must satisfy
\[
M \ge \sup_{Q} \inf_{y \in B} \frac{\beta_{1-\epsilon}(P_X, Q)}{Q[d(X, y) \le d]} \tag{11}
\]
where the supremum is over all distributions on A.

Proof: Let (f, c) be an (M, d, ε) code. Write
\begin{align}
\beta_{1-\epsilon}(P_X, Q) &= \min_{P_{Z|X}:\, P[Z=1] \ge 1-\epsilon} Q[Z = 1] \tag{12} \\
&\le Q[d(X, c(f(X))) \le d] \tag{13} \\
&\le \sum_{m=1}^{M} Q[d(X, c(m)) \le d] \tag{14} \\
&\le M \sup_{y \in B} Q[d(X, y) \le d] \tag{15}
\end{align}
where (13) is due to the fact that Z = 1{d(X, c(f(X))) ≤ d} defines a (not necessarily optimal) hypothesis test between $P_X$ and Q with P[Z = 1] ≥ 1 − ε.

Suppose for a moment that X has finite alphabet, and let us further lower bound (11) by taking Q to be the equiprobable distribution on A, Q = U. Let us consider the set Ω ⊂ A that has total probability 1 − ε and contains the most probable source sequences, i.e. for any source sequence x in Ω, there is no sequence outside of Ω having probability greater than P_X(x). For any x ∈ Ω, the optimum binary hypothesis test between P_X and Q must choose P_X. Thus the numerator of (11) evaluated with Q = U is proportional to the number of elements in Ω, while the denominator is proportional to the number of elements in a distortion ball of radius d. Therefore (11) evaluated with Q = U yields a lower bound to the minimum number of d-balls required to cover Ω.

V. SECOND-ORDER ANALYSIS

In the spirit of [14], we introduce the following definition.

Definition 3 (rate dispersion). Fix d ≥ 0. The rate-dispersion function (squared information units per source symbol) is defined as
\[
V(d) = \lim_{\epsilon \to 0} \limsup_{n \to \infty} \frac{n \left( R(n, d, \epsilon) - R(d) \right)^2}{2 \ln \frac{1}{\epsilon}}. \tag{16}
\]

Fix d, 0 < ε < 1, η > 0. Suppose the target is to sustain a probability of exceeding distortion d bounded by ε at rate R = (1 + η)R(d). As (1) implies, the required blocklength scales linearly with rate dispersion:
\[
n(d, \eta, \epsilon) \approx \frac{V(d)}{R^2(d)} \left( \frac{Q^{-1}(\epsilon)}{\eta} \right)^2 \tag{17}
\]
where note that only the first factor depends on the source, while the second depends only on the design specifications.

The proof of the following result (see [15]) relies on our new bounds as well as the results on the asymptotic behavior of distortion d-balls developed in [7], [8], [10].

Theorem 4 (Second-order approximation). Fix a stationary memoryless source {Xᵢ} with alphabet A and separable distortion measure. Under the following technical conditions:

(i) for all x ∈ A with positive probability, min_{y∈B} d(x, y) = 0, and the acceptable distortion level satisfies 0 < d < d₀, where d₀ = min{d : R(d) = 0};

(ii) the random variable log E[exp{λ* d(X, Y)} | X] has finite third moment, where Y is independent of X, its distribution is the marginal of $P_X P^\star_{Y|X}$, where $P^\star_{Y|X}$ achieves
\[
R(d) = \min_{P_{Y|X}:\, E[d(X,Y)] \le d} I(X; Y), \tag{18}
\]
and λ* < 0 is the unique solution to
\[
d = \left. \frac{d}{d\lambda}\, E\left[ \log E\left[ \exp\{\lambda\, d(X, Y)\} \mid X \right] \right] \right|_{\lambda = \lambda^\star}; \tag{19}
\]
it holds that
\[
R(n, d, \epsilon) = R(d) + \sqrt{\frac{V(d)}{n}}\, Q^{-1}(\epsilon) + \theta\!\left(\frac{\log n}{n}\right) \tag{20}
\]
\[
V(d) = \mathrm{Var}\left[ \log \frac{1}{E\left[ \exp\{\lambda^\star [d(X, Y) - d]\} \mid X \right]} \right] \tag{21}
\]
where Y, λ* are as in (ii), and the remainder term in (20) satisfies
\[
-\frac{\log n}{n} + O\!\left(\frac{1}{n}\right) \le \theta\!\left(\frac{\log n}{n}\right) \le \frac{1}{2}\frac{\log n}{n} + O\!\left(\frac{\log\log n}{n}\right). \tag{22}
\]

Note that the rate-distortion achieving random variable Y is unique [2], so there is no ambiguity in (21). Moreover, if R(d) is differentiable, λ* is equal to the slope of R(d) at d,
\[
\lambda^\star = R'(d). \tag{23}
\]
The equivalence of characterizations (19) and (23) was first noticed by Berger [16]. Since the rate-distortion function can be expressed as [16]
\[
R(d) = E\left[ \log \frac{1}{E\left[ \exp\{\lambda^\star [d(X, Y) - d]\} \mid X \right]} \right] \tag{24}
\]
the rate-distortion function is equal to the expectation of the random variable whose variance we take in (21), thereby drawing a pleasing parallel with the channel coding results in [14].
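For the binary memoryless source with bias p under bit error rate distortion, the rate-distortion-achieving output distribution is the classical Y* ~ Bernoulli((p − d)/(1 − 2d)), and (23) gives λ* = log(d/(1 − d)) in nats. The sketch below checks (19), (21) and (24) numerically against these closed forms, and compares V(d) with Var[ı_X(X)], as asserted for this range of d by Theorem 10 below (the script organization is ours):

```python
import math

def dispersion_check(p, d):
    lam = math.log(d / (1 - d))            # lambda* = R'(d), eq. (23), nats
    q = (p - d) / (1 - 2 * d)              # P_{Y*}(1), classical reverse channel
    pX, pY = {0: 1 - p, 1: p}, {0: 1 - (p - d) / (1 - 2 * d), 1: (p - d) / (1 - 2 * d)}

    def inner(x, l):                       # E[exp{l d(x, Y*)}], Y* independent of X
        return pY[x] + (1 - pY[x]) * math.exp(l)

    # eq. (19): d should equal the derivative at lambda = lambda*
    e = 1e-6
    deriv = sum(pX[x] * (math.log(inner(x, lam + e)) - math.log(inner(x, lam - e)))
                for x in pX) / (2 * e)
    # eqs. (21), (24): variance and mean of log 1/E[exp{lambda*[d(X,Y)-d]}|X]
    g = {x: lam * d - math.log(inner(x, lam)) for x in pX}
    mean = sum(pX[x] * g[x] for x in pX)
    Vd = sum(pX[x] * (g[x] - mean) ** 2 for x in pX)
    Rd = (-(1 - p) * math.log(1 - p) - p * math.log(p)) \
         - (-(1 - d) * math.log(1 - d) - d * math.log(d))   # h(p) - h(d), nats
    VarInfo = p * (1 - p) * math.log((1 - p) / p) ** 2       # Var[i_X(X)], nats^2
    print(f"(19): derivative = {deriv:.6f}, target d = {d}")
    print(f"(24): mean = {mean:.6f}, R(d) = {Rd:.6f}")
    print(f"(21): V(d) = {Vd:.6f}, Var[i_X(X)] = {VarInfo:.6f}")

dispersion_check(p=0.4, d=0.11)
```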


VI. DISCRETE MEMORYLESS SOURCE

This section particularizes the bounds in Section IV to the stationary m-ary memoryless source with Hamming distortion measure. For convenience, we denote the number of strings within Hamming distance k from a given string by
\[
S_k^n = \sum_{j=0}^{k} \binom{n}{j} (m-1)^j. \tag{25}
\]
The following achievability result follows from Theorem 2 by letting $P_X$ and $P_Y$ in (7) be equiprobable on Aⁿ.

Theorem 5 (Achievability, EDMS). There exists an (n, M, d, ε) code for the m-ary equiprobable source with symbol error rate distortion measure such that
\[
\epsilon \le \left( 1 - S_{\lfloor nd \rfloor}^n\, m^{-n} \right)^M. \tag{26}
\]

Theorem 6 (Converse, EDMS). For the m-ary equiprobable source with symbol error rate distortion measure, any (n, M, d, ε) code must satisfy
\[
M \ge (1 - \epsilon)\, \frac{m^n}{S_{\lfloor nd \rfloor}^n}. \tag{27}
\]

An asymptotic analysis of the bounds in (26) and (27) yields the following strengthening of the approximation offered by Theorem 4.

Theorem 7 (Second order, EDMS). For the stationary memoryless m-ary equiprobable source with symbol error rate distortion measure, the minimum achievable rate at blocklength n satisfies
\[
R(n, d, \epsilon) = R(d) + \frac{1}{2}\frac{\log n}{n} + O\!\left(\frac{1}{n}\right), \tag{28}
\]
\[
R(d) = \log m - h(d) - d \log(m - 1), \tag{29}
\]
as long as 0 ≤ d < (m − 1)/m, where h(·) denotes the binary entropy function.

The bounds of Section IV can also be particularized to the general (not necessarily equiprobable) m-ary memoryless source with symbol error rate distortion measure, with the symbols labeled so that P_X(1) ≥ ··· ≥ P_X(m). Fix n and 0 ≤ d < (m − 1)P_X(m). There exists an (n, M, d, ε) fixed composition code with
\[
\epsilon \le \sum_{t} \binom{n}{t}\, p_t \left( 1 - \rho(n, t, q) \right)^M. \tag{31}
\]

Theorem 10 (Second order, DMS). For the stationary memoryless m-ary source with symbol error rate distortion measure, the minimum achievable rate at blocklength n satisfies
\[
R(n, d, \epsilon) = R(d) + \sqrt{\frac{V(d)}{n}}\, Q^{-1}(\epsilon) + \theta\!\left(\frac{\log n}{n}\right), \tag{39}
\]
where, in general, R(d) and V(d) are expressed in terms of the random variable Z_η(X) with
\[
m_\eta = \max\{ a : P_X(a) > \eta \}, \tag{43}
\]
\[
Z_\eta(a) = \begin{cases} \dfrac{1}{P_X(a)}, & a \le m_\eta \\ \eta, & \text{otherwise,} \end{cases} \tag{44}
\]
and the remainder in (39) satisfies (22). In particular, if 0 ≤ d < (m − 1)P_X(m),
\[
R(d) = H(X) - h(d) - d \log(m - 1), \tag{45}
\]
\[
V(d) = \mathrm{Var}\left[ \imath_X(X) \right] = \sum_{j=1}^{m} p_j \log^2 p_j - H^2(X), \tag{46}
\]
and the remainder term in (39) satisfies the following stronger condition than that in (22):
\[
O\!\left(\frac{1}{n}\right) \le \theta\!\left(\frac{\log n}{n}\right) \le \frac{1}{2}\frac{\log n}{n} + O\!\left(\frac{\log\log n}{n}\right).
\]
A numerical comparison of Shannon's achievability bound (5), the new bounds for the DMS and the second-order approximation in Theorem 10 is presented in Fig. 1.

Fig. 1. Bounds to R(n, d, ε) for the binary memoryless source with P_X(0) = 2/5, d = 0.11, ε = 10⁻². The rate R is plotted against the blocklength n (up to 1000); the curves shown are Shannon's achievability (5), the new achievability (31), the approximation (39), the converse (36), and the rate-distortion function R(d).
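For the source plotted in Fig. 1, the leading terms of approximation (39) follow from (45)-(46); the θ(log n/n) remainder is dropped below, so this sketch reproduces only the leading behavior (naming is ours):

```python
import math

def fig1_approx(n, p0=0.4, d=0.11, eps=1e-2):
    """R(d) + sqrt(V(d)/n) Q^{-1}(eps) for the binary source of Fig. 1,
    with R(d) from (45) and V(d) from (46), in bits."""
    p = (p0, 1 - p0)
    H = -sum(x * math.log2(x) for x in p)                   # H(X)
    hd = -d * math.log2(d) - (1 - d) * math.log2(1 - d)     # h(d)
    Rd = H - hd                                             # (45) with m = 2
    Vd = sum(x * math.log2(x) ** 2 for x in p) - H ** 2     # (46)
    Qinv = 2.3263                                           # Q^{-1}(0.01)
    return Rd + math.sqrt(Vd / n) * Qinv

for n in (100, 400, 1000):
    print(n, round(fig1_approx(n), 4))
```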

VII. CONCLUSION

To estimate the minimum rate required to sustain a given fidelity at a given blocklength, we showed a new achievability and a new converse bound that apply in full generality and that are tighter than existing bounds. For stationary memoryless sources with separable distortion, the rate dispersion (along with the rate-distortion function) serves to give tight approximations to the fundamental fidelity-rate tradeoff unless the blocklength is small. The bounds and the second-order approximation are refined in the case of discrete memoryless sources with symbol error rate distortion. The application of our approach to the finite-blocklength analysis of Gaussian sources with mean-square error distortion is included in the extended version [15]. Furthermore, corresponding results for average distortion can be obtained from our excess distortion bounds.

ACKNOWLEDGEMENT

Useful discussions with Dr. Yury Polyanskiy are gratefully acknowledged. In particular, Theorem 3 arose from discussions with him.

REFERENCES

[1] C. E. Shannon, “Coding theorems for a discrete source with a fidelity criterion,” IRE Int. Conv. Rec., vol. 7, pp. 142–163, Mar. 1959.
[2] S. Verdú, “ELE528: Information theory lecture notes,” Princeton University, 2009.
[3] T. Goblick Jr., Coding for a discrete information source with a distortion measure, PhD thesis, M.I.T., 1962.
[4] J. Pinkston, Encoding independent sample information sources, PhD thesis, M.I.T., 1967.
[5] D. Sakrison, “A geometric treatment of the source encoding of a Gaussian random variable,” IEEE Transactions on Information Theory, vol. 14, pp. 481–486, May 1968.
[6] K. Marton, “Error exponent for source coding with a fidelity criterion,” IEEE Transactions on Information Theory, vol. 20, pp. 197–199, Mar. 1974.
[7] A. Dembo and I. Kontoyiannis, “The asymptotics of waiting times between stationary processes, allowing distortion,” Annals of Applied Probability, vol. 9, pp. 413–429, May 1999.
[8] I. Kontoyiannis, “Pointwise redundancy in lossy data compression and universal lossy data compression,” IEEE Transactions on Information Theory, vol. 46, pp. 136–152, Jan. 2000.
[9] R. Pilc, Coding theorems for discrete source-channel pairs, PhD thesis, M.I.T., 1967.
[10] Z. Zhang, E. Yang, and V. Wei, “The redundancy of source coding with a fidelity criterion,” IEEE Transactions on Information Theory, vol. 43, pp. 71–91, Jan. 1997.
[11] A. D. Wyner, “Communication of analog data from a Gaussian source over a noisy channel,” Bell Syst. Tech. J., vol. 47, pp. 801–812, May/June 1968.
[12] T. Linder, G. Lugosi, and K. Zeger, “Rates of convergence in the source coding theorem, in empirical quantizer design, and in universal lossy source coding,” IEEE Transactions on Information Theory, vol. 40, pp. 1728–1740, Nov. 1994.
[13] T. Linder and K. Zeger, “On the cost of finite block length in quantizing unbounded memoryless sources,” IEEE Transactions on Information Theory, vol. 42, pp. 480–487, Mar. 1996.
[14] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Transactions on Information Theory, vol. 56, pp. 2307–2359, May 2010.
[15] V. Kostina and S. Verdú, “Fixed-length lossy compression in the finite blocklength regime,” arXiv preprint arXiv:1102.3944, Feb. 2011.
[16] T. Berger, Rate Distortion Theory, Prentice-Hall, Englewood Cliffs, NJ, 1971.
[17] W. Szpankowski and S. Verdú, “Minimum expected length of fixed-to-variable lossless compression without prefix constraints: memoryless sources,” to appear, IEEE Transactions on Information Theory, 2010.
