Exponential Rate of Convergence for Lloyd's Method I

JOHN C. KIEFFER

Abstract: For a random variable with finite second moment and log-concave density ...
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-28, NO. 2, MARCH 1982, p. 205


Manuscript received March 10, 1981; revised October 6, 1981. This work was supported by the National Science Foundation under Grant ECS-7821335. The author is with the Department of Mathematics and Statistics, University of Missouri-Rolla, MO 65401.

Let $\sigma, \tau$ be extended real numbers with $\sigma < \tau$. Let $p: (\sigma, \tau) \to (0, \infty)$ be a log-concave probability density with $\int_\sigma^\tau x^2 p(x)\,dx < \infty$. Let $N$ be a fixed positive integer greater than one. We call a map $Q: (\sigma, \tau) \to (-\infty, \infty)$ an N-level quantizer if and only if there are real numbers $y_1, \ldots, y_N$ and real numbers $x_1 < \cdots < x_{N-1}$ in $(\sigma, \tau)$ such that

$$Q(y) = y_i, \qquad x_{i-1} \le y < x_i, \qquad i = 1, \ldots, N,$$

where we take $x_0 = \sigma$, $x_N = \tau$. (We adopt this convention from now on; that is, if $u = (u_1, \ldots, u_{N-1})$ is a sequence of $N - 1$ real numbers, it will be understood that in addition there is a $u_0$ defined to be $\sigma$ and a $u_N$ defined to be $\tau$.) It is well known [2], [9] that if $X$ is a $(\sigma, \tau)$-valued random variable with density $p$, there is a unique N-level quantizer $Q^*$ for which

$$E\{(X - Q^*(X))^2\} \le E\{(X - Q(X))^2\},$$

for every N-level quantizer $Q$. Thus, if one wishes to encode $X$ with an N-level quantizer, the best encoder, in the sense of squared-error distortion, is $Q^*$. Lloyd's Method I [6] is a way of finding $Q^*$. It involves applying a certain iterative procedure to an initial quantizer $Q_0$, obtaining a sequence of quantizers $Q_0, Q_1, Q_2, \ldots$, which are successive approximations to $Q^*$. In the limit, $Q_n \to Q^*$. We now describe this method. Let $R$ denote the real line and let $O_N$ be the set of all [...]

We call a function $f: R \to R$ N-piecewise affine if $R$ may be partitioned into $N$ subintervals on each of which $f$ is affine. We call $x \in O_N$ a fixed point of $T$ if $Tx = x$. Trushkin [9] showed that $T$ has a unique fixed point, which we will denote by $u^*$. This fixed point yields the optimum N-level quantizer, i.e., $Q_{u^*} = Q^*$. The map $T: O_N \to O_N$ is differentiable and so at each $x \in O_N$ we may let $DT(x)$ denote the $(N-1) \times (N-1)$ derivative matrix of $T$ at $x$. As will be shown later, $DT(x)$ is a nonnegative matrix whose row sums are all less than or equal to one. We state now our main result, which says that the successive approximations given by Lloyd's Method I converge exponentially fast to the optimum quantizer if $DT(u^*)$ is not a stochastic matrix. (Recall that a matrix is defined to be stochastic if each row is a probability vector. Also, if $\{x_n\}$ is a sequence of real numbers converging to the real number $x$, we say the convergence takes place exponentially fast if for some $\alpha$ ($0 < \alpha < 1$), $|x_n - x| < \alpha^n$ for $n$ sufficiently large; if $\{x^{(n)}\}$ is a sequence in $R^{N-1}$ converging to $x \in R^{N-1}$, we say the convergence takes place exponentially fast if $\|x^{(n)} - x\| \to 0$ exponentially fast; finally, we say that the sequence of N-level quantizers $\{Q_n\}$ converges exponentially fast to the N-level quantizer $Q$ if $E[Q_n(X) - Q(X)]^2 \to 0$ exponentially fast, where $X$ is a $(\sigma, \tau)$-valued random variable with density $p$.)

Theorem 1: Suppose $DT(u^*)$ is not stochastic. Then for any $x \in O_N$, $T^n x \to u^*$ and $Q_{T^n x} \to Q^*$ exponentially fast. Moreover, if $\lambda$ is the largest eigenvalue of $DT(u^*)$, then $0 < \lambda < 1$ and for any $x \in O_N$ and $\lambda' > \lambda$ we have for $n$ [...]

Corollary: [...] $C > 0$ and $\beta$ ($0 < \beta < 1$) such that

$$|\rho_{\min} - E\{(X - Q_n(X))^2\}| \le C\beta^n, \qquad n = 0, 1, \ldots. \tag{2}$$

Proof: Now $\rho_{\min} = E\{(X - Q^*(X))^2\}$. Let $C = \sup_n \left(\rho_{\min}^{1/2} + [E(X - Q_n(X))^2]^{1/2}\right)$. Then the left side of (2) is upper-bounded by

$$C\,\left|\rho_{\min}^{1/2} - [E(X - Q_n(X))^2]^{1/2}\right| \le C\,[E(Q^*(X) - Q_n(X))^2]^{1/2},$$

from which (2) follows, since from Theorem 1, $Q_n \to Q^*$ exponentially fast.

As a consequence of this corollary, if it is desired to choose for our encoder an N-level quantizer which yields distortion agreeing with the optimum distortion in the first $n$ decimal places, the number of times the function $T$ in Lloyd's Method has to be applied approaches infinity as $n \to \infty$ no faster than linearly in $n$. In Section IV we will also obtain the following negative result.

Theorem 2: Suppose $DT(u^*)$ is stochastic. If $x \in O_N$ and $x_i > u_i^*$ ($1 \le i \le N-1$) or $x_i < u_i^*$ ($1 \le i \le N-1$), then $\{T^n x\}$ does not approach $u^*$ exponentially fast and $\{Q_{T^n x}\}$ does not approach $Q^*$ exponentially fast.

For example, if $(\sigma, \tau) = (-\infty, \infty)$, $N = 2$, and $p(t) = (1/2)e^{-|t|}$, $-\infty < t < \infty$, then $u^* = 0$ and $T'(u^*) = 1$. Hence, for any $x \ne 0$, $\{T^n x\}$ does not approach 0 exponentially fast.
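The contrast between Theorems 1 and 2 can be seen numerically in the two-level case. The sketch below is not from the paper. It assumes that for $N = 2$ the map $T$ sends a threshold $x$ to the midpoint of the conditional means of $X$ on either side of $x$, and it uses closed forms worked out for this illustration: $T(x) = (x+1)(1-e^{-x})/(2-e^{-x})$ for $x \ge 0$ under the Laplace density above, and $T(x) = \tfrac12\varphi(x)\,[1/(1-\Phi(x)) - 1/\Phi(x)]$ under a standard Gaussian density, for which a short calculation gives $T'(0) = 2/\pi < 1$. All function names are hypothetical.

```python
import math


def t_laplace(x):
    """Two-level Lloyd map for p(t) = exp(-|t|)/2 (closed form, odd in x)."""
    if x < 0:
        return -t_laplace(-x)
    # Midpoint of E[X | X >= x] = x + 1 and
    # E[X | X < x] = -(x + 1) e^{-x} / (2 - e^{-x}).
    return (x + 1.0) * (-math.expm1(-x)) / (2.0 - math.exp(-x))


def t_gauss(x):
    """Two-level Lloyd map for the standard Gaussian density (an assumed example)."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    # Midpoint of E[X | X >= x] = phi / (1 - Phi) and E[X | X < x] = -phi / Phi.
    return 0.5 * (phi / (1.0 - Phi) - phi / Phi)


def steps_to_tolerance(t_map, x0, tol, max_steps=10**6):
    """Number of applications of t_map needed to bring |x| below tol."""
    x, n = x0, 0
    while abs(x) >= tol and n < max_steps:
        x = t_map(x)
        n += 1
    return n


if __name__ == "__main__":
    # Columns: decimal places, Gaussian step count, Laplace step count.
    for digits in (1, 2, 3):
        tol = 10.0 ** (-digits)
        print(digits,
              steps_to_tolerance(t_gauss, 1.0, tol),
              steps_to_tolerance(t_laplace, 1.0, tol))
```

In the Gaussian case the step count grows roughly linearly in the number of decimal places, in line with the remark following the corollary. In the Laplace case the iterates shrink only like a multiple of $1/n$, so each extra decimal place multiplies the step count by about ten.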

III. APPLICATION

If $C$ is any subset of the real line $R$, let $C^\infty$ denote the set of all sequences $(x_0, x_1, \ldots)$ from $C$. We put the product topology on $R^\infty$, and also regard $R^\infty$ as a measurable space with the Borel sets. Following [5], if $\{X_i\}_{i=0}^\infty$, $\{Y_i\}_{i=0}^\infty$ are real processes defined on a common probability space, we say the pair process $\{(X_i, Y_i)\}_{i=0}^\infty$ is stochastically stable if for every bounded function $f: R^\infty \times R^\infty \to R$ such that $f(x, \cdot): R^\infty \to R$ is continuous for every $x \in R^\infty$ and $f(\cdot, y): R^\infty \to R$ is measurable for every $y \in R^\infty$, the sequence

$$\left\{ N^{-1} \sum_{i=0}^{N-1} f(X_i^\infty, Y_i^\infty) \right\}$$

converges almost surely as $N \to \infty$, where $X_i^\infty$, $Y_i^\infty$ denote the $R^\infty$-valued functions $(X_i, X_{i+1}, \ldots)$, $(Y_i, Y_{i+1}, \ldots)$, respectively.

Suppose $\{X_i\}_{i=0}^\infty$ is a stationary process with state space $(\sigma, \tau)$, each $X_i$ having density $p$. We can encode this process into the quantized process $\{Q^*(X_i)\}_{i=0}^\infty$, thereby achieving the minimal N-level encoding error. The drawback to this is that it may not be possible to obtain $Q^*$ precisely; one may only be able to obtain close approximations to it. This suggests we try the following adaptive quantization procedure: run Lloyd's algorithm, obtaining a sequence of quantizers $Q_0, Q_1, \ldots$; encode $X_0, X_1, \ldots$ into $Q_0(X_0), Q_1(X_1), \ldots$. The following result shows that nothing is lost in the long run by doing this; the processes $\{Q^*(X_i)\}$ and $\{Q_i(X_i)\}$ have the same asymptotic properties.

Theorem 3: Let $\{X_i\}_{i=0}^\infty$ be a stationary process with state space $(\sigma, \tau)$, each entry having density $p$. Suppose $DT(u^*)$ is not stochastic. Let $\{Q_i\}_{i=0}^\infty$ be a sequence of N-level quantizers given by Lloyd's Method I. Let $\{Y_i\}_{i=0}^\infty$ be the process such that $Y_i = Q^*(X_i)$, for each $i$. Let $\{\hat{Y}_i\}_{i=0}^\infty$ be the process such that $\hat{Y}_i = Q_i(X_i)$, for each $i$. Then $\{(X_i, \hat{Y}_i)\}_{i=0}^\infty$ is stochastically stable. Furthermore, if $f: R^\infty \times R^\infty \to R$ is a bounded function measurable in the first variable and continuous in the second variable, then

$$\lim_{N\to\infty} N^{-1} \sum_{i=0}^{N-1} f(X_i^\infty, \hat{Y}_i^\infty) = \lim_{N\to\infty} N^{-1} \sum_{i=0}^{N-1} f(X_i^\infty, Y_i^\infty) \quad \text{almost surely.} \tag{3}$$

Proof: $\hat{Y}_i - Y_i \to 0$ almost surely, since by Theorem 1, for every $\epsilon > 0$,

$$\sum_{i=0}^\infty \Pr[\,|\hat{Y}_i - Y_i| > \epsilon\,] \le \epsilon^{-2} \sum_{i=0}^\infty E[(\hat{Y}_i - Y_i)^2] < \infty.$$

Pick a compact subset $C$ of $R$ such that the range of $Y_i$ and $\hat{Y}_i$ is contained in $C$ for every $i$. Let $d$ be the metric on $C^\infty$ such that if $y = (y_0, y_1, \ldots)$ and $y' = (y_0', y_1', \ldots)$ are in $C^\infty$, then

$$d(y, y') = \sum_{i=0}^\infty 2^{-i} |y_i - y_i'|.$$

If $x \in R^\infty$ and $\delta > 0$ define

$$w(x, \delta) = \sup\{\,|f(x, y) - f(x, y')| : y, y' \in C^\infty,\ d(y, y') \le \delta\,\},$$

so that $w(x, \delta) \to 0$ as $\delta \to 0$. Now for each $i$,

$$|f(X_i^\infty, \hat{Y}_i^\infty) - f(X_i^\infty, Y_i^\infty)| \le w(X_i^\infty, d(\hat{Y}_i^\infty, Y_i^\infty)). \tag{4}$$

Also, $d(\hat{Y}_i^\infty, Y_i^\infty) \to 0$ almost surely since $\hat{Y}_i - Y_i \to 0$ almost surely. Hence [display (5) is garbled in the source; it bounds a $\limsup_{N\to\infty} N^{-1} \sum_{i=0}^{N-1}$ average in terms of $w(X_i^\infty, \cdot)$]. (5)

By the ergodic theorem, the right side of (5) is zero almost surely. Applying (4), we see that the sequence $\{N^{-1} \sum_{i=0}^{N-1} f(X_i^\infty, \hat{Y}_i^\infty)\}$ converges almost surely and that its almost sure limit must be the right side of (3).

IV. PROOFS

Let $g_1, g_2$ be the partial-derivative functions $g_1(\alpha, \beta) = \partial g(\alpha, \beta)/\partial\alpha$ and $g_2(\alpha, \beta) = \partial g(\alpha, \beta)/\partial\beta$. Calculating these derivatives, we obtain

$$g_1(\alpha, \beta) = p(\alpha)\,(g(\alpha, \beta) - \alpha) \Big/ \int_\alpha^\beta p(t)\,dt, \qquad g_2(\alpha, \beta) = p(\beta)\,(\beta - g(\alpha, \beta)) \Big/ \int_\alpha^\beta p(t)\,dt, \tag{6}$$

from which formulas we see that the functions $g_1, g_2$ are nonnegative and continuous on their respective domains. Using (6), Fleischer [2, eqs. (16)-(18)] proved the following identity:

$$2\,(1 - g_1(\alpha, \beta) - g_2(\alpha, \beta)) = [\ldots],$$

where $\phi$ is the nonnegative function defined almost everywhere on $(\sigma, \tau) \times (\sigma, \tau)$ by

$$\phi(x, y) = (x - y)\,\bigl(p(x)p'(y) - p(y)p'(x)\bigr).$$

We remark that for Fleischer's derivation in [2] to be valid, it is needed that $p(t)$ and $tp(t)$ are absolutely continuous (and hence almost everywhere differentiable) functions of $t$. This can be deduced from the log-concavity of $p$ using [7, theorem A]. The log-concavity of $p$ also tells us that $\phi$ is nonnegative almost everywhere, whence [...]

[...] $M_1 > 0$ such that for every $n$,

$$\max_{i=1,\ldots,N}\ \sup_{t \in A_i^{(n)} \cap A_i} |Q_{T^n x}(t) - Q_{u^*}(t)| \le M_1 \|T^n x - u^*\|.$$

Now if $(\alpha, \beta)$ is an interval and $\{(\alpha_k, \beta_k)\}_{k=0}^\infty$ are intervals with $\alpha_k \to \alpha$, $\beta_k \to \beta$, then $g(\alpha_k, \beta_k) \to g(\alpha, \beta)$. Thus the functions $\{Q_{u^*}\} \cup \{Q_{T^n x}: n = 0, 1, \ldots\}$ are uniformly bounded. Therefore, we may pick $M_2 > 0$ so that $\sup_{n=0,1,\ldots} \sup\, |Q_{T^n x}$ [...]

$$E\bigl[\,|Q_{T^n x}(X) - Q_{u^*}(X)|^2\,\bigr] \le M_1^2 \|T^n x - u^*\|^2 + M_2 \sum_{i=1}^N m(A_i^{(n)} \triangle A_i),$$

where $m$ denotes Lebesgue measure. It is easily checked that $\sum_{i=1}^N m(A_i^{(n)} \triangle A_i)$ is no bigger than $2N\|T^n x - u^*\|$. Substituting this upper bound in the preceding inequality, we see now that a) holds.

Now we prove b). Set $x^{(n)} = T^n x$, $n \ge 1$. Suppose $Q_{T^n x} \to Q_{u^*}$ exponentially fast. Then for each $i$ ($1 \le i \le N$) the sequence $\{(g(x_{i-1}^{(n)}, x_i^{(n)}) - g(u_{i-1}^*, u_i^*))^2\}$ approaches zero exponentially fast as $n \to \infty$. This implies that for each $i$ ($1 \le i \le N-1$),

$$\tfrac12\,[\,g(x_{i-1}^{(n)}, x_i^{(n)}) + g(x_i^{(n)}, x_{i+1}^{(n)})\,] \to \tfrac12\,[\,g(u_{i-1}^*, u_i^*) + g(u_i^*, u_{i+1}^*)\,]$$

exponentially fast. But the left side of the preceding convergence statement is $x_i^{(n+1)}$ and the right side is $u_i^*$. Thus, $T^n x \to u^*$ exponentially fast.

Theorem 1 is a consequence of Lemma 1, and Lemmas 4-7.

In the following, $x > u^*$ means $x_i > u_i^*$ ($1 \le i \le N-1$) and $x < u^*$ means $x_i < u_i^*$ ($1 \le i \le N-1$).

Lemma 8: Suppose $DT(u^*)$ is stochastic. If $x \in O_N$ and either $x > u^*$ or $x < u^*$, then $\{T^n x\}$ does not approach $u^*$ exponentially fast.

Proof: Since $DT(u^*)$ is a regular stochastic matrix we may pick a probability vector $v \in R^{N-1}$ with all positive components such that $vDT(u^*) = v$, where in the preceding equation we write $v$ as a row vector. Since $\log p$ is piecewise affine it is not hard to show that

$$\limsup_{u \to u^*} \|u - u^*\|^{-1} \|DT(u) - DT(u^*)\| < \infty.$$

Hence we may pick $C > 0$ and $\delta > 0$ such that

$$\|Tx - Tu^* - DT(u^*)(x - u^*)\| \le C\|x - u^*\|^2, \qquad \|x - u^*\| \le \delta,$$

and then

$$|(v, Tx - u^*) - (v, x - u^*)| \le C\|x - u^*\|^2, \qquad \|x - u^*\| \le \delta. \tag{12}$$

If $x > u^*$, then $\|x - u^*\|^{-1}(v, x - u^*) \ge \min_i v_i > 0$, and then from (12) we have (if in addition $\|x - u^*\| \le \delta$) that

$$\frac{(v, Tx - u^*)}{(v, x - u^*)} \ge 1 - \beta\|x - u^*\|, \tag{13}$$

where we have set $\beta = C/\min_i v_i$. Fix $x^{(0)} \in O_N$ so that $x^{(0)} > u^*$. Set $x^{(n)} = T^n x^{(0)}$, $n \ge 1$. Then $x^{(n)} > u^*$ and, hence, $(v, x^{(n)} - u^*) > 0$, for all $n$. For $n \ge k$, say, we will have $\|x^{(n)} - u^*\| \le \delta$ and $\|x^{(n)} - u^*\| < \beta^{-1}$. Since

$$(v, x^{(n)} - u^*) = (v, x^{(0)} - u^*) \prod_{i=0}^{n-1} \frac{(v, x^{(i+1)} - u^*)}{(v, x^{(i)} - u^*)},$$

we must have (from (13)) that

$$\prod_{i=k}^\infty \bigl(1 - \beta\|x^{(i)} - u^*\|\bigr) = 0. \tag{14}$$

Therefore, the sequence $\{x^{(n)}\}$ cannot converge to $u^*$ exponentially fast, for if so we would have $\sum_{i=k}^\infty \|x^{(i)} - u^*\| < \infty$ and then the infinite product on the left side of (14) would have a positive limit [8, statement (1.17), p. 291]. Similarly, one can show that $\{T^n x\}$ does not converge to $u^*$ exponentially fast if $x < u^*$.

Theorem 2 follows from Lemmas 7 and 8.
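The objects used in these proofs can be examined numerically. The following sketch is not part of the paper. It assumes a standard Gaussian density on $(-\infty, \infty)$, takes $g(\alpha, \beta)$ to be the conditional mean of $X$ given $\alpha \le X < \beta$ (which is consistent with formulas (6)), and takes $(Tx)_i = \tfrac12[g(x_{i-1}, x_i) + g(x_i, x_{i+1})]$, the expression identified with $x_i^{(n+1)}$ in the proofs above. The derivative matrix is assembled from (6) by the chain rule. All function names are hypothetical.

```python
import math


def phi(t):
    """Standard Gaussian density; evaluates to 0.0 at +/- infinity."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi(t):
    """Standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def mass(a, b):
    """Probability of the cell [a, b)."""
    return Phi(b) - Phi(a)


def g(a, b):
    """Conditional mean of X on [a, b); uses the integral of t*phi(t) = phi(a) - phi(b)."""
    return (phi(a) - phi(b)) / mass(a, b)


def g1(a, b):
    """Partial derivative of g in its left endpoint, per (6); a must be finite."""
    return phi(a) * (g(a, b) - a) / mass(a, b)


def g2(a, b):
    """Partial derivative of g in its right endpoint, per (6); b must be finite."""
    return phi(b) * (b - g(a, b)) / mass(a, b)


def lloyd_step(x):
    """One application of T: each threshold moves to the midpoint of adjacent cell means."""
    ext = [-math.inf, *x, math.inf]
    levels = [g(ext[i], ext[i + 1]) for i in range(len(ext) - 1)]
    return [0.5 * (levels[i] + levels[i + 1]) for i in range(len(x))]


def derivative_matrix(x):
    """Tridiagonal matrix DT(x) built from g1 and g2."""
    n = len(x)
    ext = [-math.inf, *x, math.inf]
    d = [[0.0] * n for _ in range(n)]
    for r in range(n):
        lo, mid, hi = ext[r], ext[r + 1], ext[r + 2]
        if r > 0:
            d[r][r - 1] = 0.5 * g1(lo, mid)
        d[r][r] = 0.5 * (g2(lo, mid) + g1(mid, hi))
        if r < n - 1:
            d[r][r + 1] = 0.5 * g2(mid, hi)
    return d


def fixed_point(x, steps=500):
    """Iterate T a fixed number of times from the starting thresholds x."""
    for _ in range(steps):
        x = lloyd_step(x)
    return x


if __name__ == "__main__":
    u = fixed_point([-1.0, 0.2, 1.5])  # N = 4 levels, three thresholds
    print("thresholds:", [round(t, 4) for t in u])
    for row in derivative_matrix(u):
        print([round(e, 4) for e in row], "row sum:", round(sum(row), 4))
```

In this Gaussian example every entry is nonnegative and every row sum is strictly below one, so the matrix at the fixed point is not stochastic, which is the case covered by Theorem 1.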

REFERENCES

[1] C. H. Edwards, Advanced Calculus of Several Variables. New York: Academic, 1973.
[2] P. Fleischer, "Sufficient conditions for achieving minimum distortion in a quantizer," IEEE Int. Conv. Rec., pp. 104-111, 1964.
[3] F. R. Gantmacher, The Theory of Matrices, vol. II. New York: Chelsea, 1964.
[4] R. M. Gray, J. C. Kieffer, and Y. Linde, "Locally optimal block quantizer design," Information and Control, vol. 45, pp. 178-198, May 1980.
[5] J. C. Kieffer, "Stochastic stability of feedback quantization schemes," IEEE Trans. Inform. Theory, vol. IT-28, pp. 248-254, Mar. 1982, this issue.
[6] S. P. Lloyd, "Least squares quantization in PCM," Bell Telephone Labs Memorandum, Murray Hill, NJ, 1957; reprinted in IEEE Trans. Inform. Theory, vol. IT-28, pp. 129-137, Mar. 1982, this issue.
[7] A. W. Roberts and D. E. Varberg, Convex Functions. New York: Academic, 1973.
[8] S. Saks and A. Zygmund, Analytic Functions. Warsaw: Polish Scientific Publishers, 1965.
[9] A. V. Trushkin, "Sufficient conditions for uniqueness of a locally optimal quantizer for a wide class of error weighting functions," IEEE Trans. Inform. Theory, vol. IT-28, pp. 187-198, Mar. 1982, this issue.