IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-31, NO. 3, MAY 1985

Finite-State Vector Quantization for Waveform Coding

JOHN FOSTER, MEMBER, IEEE, ROBERT M. GRAY, FELLOW, IEEE, AND MARI OSTENDORF DUNHAM
Abstract-A finite-state vector quantizer is a finite-state machine used for data compression: Each successive source vector is encoded into a codeword using a minimum distortion rule and a code book that depends on the encoder state. The current state and the selected codeword then determine the next encoder state. A finite-state vector quantizer is capable of making better use of the memory in a source than is an ordinary memoryless vector quantizer of the same dimension or blocklength. Design techniques are introduced for finite-state vector quantizers that combine ad hoc algorithms with an algorithm for the design of memoryless vector quantizers. Finite-state vector quantizers are designed and simulated for Gauss-Markov sources and sampled speech data, and the resulting performance and storage requirements are compared with ordinary memoryless vector quantization.

Manuscript received June 27, 1983; revised June 18, 1984. This work was supported in part by a Bell Laboratories Cooperative Research Fellowship, the National Science Foundation, the U.S. Army Research Office, and the Joint Services Electronics Program at Stanford University. The material in this paper was presented in part at the 1982 IEEE International Symposium on Information Theory, Les Arcs, France, June 1982. J. Foster was with Bell Laboratories, Holmdel, NJ, and the Information Systems Laboratory, Stanford University, Stanford, CA. He is now with the Tuskegee Institute, Tuskegee, AL 36088, USA. R. M. Gray is with the Information Systems Laboratory, Stanford University, Stanford, CA 94305, USA. M. O. Dunham is with BBN Laboratories, Inc., Cambridge, MA 02238, USA.
I. INTRODUCTION
A FINITE-STATE vector quantizer (FSVQ) is an example of a finite-state machine for data compression. It can be viewed as a finite collection of ordinary vector quantizers (VQ's) or block source codes, where each successive source vector is encoded using a VQ determined by the current encoder state. The current state and the codeword selected then determine the next encoder state. Unlike an ordinary memoryless VQ, an FSVQ can take advantage of the memory or correlation between successive source vectors by choosing the appropriate code or VQ, given the past behavior. Finite-state machines for vector quantization are not new: ordinary VQ's and sliding-block or sliding-window codes are special cases of finite-state machines. Trellis encoders are also examples of finite-state machines where the encoder is permitted to look ahead of the current source vector. More general finite-state machine models were considered by Ziv [1] and Gaarder and Slepian [2]. Ziv focused on rate-distortion theory for such codes, and Gaarder and Slepian developed several optimality properties and some coding theorems for such codes.
Gaarder and Slepian also considered several properties of the example of a sliding-block or sliding-window code for a Gauss-Markov source. Unlike such general finite-state machines, finite-state vector quantizers assume an encoder that uses the minimum distortion rule to select a codeword from a state code book; that is, it operates as a quantizer, but at each time the quantizer may be different. Thus, finite-state vector quantizers are a special case of feedback quantizers, introduced by Kieffer [3] as a model for predictive quantization schemes.

We introduce a class of finite-state machines for data compression called finite-state vector quantizers or FSVQ's. These codes are not as general as the basic models of [1] and [2], but they have an intuitive structure and their similarity to ordinary VQ's motivates a design algorithm that is the principal contribution of this work. In the language of [2], our finite-state machine codes are of the "tracking finite-state system" variety; that is, the new encoder state is a function of the previous encoder state and the selected channel codeword rather than a more general function of the previous state and the current input vector. This permits the decoder to track the encoder state if it knows the initial condition (and the channel is noiseless).

General FSVQ's have two principal differences from ordinary memoryless VQ's. First, the code book may vary with time as a function of the encoder state. This resembles a universal source code or a finite-choice adaptive code, but here no side information is transmitted to specify the code book selected; such information must be inferred from the received sequence of channel symbols and an initial state. Second, the minimum distortion selection rule is not necessarily optimum for a given decoder since a low distortion codeword may lead to a bad state and hence to poor long-term behavior. Thus the encoder mapping is not necessarily optimal for the decoder in any long-run sense. It is, however, a reasonable heuristic. In addition, the problem of selecting a bad state by this encoder can be minimized by good design.

No new Shannon theory is required by FSVQ's. Since FSVQ's contain ordinary block source codes (memoryless VQ's) as a special case, the usual positive source coding theorem relative to a distortion measure applies. Since they are special cases of finite-state machines, the negative coding theorems of [1] and [2] apply. In addition, since asymptotically mean stationary (ams) sources driving finite-state machines yield ams input/output processes
[4]-[7], the converse for ams sources [7] applies. Regardless, the optimal performance that is theoretically achievable is given by the Shannon rate-distortion function of the source and distortion measure, given an ams ergodic source and a sufficiently well-behaved distortion measure. In contrast to [1] and [2], the goal of this work is to present the development and simulation of design algorithms for the class of codes considered.

A finite-state vector quantizer is not the same as the finite-state Markov models now popular for use in isolated utterance and continuous speech recognition systems. (See, e.g., [8]-[11].) We pause to discuss these systems since we are aware of some confusion concerning their similarities and differences. In these recognition systems, a memoryless VQ is first used as a front end (or acoustic processor) to encode speech parameters, e.g., selected spectral coefficients or LPC parameter vectors. An algorithm such as the forward-backward algorithm is then used to fit a "hidden Markov model" of a given structure to the observed VQ outputs, that is, to construct a Markov model for the observed compressed data [8]. The estimation algorithm used to find the parameters of the Markov source that best matches the observed training data is approximately a maximum likelihood algorithm if one assumes the given model. In particular, the estimation algorithm tries to match the observed relative frequencies to those that would be produced by the Markov model. Once the model is found (in the design or training stage), it is then used in a linguistic decoder to perform approximate maximum likelihood recognition on the speech data. In the recognition problem the quantization is in the memoryless VQ; the modeling is inherently a continuous estimation problem in which one tries to find transition probabilities that best explain the VQ output observations. No compression is involved in the hidden Markov modeling; the compression is performed prior to the modeling in the front end.

On the other hand, an FSVQ is a compression device and hence a possible alternative to the VQ as a front end of such a recognition system. No Markovian assumptions are made, nor are transition probabilities estimated; the machine is designed to minimize average distortion between its input and output. Waveforms are matched in the sense of minimizing distortion, not in the sense of matching relative frequencies to assumed model distributions. In short, FSVQ codewords are designed given a finite-state quantizer structure, whereas the hidden Markov model structure is designed given a sequence of VQ codewords. It should also be observed that the output of the finite-state machine is not a Markov source in general (unless the process being compressed is memoryless). There is a tempting parallel between modeling a source and building a good FSVQ for the source in the sense that both the model outputs and the possible FSVQ outputs should "look like" the typical source outputs. However, this view has not yet yielded any design methods for FSVQ, probably because the notions of "look like" and the fundamental goals are quite different. In addition, the hidden Markov techniques have so far concentrated on
coding LPC or spectral parameter vectors; our goal is to code waveforms. (FSVQ for LPC parameter vectors is developed in [12].)

In the next section we define and discuss tracking finite-state source codes and FSVQ's. Then two different structures are considered for FSVQ, the difference being in how the reproduction codewords relate to the states of the decoder. The decoder finite-state machine can either be a Moore machine [13] with its outputs, the reproduction codewords, associated with the states (or nodes), or it can be a Mealy machine [13] with its outputs associated with the transitions (or branches). We shall refer to these two structures as labeled-state and labeled-transition FSVQ's, respectively. The two structures are equivalent in the sense that Mealy and Moore finite-state machines are equivalent [13]. That is, given an FSVQ of one form, one can find an FSVQ of the other form that is equivalent in the sense that a given initial state and input sequence will yield the same output sequences, except possibly for the initial state output. Thus the two forms of FSVQ are capable of the same rate-distortion performance. The complexity and storage requirements of the two forms may be quite different in different applications, however, and one form or the other may be preferable in certain cases. In addition, iterative design algorithms such as those developed here may produce quite different final codes even if run on equivalent initial codes.

In the subsequent section we introduce several design algorithms for FSVQ's. These algorithms combine a basic VQ design algorithm (see, e.g., [14]) with various ad hoc algorithms for selecting the next-state functions or state transition functions. The basic approach was motivated by the preliminary tests of Rebolledo [15], who observed that in voice coding (LPC) VQ systems, each codeword was virtually always followed by one of a small subset of codewords, that is, all codewords were almost certainly followed by a word in a small subcode. Thus, a lower rate should suffice for the same performance if successors to a codeword are restricted to lie in a small subcode. The problem, of course, is that in the rare instances where the source changes in an unusual way, the input vector may not be well reproduced by the available subcode. In this case there may be no transition to an appropriate state and the coder can become derailed. A useful design technique must produce codes that can find their way back to good reproductions if so derailed.

The next section is devoted to simulation studies for two different sources: a Gauss-Markov source and a sampled speech source. The first is useful because it is a popular test source for data compression systems and since its rate-distortion performance bounds are known. The second is of more practical interest because of the increasing importance of digital voice systems. The FSVQ's are compared to ordinary VQ's on the basis of performance as measured by both signal-to-(quantization)-noise ratio (SNR) and complexity. Further comparisons with other feedback quantizers, such as Cuperman and Gersho's vector predictive quantizer [16], [17], may be found in the general survey
paper [18]. The closing section provides some conclusions and some suggestions for future research.

II. TRACKING FINITE-STATE SOURCE CODES
Define a state space S as a finite collection of symbols S = {σ_i; i = 0, 1, 2, ..., K - 1} called states. Fix a dimension k, called the code block length, and let R^k denote k-dimensional Euclidean space. Each input source vector x ∈ R^k is to be encoded into a channel symbol for communication or storage. For convenience we denote the alphabet of channel symbols as N = {0, 1, 2, ..., N - 1}. The rate of the coding is log N bits per input vector or k^{-1} log N bits per source symbol (or sample), where all logarithms are to base 2. Often it is convenient to consider only integer rates in bits per vector and hence to consider N = {0, 1}^R, where R is the rate in bits per vector. In this case the channel symbols are simply binary vectors.

A finite-state encoder is a mapping α: R^k × S → N; that is, given an input source vector x ∈ R^k and a state s ∈ S, the vector x is encoded into the channel symbol or channel codeword α(x, s) ∈ N. The encoder works in conjunction with a next-state function, a mapping f: N × S → S. If the encoder has a current state s ∈ S, views an input vector x ∈ R^k, and produces a codeword u = α(x, s), then the next encoder state is f(u, s). The encoder and next-state function together describe a finite-state machine, which we will call the encoding machine. A finite-state decoder is a mapping β: N × S → R^k; that is, given a channel symbol u ∈ N and a state s, the channel symbol u is decoded into a reproduction vector

x̂ = β(u, s) = β(α(x, s), s).
The decoder and next-state function together form a finite-state machine that we will refer to as the decoding machine. One constraint on the finite-state machines considered here is already implicit: we assume that the encoder and decoder share the same state space. By restricting the next-state mapping to depend only on the current state and the encoder output rather than on the current state and the source vector, we are assured that the decoder will be able to track the state sequence given a common initial state, and hence we have a tracking finite-state system in the terminology of Gaarder and Slepian [2]. We shall refer to such codes as tracking finite-state source codes or, simply, tracking finite-state codes.

Associated with each state s ∈ S is a state code book C_s = {β(u, s); u ∈ N} of possible reproduction vectors obtainable by an encoder and decoder in that state. The collection of state code books is simply the decoder outputs indexed by the pairs of states and channel symbols. We define the super code book C = ∪_{s ∈ S} C_s as the collection of all of the reproduction codewords in all of the state code books. The maximum size of the super code book is K 2^R. It need not actually be this large, however, as the state code books may overlap, that is, contain common reproduction words.
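As a concrete illustration of this bookkeeping (ours, not the paper's), the decoder table β, the state code books C_s, and the super code book can be held in a few arrays; the sizes and names below are purely illustrative.

```python
import numpy as np

# Illustrative sizes: K states, rate R bits per vector, dimension k.
K, R, k = 4, 2, 2
N = 2 ** R                      # number of channel symbols per state

# beta[s, u] is the reproduction codeword beta(u, s); row s is the
# state code book C_s = {beta(u, s); u = 0, ..., N - 1}.
rng = np.random.default_rng(0)
beta = rng.standard_normal((K, N, k))

# Super code book: union of all state code books, duplicates removed.
super_code_book = np.unique(beta.reshape(K * N, k), axis=0)
assert len(super_code_book) <= K * 2 ** R   # at most K 2^R distinct words

# Rate: log2 N bits per vector, k^{-1} log2 N bits per source sample.
print(np.log2(N), np.log2(N) / k)
```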
A finite-state code is used to encode a sequence of vectors or a random process {X_n; n = 0, 1, 2, ...} as follows. Given an initial state S_0 = σ_0 ∈ S, the channel symbol sequence {U_n; n = 0, 1, 2, ...}, the state sequence {S_n; n = 0, 1, 2, ...}, and the reproduction sequence {X̂_n; n = 0, 1, 2, ...} are defined recursively for n = 0, 1, ... as

U_n = α(X_n, S_n),  X̂_n = β(U_n, S_n),  S_{n+1} = f(U_n, S_n).
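A minimal Python sketch of this recursion may help fix ideas; it is an illustration of the tracking property rather than the authors' implementation, and the table layout (beta as a K x N x k array of codewords, f as a K x N array of next states) is an assumption.

```python
import numpy as np

def fsvq_encode(x_seq, beta, f, s0):
    """Encode k-dimensional vectors with a tracking finite-state code.

    beta[s, u] is the reproduction codeword beta(u, s) and f[s, u] is the
    next state f(u, s); s0 is the initial state shared with the decoder.
    """
    s, channel = s0, []
    for x in x_seq:
        # minimum (squared error) distortion selection within C_s
        u = int(np.argmin(((beta[s] - x) ** 2).sum(axis=1)))
        channel.append(u)
        s = int(f[s, u])          # next state depends only on (u, s)
    return channel

def fsvq_decode(channel, beta, f, s0):
    """The decoder tracks the encoder state from the same initial state."""
    s, recon = s0, []
    for u in channel:
        recon.append(beta[s, u])
        s = int(f[s, u])
    return np.array(recon)
```

Because the next state depends only on the pair (u, s), running fsvq_decode on the channel sequence with the same initial state reproduces exactly the reproduction sequence formed at the encoder.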
III. FINITE-STATE VECTOR QUANTIZERS
What makes a finite-state machine code a quantizer is the use of a minimum distortion encoding rule. Hence we next introduce a distortion measure. Given a source vector x = (x_1, x_2, ..., x_k) and a reproduction vector x̂ = (x̂_1, x̂_2, ..., x̂_k), the distortion d(x, x̂) between x and x̂ is defined as the squared error

$$d(x, \hat{x}) = \|x - \hat{x}\|^2 = \sum_{i=1}^{k} (x_i - \hat{x}_i)^2,$$
where ‖·‖ denotes the Euclidean norm. As with ordinary VQ's, other distortion measures can be considered, but we shall see that we do need more than the principal requirement of ordinary VQ, the existence of a generalized centroid with respect to the distortion measure.

Given the above terminology, a finite-state vector quantizer (FSVQ) is a tracking finite-state code described as above by an initial state, say σ_0 ∈ S; a decoder mapping β; a next-state mapping f; and a minimum distortion encoder mapping defined in terms of β by

$$\alpha(x, s) = \min_{u}{}^{-1}\, d(x, \beta(u, s)), \tag{3.1}$$

where the inverse minimum notation means that α(x, s) is the index u for which the reproduction codeword β(u, s) yields the minimum possible distortion over all possible reproduction codewords in the state code book C_s. Here and throughout we assume an arbitrary tie-breaking rule. Thus

$$d(x, \beta(\alpha(x, s), s)) = \min_{\hat{x} \in C_s} d(x, \hat{x}).$$
As do Gaarder and Slepian [2], we shall measure the performance of an FSVQ on a given source by the limit

$$\Delta = \lim_{n \to \infty} n^{-1} \sum_{i=0}^{n-1} E\, d(X_i, \hat{X}_i)$$

if the limit exists, where E denotes expectation. This limit will exist, for example, if the joint source-reproduction process is ams [4] and the sequence {d(X_i, X̂_i); i = 0, 1, 2, ...} is uniformly integrable. If the input process is ams, then the joint input/output process using a finite-state machine code is also ams [5], [6]. If the sample average energy of the source also converges, say to Ē, then the SNR is defined as Ē/Δ. The ergodic theorem for ams sources [4] implies that the sample average distortion and energy will converge with probability one. If the input/output pair process of vectors is also ergodic, it will
converge to the above asymptotic expectations, that is,

$$\lim_{n \to \infty} n^{-1} \sum_{i=0}^{n-1} d(X_i, \hat{X}_i) = \Delta,$$
$$\lim_{n \to \infty} n^{-1} \sum_{i=0}^{n-1} \|X_i\|^2 = \bar{E},$$

almost everywhere.
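In practice these limits are estimated by long-run sample averages over a training or test sequence; a simple sketch (ours, not the paper's) of that estimate and of the corresponding SNR in decibels follows.

```python
import numpy as np

def sample_average_performance(x_seq, x_hat_seq):
    """Estimate average distortion, average energy, and SNR (dB) by sample averages."""
    x = np.asarray(x_seq, dtype=float)
    x_hat = np.asarray(x_hat_seq, dtype=float)
    delta = np.mean(np.sum((x - x_hat) ** 2, axis=1))   # sample average distortion
    energy = np.mean(np.sum(x ** 2, axis=1))            # sample average energy
    return delta, energy, 10.0 * np.log10(energy / delta)
```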
Even if the process is not ergodic, however, the ergodic decomposition theorem for ams sources [7] implies that the sample averages will converge with probability one to their expectation with respect to an ergodic component of the stationary mean of the ams process. The point is simply that if the source is ams, then the joint input/output process of an FSVQ driven by the source will have convergent sample average distortion and energy, and hence the expectations under the true (but possibly unknown) ergodic component can be estimated by long term sample averages.

An FSVQ is optimal if Δ is minimized (or the SNR is maximized) over all decoder mappings and next-state mappings. As in the case of ordinary VQ design, when the underlying source probability is unknown or when it is known but the expectations are computed by Monte Carlo integration, we will attempt to design codes that minimize the long-term sample average distortion for a long training sequence of data. Also as in ordinary VQ design, if the appropriate stationarity assumptions are satisfied, then this long-term sample average should well approximate the expectation, and hence similar performance should be achieved on future data from the same source. As with ordinary VQ design (and as observed by Gaarder and Slepian [2]), there seems little hope of designing globally optimum codes. The best one can hope for, given current techniques, is the design of reasonably good codes, e.g., locally optimum codes. Although we will consider design algorithms similar to those for ordinary VQ design, unlike the VQ case the difficulty of the theory of such finite-state machines has thus far precluded any demonstration that the resulting codes are locally optimal.

As a final observation, if we were to permit the encoder to search several nodes into the future and find the path minimizing the sum of the distortions encountered, then the system would become an ordinary trellis encoding system with a vector alphabet. In other words, an FSVQ can be considered to be a trellis source encoding system with a vector alphabet, a search depth of one, and a decoder that is a finite-state machine, but not necessarily a sliding-block code structure, as usually assumed. If the encoder is allowed to look ahead and use a longer search depth or delay, then such an encoder would be nearly optimal for the decoder. In an FSVQ, however, only a search of one node ahead is permitted.

IV. LABELED-STATE AND LABELED-TRANSITION FSVQ'S

As stated in the introduction, the outputs of the encoding and decoding machines can either be associated with the transitions between states (a Mealy finite-state machine)
or with the states themselves (a Moore finite-state machine). While these two structures are mathematically equivalent in the sense that one can always find a machine of one kind with essentially the same output behavior as a machine of the other kind, it is of interest to consider both types of machines since the resulting complexity may differ for a particular application. In addition, running iterative improvement algorithms on equivalent codes need not produce equivalent codes.

There are also distinct properties associated with the two structures that may make a difference in some applications. For example, given a labeled-transition FSVQ, one can construct an equivalent labeled-state FSVQ by simply defining a new state space consisting of old state and channel symbol pairs. The new FSVQ possesses many more states and hence has greater transition storage requirements, but it also possesses a certain additional freedom for the designer that may more than compensate for this apparent disadvantage: by changing the next-state function, one can cause a given reproduction codeword to have as successors any N of the super codewords. On the other hand, changing the next-state function of a labeled-transition FSVQ only permits one to choose which state code book, that is, which of a fixed collection of codewords, can succeed the given codeword. To be more precise, in a labeled-transition system there are K^{NK} ways to choose a next-state function since each of the N transitions from a given state may connect to any of the K states. Note that duplicate transitions are permitted and may be useful since they can bear different labels. The corresponding labeled-state system has KN states, but now we do not permit duplicate transitions, as they are a waste. Thus there are $\binom{KN}{N}$ possible choices of the next-state function in the equivalent labeled-state code. This freedom can lead to quite different code books when the initial code is altered using an iterative improvement technique.

Intuitively, in both systems the state itself can be considered to correspond to a coarse approximation of the last source vector and hence a form of prediction of the next source vector to be encoded. In the labeled-state case the approximation is simply the last reproduction codeword chosen, the label of the current state or node. In the labeled-transition case, the approximation is given not by a specific codeword but by a specific code book, the state code book consisting of the labels of the branches leaving the node. Thus, the labeled-state code can use the extra states to perhaps produce a better coarse approximation of the source through the state sequence than can a labeled-transition code. These observations suggest that for a fixed rate, the labeled-state FSVQ may be capable of using its additional storage to provide better performance than can an equivalent labeled-transition FSVQ. In addition, several of the design techniques for next-state functions to be considered are most natural for the labeled-state case. Neither structure is, however, a priori superior.

To describe the design algorithms in a common fashion for both kinds of machines, we introduce a common notation.
A labeled-state FSVQ has one output label for each state in S. A labeled-transition FSVQ has one output label for each transition, and we can consider the transitions to be the members of the Cartesian product space N × S consisting of channel symbol/next-state pairs. Define the label index set I to be the space S for a labeled-state FSVQ and N × S for a labeled-transition FSVQ. Define the label code book as the super code book indexed by the label index set, that is, C = {c(i); i ∈ I}.

In both the labeled-state and labeled-transition case, each transition or channel symbol/next-state pair is associated with a reproduction vector in the label code book; define the label index function φ: N × S → I, which gives the index of the label of the super codeword implied by the transition. For a labeled-transition FSVQ, φ(u, s) = (u, s), that is, the transition index and label index are identical. For a labeled-state FSVQ, φ(u, s) = f(u, s), and the label index function gives the next state. With this terminology an FSVQ is completely specified by an initial state σ_0, a label code book C = {c(i); i ∈ I}, a next-state function f, and a label index function φ: N × S → I. The operation of either a labeled-state or labeled-transition FSVQ can then be described as

$$U_n = \alpha(X_n, S_n) = \min_{u}{}^{-1} d(X_n, c(\varphi(u, S_n))),\quad \hat{X}_n = c(\varphi(U_n, S_n)),\quad S_{n+1} = f(U_n, S_n). \tag{4.1}$$

V. FSVQ DESIGN ALGORITHMS

In this section we present design algorithms for FSVQ's. The design is divided into three steps and specific algorithms for each step are presented. The basic steps in the design of a dimension k, rate R bits per vector (or r = R/k bits per symbol) FSVQ having K states and N = 2^R transitions from each state, and hence N reproduction words in each state code book, are the following:

1) initial code book design,
2) state code books and next-state function design,
3) iterative improvement of the label code book.

The first step is to design an ordinary VQ having K codewords. This code book can be viewed as a collection of tentative state labels even if the FSVQ is to have labeled transitions. The code book is simply an initial super code book if the FSVQ has labeled states. If the FSVQ is to have labeled transitions, then this initial code book can be viewed as a set of state labels from which the state code books, the transition labels, will be developed. In both cases we can say that the state represents a coarse approximation to the previous input vector, and hence a form of prediction of the current vector. Next a family of state code books together with a next-state function is designed using the initial code book. Lastly, for the fixed next-state function, an iterative improvement algorithm is run on the label code book to better match it to the connection rule.

This design approach is ad hoc and no results are obtained demonstrating even local optimality. It has, however, several useful features. It is simple and reasonably intuitive; it provides the first (to our knowledge) design methodology for general finite-state vector quantizers; and the separation into these steps permits almost direct application of the ordinary VQ design algorithm of [14] to the first and third steps, and a variation of the algorithm together with several heuristics to the second step.

The VQ design algorithm of [14] and principles of optimization theory suggest that better performance could be obtained from a more involved algorithm that followed Step 3 by redoing the next-state function to better match the improved decoder. One could then iterate, optimizing the next-state function for the label code book and vice versa until convergence. Algorithms to accomplish this are reported in [12], but the simpler single pass design algorithm above and its performance are the focus of this work. We now develop algorithms for each design step in detail.

A. Initial Code Book Design

The initial super code book design is accomplished using an ordinary VQ design algorithm. A brief summary of this procedure is presented in the Appendix for reference. Details may be found in [14], [19]-[21]. Having completed the above design, one has a complete code book of reproduction values that can be considered as initial state labels. For later use we denote this final code book, which is the initial code book of the next step, as S = {c(s); s ∈ S}, where we now use the state names as indices to emphasize the fact that these reproduction codewords are associated with the states whether the FSVQ has labeled states or labeled transitions.

B. State Code Books and Next-State Function Design

Three basic techniques are introduced to find state code books and next-state functions for a given initial code book. The first two are appropriate for the design of labeled-state FSVQ's and have the advantages of being intuitive and simple. They focus on choosing a reasonable next-state function for the given initial code and attempt no further reproduction codeword design at this stage. The third technique uses a variation of the VQ design algorithm to design a collection of state code books by conditioning on the minimum distortion state label in the code book of the previous step. Thus, candidate state code books are obtained based on the assumption that the encoder knows the best word in the initial code book. A next-state function is then formed by trying to approximate this omniscient encoder as well as possible by an FSVQ. This is accomplished in different but similar ways for the cases of labeled states and labeled transitions.

The Conditional Histogram Technique: A simple and, historically, the first technique for finding a next-state function for a labeled-state FSVQ is to keep the initial code book and to form a conditional codeword histogram using this super code book on the training sequence; that is, to determine for each state the relative frequencies of all
of the successor states. Then for each of the N states u having the largest relative frequencies of following state s, set f(u, s) = u. In other words, constrain the labeled-state FSVQ to have as next states the N most likely next states when the same super code book is used as an ordinary VQ. This approach was motivated by the results of Rebolledo [15], who found that in voice coding VQ systems codewords in the initial code book were almost always followed by one of a very small subset of codewords.

Nearest Neighbor Design: A second and even simpler approach is to assume that the input vectors are highly correlated and hence the available state codewords should well approximate the current reproduction codeword. As in the conditional histogram design, the initial code book is kept and we simply determine a next-state function for that code book. In this case, for each state s find the N nearest neighbor state labels in the entire super code book and then assign these values to the next-state function. This method depends only on the distortion between the codewords, while the previous method depends only on the relative frequency of successor codewords. The resulting code can be viewed as a form of multidimensional delta modulation since the allowed next reproductions are always close to the current reproduction. This method is clearly inappropriate if the number of codewords K is much larger than the number of possible transitions N, since it will likely produce codes that are unable to track rapid variations, a condition analogous to slope overload in delta modulators. The method is considered, however, because it is simple and useful for comparison.

Omniscient Design: The third technique is the most complicated one considered but has the potential to make better use of the memory of the resulting FSVQ. The name chosen reflects the approach of initially designing the state code books under the assumption that the encoder and decoder are omniscient in the sense of knowing a sequence of idealized states. The omniscient encoder is then approximated by a more constrained FSVQ that can be tracked by a nonomniscient decoder. If the approximation is reasonably good, then the code books designed assuming the extra knowledge should also be fairly good. Labeled-state and labeled-transition designs are identical except for the final step, but in both cases this step is different in a crucial way that yields different interpretations of the operation of the two codes.

Consider the state s to be associated with the state label c(s) ∈ S. We wish to design, for each such state, a state code book and a next-state selection rule. If the states were not constrained to be selected according to a specific next-state function, but instead could be chosen in a memoryless fashion, then a reasonable heuristic would be to select the state using a minimum distortion rule with the state label VQ S. This selection would then partition the training sequence according to the K different possible state labels, thus producing K separate training subsequences with the property that the vectors in a given subsequence all followed vectors in the original sequence that were clustered about the initial reproduction words. These new training sequences could then be used to design K good memoryless VQ's of N words each using the ordinary VQ design technique; this would produce, for each s, a state code book C_s that is good on the average for all of the input vectors in the training sequence that follow vectors that are closer to c(s) than to any other initial state label. This would be a good code if the encoder and decoder were omniscient in the sense of knowing the state label having minimum distortion to the previous input vector. While an encoder could easily find this, the decoder cannot know it without increasing the communication rate. Hence this does not immediately provide the desired FSVQ.

To approximate the omniscient code by an FSVQ, two approaches seem reasonable. If the FSVQ has labeled states, then the new super code book is too large, having KN words instead of the allowed K; in addition, we already have a candidate super code book, the initial code book S. If instead of sending the best codeword in the omniscient state code book for the given state, the encoder sends the best possible reproduction codeword in the initial code book, that is, the codeword in the initial code book closest to the ideal codeword in the omniscient state code book, then the decoder can track the encoder and should produce an output reproduction close to that of the omniscient decoder. This can be accomplished as follows. For each state s and for each reproduction codeword ĉ(u, s), u ∈ N, in the omniscient state code book C_s, find the state σ for which d(ĉ(u, s), c(σ)) is minimized. Set f(u, s) = σ. Doing this for each of the N values of u will provide the required next-state function. Since a possible transition would be wasted if fewer than N connections were made, if the minimum distortion selection yields a redundant connection (one already selected), then the connection yielding the next smallest distortion is selected instead (and so on if necessary). In this way N distinct connections are made. Doing this for all s completes the design of an initial labeled-state FSVQ. Note that the state code books for the omniscient code are discarded; they are simply used to obtain a next-state function for the original code book.

In the case of a labeled-transition FSVQ, the omniscient state code books provide a set of transition labels, but now we need a next-state rule. The transition label just selected by an encoder provides an estimate of the state label of the next state to be entered, but there may be no state with exactly that label. Hence, analogous to the above case, we do the best we can: pick as the next state the one with the label closest to the reproduction label just chosen from the state code book. This implies a next-state function as follows. For each state s and each u ∈ N, find the state σ for which d(ĉ(u, s), c(σ)) is minimized, then set f(u, s) = σ. Doing this for each u and for each state s will yield the required next-state function and hence, in combination with the state code books, the design of the labeled-transition FSVQ. Observe that in contrast to the labeled-state design, here the initial code book is discarded and the state code books are retained for the final code.

Both the omniscient techniques assume that the distortion measure is such that one can measure the distortion between reproduction codewords as well as between an input vector and a reproduction codeword. This is clearly
the case for symmetric distortion measures such as the squared error distortion considered here. The technique must be modified, however, for nonsymmetric distortion measures such as the Itakura-Saito distortion. (See [12] for details.) The omniscient design algorithm is summarized in the Appendix.

The final labeled-state FSVQ is described by the super code book or label code book S and the next-state function f. The final labeled-transition FSVQ is described by the state code books {C_s; s ∈ S} and the next-state function f. Alternatively, the FSVQ of either kind is specified as in (4.1) by a label code book C = {c(i); i ∈ I}, where I is the label index set, a label index function φ: N × S → I giving the index of the label produced by each transition or channel symbol-state pair, and the next-state function f: N × S → S. This completes the design of the initial FSVQ.
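As an illustration of the simplest of the three techniques, the sketch below (ours, with assumed array layouts) implements the conditional histogram design: it counts successor frequencies when the initial code book is used as an ordinary VQ on the training sequence and keeps the N most frequent successors of each state.

```python
import numpy as np

def conditional_histogram_next_state(labels, K, N):
    """Labeled-state next-state function by the conditional histogram technique.

    labels[t] is the index of the minimum distortion codeword (state label)
    for training vector t when the K-word initial code book is used as an
    ordinary memoryless VQ.  f[s] lists the N most frequent successors of
    state s, i.e. the allowed next states f(u, s), u = 0, ..., N-1.
    """
    counts = np.zeros((K, K), dtype=int)
    for prev, nxt in zip(labels[:-1], labels[1:]):
        counts[prev, nxt] += 1
    f = np.argsort(-counts, axis=1)[:, :N]    # top-N successors per state
    return f
```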
C. Iterative Improvement of State Code Books

In the final stage of FSVQ design, the initial FSVQ produced by the previous step is "fine tuned" to the next-state function by replacing state or transition labels by centroids. The algorithm is a variation on the ordinary VQ algorithm with the only difference being that the centroids are conditioned on states in the labeled-state case or channel symbol-state pairs (or transitions) in the labeled-transition case rather than on channel symbols as in the usual case. These two cases are considered together using the label code book and label index function as in (4.1). The algorithm is summarized in the Appendix.

Unlike the ordinary VQ and trellis encoder cases, it is possible for a code book to be made worse by the centroid replacement. To guarantee a final code book at least as good as the initial code book, we can halt the iterative process if an increase in distortion occurs. Alternatively, we can allow increases in distortion and continue the process in hopes of improvement in a later iteration. Experimentally, we find that allowing increases in distortion almost always results in improved code books. Therefore, we halt the improvement algorithm only when the relative decrease in distortion is less than 0.005.

We conclude this section with a note on the choice of the initial state. In most of the experiments presented, the initial state is determined by a full search encoding. This technique adds (negligibly) to the rate but is used to avoid initial derailing effects in the test sequence encoding. It is likely that an appropriate fixed state, such as a silence frame, will provide equivalent performance to full search encoding in some applications. However, this may not be true in general and we leave the issue to specific applications.

VI. SIMULATIONS

Codes were designed and tested for two sources, a Gauss-Markov source and a sampled speech sequence. The first experiments focused on a comparison of the next-state function design techniques and the labeled-state versus labeled-transition structures. We also compared the performance of these codes with the rate-distortion function. The remaining experiments focused on comparison of FSVQ to VQ for various vector lengths.

A. Gauss-Markov Sources

A Gauss-Markov source or first order Gauss autoregressive source {X_n} is defined by the difference equation

$$X_{n+1} = a X_n + W_n,$$

where {W_n} is a zero mean, unit variance, independent, identically distributed Gaussian time series. This source is of interest since it is a common model of real data, since it is a common guinea pig for comparing data compression systems, and since its optimal performance, the rate-distortion function, can be expressed parametrically and either evaluated, approximated, or bounded. (See, e.g., [22]-[24] and [19].) We here consider only the case a = 0.9 of a highly correlated source, the correlation coefficient being typical of speech and scanned image data.

To begin with, we present experimental results for the Gauss-Markov source based on a training sequence of 128,000 samples and a separate test sequence of the same length. The first experiment considered the various next-state function design methods. The conditional histogram, nearest-neighbor, and omniscient designs for labeled-state FSVQ's were all made for a dimension k = 4 with K = 128 states. FSVQ's with rates of R = 0, 1, ..., 4 bits per vector or R/k = 0, 1/4, ..., 1 bits per sample (bps) were designed using each method. Labeled-transition FSVQ's of the same rates, dimension, and super code-book size were designed using the omniscient next-state function design algorithm. Thus the labeled-state FSVQ in each case had the same number of states as would the labeled-state FSVQ equivalent to the labeled-transition FSVQ. The resulting average distortions for encoding the design and test sequences are depicted in Table I, together with the Shannon lower bound to the rate-distortion function (which actually yields the rate-distortion function for the R = 1 bit per vector case).

TABLE I. COMPARISON OF TRANSITION PICKING TECHNIQUES FOR GAUSS-MARKOV SOURCE, VECTOR LENGTH 4 AND 128 CODEWORDS. (N = 2^R = number of transitions. NN: nearest-neighbor. CH: conditional histogram. OLS: omniscient labeled-state. OLT: omniscient labeled-transition. VQ: full search VQ with N codewords. D(r): rate-distortion function, r = R/k = log_2 N / 4.)
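For readers who want to reproduce the flavor of these experiments, a Gauss-Markov training sequence with a = 0.9 can be generated and blocked into dimension-k vectors as in the sketch below (our own; only the sample count and correlation coefficient follow the text).

```python
import numpy as np

def gauss_markov(n_samples, a=0.9, seed=0):
    """Generate X_{n+1} = a X_n + W_n with W_n i.i.d. N(0, 1)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n_samples)
    x = np.empty(n_samples)
    x[0] = w[0]
    for n in range(n_samples - 1):
        x[n + 1] = a * x[n] + w[n + 1]
    return x

# 128,000-sample training sequence blocked into dimension-4 vectors,
# as in the experiments described above.
k = 4
training_vectors = gauss_markov(128_000).reshape(-1, k)
```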
In these experiments, the labeled-transition FSVQ performs consistently better than any of the labeled-state FSVQ's. This is probably because the labeled-transition FSVQ's had fewer empty states than the labeled-state FSVQ's. For the labeled-state case alone, no one algorithm was consistently superior to the others. In other experiments [25] where an attempt was made to relabel empty states, the omniscient labeled-state FSVQ proved to be superior. Although these results are quite optimistic because the vector lengths and code book sizes were large relative to the training sequence length, they demonstrate the potential for better performance than is here reported. For comparison, Arnstein's optimal one-bps predictive quantizer [22] and an ordinary VQ of dimension 4 and 1 bps achieve an SNR of about 10 dB for this source. The dimension-4 labeled-transition FSVQ achieves this at 3/4 bps, and with ad hoc empty state relabeling algorithms such as the techniques described in [25], we expect that an FSVQ would achieve this performance at lower rates.

The next experiment compared the SNR obtainable using labeled-transition FSVQ's and ordinary VQ's for a fixed rate and varying blocklength of from one to five samples. The number of states in each case was fixed at eight so as not to exhaust the training sequence. Greater improvement in the FSVQ results could certainly be obtained by increasing the number of states. The results are depicted in Table II.

TABLE II. DISTORTION OF VQ AND 8-STATE LABELED-TRANSITION FSVQ FOR GAUSS-MARKOV SOURCE. (Vector length = k, rate = 1 bps. D(r) = 13.2.)

For reference, the best rate one trellis encoder for this source with constraint length 5 and 32 codewords yields a performance of 11.43 dB [24], [26]. Memory requirements of the VQ, labeled-transition FSVQ, and labeled-state FSVQ are determined by the vector dimension k, the number of bytes b needed to represent an element of a vector, the number of transitions N (N = 2^k for a rate of 1 bps), and the number of states, K for the labeled-transition FSVQ or KN for the equivalent labeled-state FSVQ. Assuming that b_t bytes are used to represent a transition, the memory requirements are respectively kNb, kbKN + KN b_t, and kbKN + KN^2 b_t. Memory requirements for the VQ, the eight-state labeled-transition FSVQ, and the equivalent labeled-state FSVQ are depicted in Table III. In this table, we use 1 byte to represent a transition when K ≤ 256 and 2 bytes otherwise. We also use 32 bits to represent an element of a vector, noting that it is possible to represent the elements more efficiently. Although the labeled-state structure requires significantly more memory than the labeled-transition structure, especially for large numbers of states, both structures have memory requirements well within current hardware technology.

TABLE III. STORAGE REQUIREMENTS, IN BYTES, FOR VQ (kNb), LT-FSVQ (kKNb + KN), AND LS-FSVQ (kKNb + KN^2). (N = 2^k (rate 1 bps), K = 8, b = 4, and 1 byte is used to represent each transition.)
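The storage expressions quoted above are easy to tabulate; the following helper (ours) evaluates them with the byte sizes used in the tables (b = 4 bytes per vector element, 1 byte per transition for K ≤ 256 and 2 bytes otherwise).

```python
def storage_bytes(k, K, rate_bps=1, b=4):
    """Evaluate the storage expressions from the text for one configuration.

    k: vector dimension; K: number of states of the labeled-transition FSVQ
    (the equivalent labeled-state FSVQ has KN states); b: bytes per vector
    element.  N = 2**(rate_bps * k) transitions per state.
    """
    N = 2 ** (rate_bps * k)
    b_t = 1 if K <= 256 else 2                   # bytes per stored transition
    vq = k * N * b                               # ordinary VQ: k N b
    lt_fsvq = k * b * K * N + K * N * b_t        # labeled-transition FSVQ
    ls_fsvq = k * b * K * N + K * N ** 2 * b_t   # equivalent labeled-state FSVQ
    return vq, lt_fsvq, ls_fsvq

# e.g. the eight-state, rate 1 bps configurations discussed around Table III
for k in range(1, 6):
    print(k, storage_bytes(k, K=8))
```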
At dimension 1, a rate one-bps FSVQ outperforms a scalar quantizer by almost 5 dB and comes close to the performance of Arnstein's predictive quantizer [22]. This is interesting since, like an FSVQ, a predictive quantizer is a sequential machine and uses a minimum distortion rule; but unlike an FSVQ it is not a finite-state machine. In fact, a finite-state vector quantizer can be viewed as a form of a predictive vector quantizer allowed to have only a finite number of states. Here we expect that 16 states would suffice to match the performance of the infinite-state machine. The gap between FSVQ and VQ rapidly closes to about 1 dB and then remains at that level. It is interesting, however, that the memoryless VQ requires a dimension of 3 before it can match a dimension-1 FSVQ of the same rate. At the highest dimension of 5, the eight-state FSVQ yielded performance approximately 2 dB below that of the rate-distortion bound, 11.3 dB in comparison with 13.2 dB.
Next, we briefly mention the results of earlier FSVQ experiments [25], using a Gauss-Markov training sequence of length 60,000 samples. In these experiments, the number of states is increased until further improvement is negligible. In addition, an ad hoc algorithm for relabeling empty states was used to improve quantizer performance. This involved relabeling any empty states with the labels of popular states and repeating the transition design procedure. Table IV shows the results of a comparison between VQ and labeled-state FSVQ at a rate of 1 bps with the corresponding storage requirements. Here the performance of FSVQ is as much as 5.7 dB better than VQ for dimension 1 and levels off to 2 dB at higher dimensions. The results are rather optimistic due to the shortness of the training sequence and the large number of states used, as evidenced by the fact that they exceed the Shannon bound to the rate-distortion function.

TABLE IV. DISTORTION OF VQ AND LABELED-STATE FSVQ FOR GAUSS-MARKOV SOURCE. (Vector length = k, rate = 1 bps. Corresponding storage requirements given K states, 4 bytes/scalar, 1 byte for transition storage when K ≤ 256 and otherwise 2 bytes to represent each transition. D(r) = 13.2. 1 kbyte = 1024 bytes.)

k                        1      2      3      4      5      6      7
Distortion (dB)  VQ     -4.3   -7.9   -9.2   -10.1  -10.5  -11.3  -11.6
Distortion (dB)  FSVQ   -10.0  -10.8  -11.4  -12.1  -12.8  -13.1  -13.6
K                        64     256    512    512    512    1024   1024
Memory (bytes)   VQ      8      32     96     256    640    1536   3584
Memory (bytes)   FSVQ    384    3k     14k    24k    42k    152k   284k

Table V depicts more realistic labeled-state FSVQ results, using a training sequence of 120,000 samples and a separate test sequence of 60,000 samples. For dimension 6, 64-state FSVQ's were designed for rates of 1-6 bits per block. An arbitrary initial state was fixed and hence the design algorithm was not permitted an initial full search. In addition, the algorithm for replacing isolated states was not used. These results seem to indicate that full search initial state determination may not be necessary for an FSVQ for a Gauss-Markov source, as the test results are quite close to the training sequence results.

TABLE V. DISTORTION OF LABELED-STATE FSVQ FOR GAUSS-MARKOV SOURCE. (Vector length = 6, 64 states, r = R/k.)

B. Sampled Speech

The experiments were repeated for a training sequence of sampled speech waveforms composed of 640,000 samples from five male speakers sampled at 6.5 kHz (about 20 seconds of speech for each speaker). The resulting codes were then tested on a test sequence of 76,800 samples from a male speaker not in the training sequence. The test and training sequences were the same as those used in [21] and [24].

The comparison of next-state function designs assumed, as before, a dimension of k = 4 and a label code book size of 128 with rates of R = 1-4 bits per vector. The results for the training sequence are shown in Table VI. As with the Gaussian source, the labeled-transition FSVQ consistently yields better codes, and no one of the three labeled-state design algorithms is clearly superior.

TABLE VI. COMPARISON OF TRANSITION PICKING TECHNIQUES FOR SPEECH, VECTOR LENGTH 4 AND 128 CODEWORDS. (N = 2, 4, 8, 16; N = 2^R = number of transitions. NN: nearest-neighbor. CH: conditional histogram. OLS: omniscient labeled-state. OLT: omniscient labeled-transition. VQ: full search VQ with N codewords.)

Next, labeled-transition FSVQ's were designed for rate 1 bps and dimensions 1-5. The design technique and storage requirements were as in the Gaussian case. The average distortion is depicted in Table VII for the resulting codes used both inside and outside of the training sequence. The gain over ordinary VQ is more marked in this case than in the Gauss-Markov results, approximately 1.5 dB.

TABLE VII. DISTORTION OF VQ AND 8-STATE LABELED-TRANSITION FSVQ FOR SPEECH. (Vector length = k, rate = 1 bps.)

As in the Gauss-Markov case, more positive results can be achieved by increasing the number of states further and relabeling empty states. Here the gains were as much as 6 dB. The results and storage requirements for these labeled-state FSVQ's are given in Table VIII.

TABLE VIII. DISTORTION OF VQ AND LABELED-STATE FSVQ FOR SPEECH. (Vector length = k, rate = 1 bps. Corresponding storage requirements given K states, 4 bytes/scalar, 1 byte for transition storage when K ≤ 256 and 2 bytes otherwise.)

k                        1      2      3      4      5      6      7      8
Distortion (dB)  VQ     -2.0   -5.9   -6.4   -7.3   -8.1   -8.6   -9.3   -9.5
Distortion (dB)  FSVQ   -2.0   -7.8   -9.0   -10.9  -12.2  -13.2  -14.0  -15.3
K                        2      32     64     512    512    1024   2048   2048
Memory (bytes)   VQ      8      32     96     256    640    1536   3584   8192
Memory (bytes)   FSVQ    12     384    1280   25k    43k    156k   582k   1114k

Outside the training sequence the eight-state, dimension-5, labeled-transition FSVQ compares favorably (8.54 dB) with the constraint length 5 trellis encoding system of [26] (7.35 dB) at a rate of 1 bps. In addition its performance is close to that of the constraint length 8 trellis encoding system (8.67 dB). Both the speech and the Gaussian experiments indicate that vector trellis encoders with a short search (one vector) can outperform scalar trellis encoders with long searches. The former have the advantage of having a more general finite-state machine as a decoder rather than the sliding-block code structure or nonlinear filter usually used by a scalar trellis encoder. This provides evidence in confirmation of Stewart's [26] conjecture that more general
finite-state machine decoders (or more freedom in next-state function design) should improve trellis encoding performance and that vector alphabets should also yield noticeable improvements.

The coded speech was synthesized; informal listening tests were consistent with the measured distortion, and the FSVQ codes were substantially better than the ordinary VQ codes of the same dimension and rate. The storage requirements were, however, significantly greater. The differences between the design distortion and the test sequence distortion were reasonably typical of speech waveform coding designs based on the VQ design algorithm, about 1 to 3 dB.

VII. COMMENTS

The design techniques presented have been shown to yield FSVQ's that provide good performance on both Gauss-Markov and sampled speech sequences. In addition, the use of ad hoc algorithms to deal with the problem of empty states should result in even better performance than reported here. The good performance of FSVQ's indicates that a finite-state machine operating on vectors is able to take advantage of the memory of a process, even with the relatively small dimensions considered. Clearly, FSVQ's should outperform ordinary VQ's of the same dimension, but it is perhaps surprising that they also outperform scalar trellis encoding systems, since trellis encoders use a finite-state machine decoder and are permitted to search into the future before deciding. We conjecture that the improvement is due to the fact that FSVQ's operate on vectors and not scalars, that more general decoders are permitted, and that the design algorithms produce good codes within the allowed class.

FSVQ's offer an alternative to trellis encoding for achieving performance near the rate-distortion bound, and they do not require the delays of trellis encoders, since they have a search depth of only a single vector. The same FSVQ design techniques could, however, be used to design vector alphabet trellis encoders with even better performance by simply allowing the encoder in the FSVQ to look ahead before making its decision.

Unlike ordinary VQ's, no optimality or convergence properties have yet been developed for the design algorithms developed here. A theory of design of similar codes for Markov sources has been proposed by Dunham [27], and there appear to be similarities with the approach taken here. His approach assumes knowledge of various conditional distributions, and here such distributions are estimated using the training sequence. Unlike Dunham's approach, our approach leads to specific codes for specific sources. Perhaps a combination of the two will help fill the theoretical gaps in this work.

The FSVQ's designed here are amenable to implementation on a single very large scale integration (VLSI) chip. The development of such a chip, which permits user supplied code books and next-state functions, is described in [25].

A major problem with FSVQ's, as with many source codes with feedback, is that channel errors can have disastrous effects. The degree of the effect and means to combat it remain a subject for future research. Several options seem possible, however. The code can be periodically updated by a full search VQ, perhaps simultaneously with the transmission of synchronization information. The distortion measure could be modified to be the conditional expected distortion over a noisy channel instead of the original distortion. This results in a joint source and channel code [28]. The design techniques would be essentially unchanged save for the computation of the distortion and the centroids. For simple channel models such as a binary symmetric channel, these computations should be reasonable. Although labeled-state FSVQ's yield slightly inferior results to labeled-transition FSVQ's in the examples considered here, it seems likely that each will have its uses. As a final note, during the revision of this paper, an FSVQ design technique quite similar to the omniscient design approach described here for labeled-transition FSVQ's was independently developed by Haoui and Messerschmitt [29].

ACKNOWLEDGMENT

The authors gratefully acknowledge the help of Mr. Tom Flynn of the Information Systems Laboratory, Stanford University, Stanford, CA, for numerous valuable comments and for his help with the simulations.

APPENDIX
DESIGN ALGORITHMS

Initial Code Book Design Algorithm

Step 1) Initialization: Given
  a training sequence {X_i = x_i; i = 0, 1, 2, ..., L - 1},
  dimension (blocksize) = k,
  rate = R bits per vector = R/k bits per sample,
  a convergence threshold ε > 0,
  an initial super code book C_0 = {c(i); i = 0, 1, ..., K - 1}; the initial code may be obtained by the VQ design algorithm and the splitting techniques of [14].
Set m = 0 and d_{-1} = ∞.

Step 2) Minimum Distortion Encoding: Set m = m + 1. Encode the training sequence using the minimum distortion encoder for the given code book and save a running count and centroid for each code-book index: set n(u) = 0 and cen(u) = 0 for u = 0, 1, ..., K - 1, and Δ = 0. For l = 0, 1, ..., L - 1:

  U_l = min_u^{-1} d(X_l, c(u)),
  Δ = Δ + d(X_l, c(U_l)) = Δ + min_{c ∈ C_m} d(X_l, c),
  n(U_l) = n(U_l) + 1,
  cen(U_l) = cen(U_l) + X_l.
Set d_m = Δ/L. For u = 0, 1, ..., K - 1 such that n(u) ≠ 0 set cen(u) = cen(u)/n(u), the Euclidean centroid of all source vectors mapping into the codeword with index u. If n(u) = 0, the centroid can be defined in an arbitrary manner (e.g., split the most popular centroid).

Step 3) Reproduction Codeword Update: Compute the distortion increment

  δ = (d_{m-1} - d_m)/d_m.

If δ ≤ ε, then quit with final code book C_m. Otherwise form a new code book C_{m+1} = {cen(u); u = 0, 1, ..., K - 1} and go to Step 2.

Omniscient State Code Book / Next-State Function Design Algorithm

Step 1) Initialization: Given an initial state label code book S = {c(s); s ∈ S}, N = 2^R, a threshold ε ≥ 0, and a training sequence L = {x_i; i = 0, 1, ..., L - 1}.

Step 2) State Code-Book Design: For each s ∈ S use the ordinary VQ algorithm to design a code book C_s = {ĉ(u, s); u ∈ N} for the training subsequence

  L_s = {x_i ∈ L: d(x_{i-1}, c(s)) = min_{σ ∈ S} d(x_{i-1}, c(σ))}.

This can be accomplished either by partitioning the training sequence using the minimum distortion rule and separately running K VQ designs or by doing the designs in parallel; that is, each iteration first classifies an input vector to find the minimum distortion state and then does the VQ algorithm update step for the next input vector and the appropriate code book. The threshold ε is either used on the separate designs or on the overall parallel design and total distortion.

Step 3) Next-State Function Design: For each s ∈ S and each u ∈ N, set

  f(u, s) = min_{σ ∈ S}^{-1} d(ĉ(u, s), c(σ)).

Iterative Label Improvement Algorithm

Step 1) Initialization: Given
  a training sequence {X_i; i = 0, 1, 2, ..., L - 1},
  dimension (blocksize) = k,
  rate = R bits per vector = R/k bits per sample,
  a label code book C = {c(i); i ∈ I}, where I is the label index set (I = S for a labeled-state FSVQ and I = N × S for a labeled-transition FSVQ),
  an initial state σ_0 ∈ S,
  a label index function φ: N × S → I,
  a next-state function f: N × S → S,
  a convergence threshold ε ≥ 0.
Set m = 0 and d_{-1} = ∞.

Step 2) FSVQ Encoding: Set m = m + 1. Encode the training sequence using the FSVQ encoder for the given code book and save a running count and centroid for each code-book index as follows. Set n(i) = 0 and cen(i) = 0 for i ∈ I, Δ = 0, S_0 = σ_0. For l = 0, 1, ..., L - 1:

  U_l = min_u^{-1} d(x_l, c(φ(u, S_l))),
  Δ = Δ + d(x_l, c(φ(U_l, S_l))) = Δ + min_{c ∈ C_{S_l}} d(x_l, c),
  n(φ(U_l, S_l)) = n(φ(U_l, S_l)) + 1,
  cen(φ(U_l, S_l)) = cen(φ(U_l, S_l)) + x_l,
  S_{l+1} = f(U_l, S_l).

Set d_m = Δ/L. For i such that n(i) ≠ 0 set cen(i) = cen(i)/n(i), the Euclidean centroid of all source vectors mapping into the codeword with label index i. If n(i) = 0, the centroid can be defined in an arbitrary manner (e.g., split the most popular centroid).

Step 3) Reproduction Codeword Update: Compute the distortion increment

  δ = (d_{m-1} - d_m)/d_m.

If 0 ≤ δ ≤ ε, then quit with final label code book C_m. Otherwise form a new code book C_{m+1} = {cen(i); i ∈ I} and go to Step 2.
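A compact sketch of one pass of Step 2 followed by the centroid replacement, for the labeled-state case (so the label index is simply the next state), is given below; it is our own illustration, with array layouts assumed as before (labels_cb[s] is the k-dimensional label of state s and f[s, u] is the next state).

```python
import numpy as np

def label_improvement_pass(x_seq, labels_cb, f, s0):
    """One FSVQ encoding pass followed by centroid replacement of the state labels."""
    K, N = f.shape
    counts = np.zeros(K)
    cen = np.zeros_like(labels_cb)
    total = 0.0
    s = s0
    for x in x_seq:
        cand = labels_cb[f[s]]                      # state code book C_s, shape (N, k)
        u = int(np.argmin(((cand - x) ** 2).sum(axis=1)))
        i = f[s, u]                                 # label index = next state
        total += float(((cand[u] - x) ** 2).sum())
        counts[i] += 1
        cen[i] += x
        s = i
    new_cb = labels_cb.copy()
    nz = counts > 0
    new_cb[nz] = cen[nz] / counts[nz, None]         # Euclidean centroids
    return new_cb, total / len(x_seq)               # updated labels, d_m
```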
REFERENCES

[1] J. Ziv, "Distortion-rate theory for individual sequences," IEEE Trans. Inform. Theory, vol. IT-26, pp. 137-143, Mar. 1980.
[2] N. T. Gaarder and D. Slepian, "On optimal finite-state digital transmission systems," IEEE Trans. Inform. Theory, vol. IT-28, pp. 167-186, Mar. 1982.
[3] J. C. Kieffer, "Stochastic stability for feedback quantization schemes," IEEE Trans. Inform. Theory, vol. IT-28, pp. 248-254, Mar. 1982.
[4] R. M. Gray and J. C. Kieffer, "Asymptotically mean stationary measures," Ann. Prob., vol. 8, pp. 962-973, Oct. 1980.
[5] J. C. Kieffer and M. Rahe, "Markov channels are asymptotically mean stationary," SIAM J. Math. Anal., vol. 12, pp. 293-305, 1980.
[6] J. C. Kieffer and J. G. Dunham, "On a type of stochastic stability for a class of encoding schemes," IEEE Trans. Inform. Theory, vol. IT-29, pp. 793-797, Nov. 1983.
[7] R. M. Gray and F. Saadat, "Block source coding theory for asymptotically mean stationary sources," IEEE Trans. Inform. Theory, vol. IT-30, pp. 54-68, Jan. 1984.
[8] F. Jelinek, R. L. Mercer, and L. R. Bahl, "Continuous speech recognition: statistical methods," in Handbook of Statistics, vol. 2, P. R. Krishnaiah and L. N. Kanal, Eds. New York: North Holland, 1982, pp. 549-573.
[9] R. Billi, "Vector quantization and Markov source models applied to speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Paris, May 3-5, 1982.
[10] L. Rabiner, S. E. Levinson, and M. M. Sondhi, "On the application of vector quantization and hidden Markov models to speaker-independent isolated word recognition," Bell Syst. Tech. J., vol. 62, pp. 1075-1106, Apr. 1983.
[11] J. E. Shore, D. Burton, and J. Buck, "A generalization of isolated word recognition using vector quantization," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Boston, MA, Apr. 14-16, 1983.
[12] M. O. Dunham and R. M. Gray, "An algorithm for the design of labeled-transition finite-state vector quantizers," IEEE Trans. Commun., vol. COM-33, Jan. 1985.
[13] J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory, Languages, and Computation. Reading, MA: Addison-Wesley, 1979.
[14] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Trans. Commun., vol. COM-28, pp. 84-95, Jan. 1980.
[15] G. Rebolledo, "Speech and waveform coding based on vector quantization," Ph.D. dissertation, Inform. Syst. Lab., Stanford Univ., Stanford, CA, Aug. 1982.
[16] V. Cuperman and A. Gersho, "Adaptive differential vector coding of speech," in Proc. IEEE Conf. Global Communications, Miami, FL, Dec. 1982.
[17] V. Cuperman and A. Gersho, "Vector predictive coding of speech at 16 kb/s," IEEE Trans. Commun., to appear.
[18] R. M. Gray, "Vector quantization," IEEE ASSP Mag., pp. 4-29, Apr. 1984.
[19] R. M. Gray and Y. Linde, "Vector quantizers and predictive quantizers for Gauss-Markov sources," IEEE Trans. Commun., vol. COM-30, pp. 381-389, Feb. 1982.
[20] R. M. Gray and E. Karnin, "Multiple local optima in vector quantizers," IEEE Trans. Inform. Theory, vol. IT-28, pp. 256-261, Mar. 1982.
[21] H. Abut, R. M. Gray, and G. Rebolledo, "Vector quantization of speech and speech-like waveforms," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-30, pp. 423-435, June 1982.
[22] D. S. Arnstein, "Quantization error in predictive coders," IEEE Trans. Commun., vol. COM-23, pp. 423-429, Apr. 1975.
[23] J. Y. Huang and P. M. Schultheiss, "Block quantization of correlated Gaussian random variables," IEEE Trans. Commun. Syst., vol. CS-11, pp. 289-296, Sept. 1963.
[24] L. C. Stewart, R. M. Gray, and Y. Linde, "The design of trellis waveform coders," IEEE Trans. Commun., vol. COM-30, pp. 702-710, Apr. 1982.
[25] J. Foster, "Finite-state vector quantization for waveform coding," Ph.D. dissertation, Inform. Syst. Lab., Stanford Univ., Stanford, CA, Nov. 1982.
[26] L. C. Stewart, "Trellis data compression," Inform. Syst. Lab., Stanford Univ., Stanford, CA, Tech. Rep. L905-1, July 1981.
[27] J. G. Dunham, "An iterative theory of code design," IEEE Trans. Inform. Theory, submitted for publication.
[28] J. G. Dunham and R. M. Gray, "Joint source and noisy channel trellis encoding," IEEE Trans. Inform. Theory, vol. IT-27, pp. 516-519, July 1981.
[29] A. Haoui and D. G. Messerschmitt, "Predictive vector quantization," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, San Diego, CA, Mar. 19-21, 1984.