Variable Length Tree-Structured Subvector Quantization
Ulug Bayazit and William A. Pearlman
Electrical, Computer and Systems Engineering Department, Rensselaer Polytechnic Institute, Troy, New York

ABSTRACT

It is demonstrated in this paper that the encoding complexity advantage of a variable-length tree-structured vector quantizer (VLTSVQ) can be enhanced by encoding low dimensional subvectors of a source vector, instead of the source vector itself, at the nodes of the tree structure without significantly sacrificing coding performance. The greedy tree growing algorithm for the design of such a vector quantizer codebook is outlined. Different ways of partitioning the source vector into its subvectors and several criteria of interest for selecting the appropriate subvector for making the encoding decision at each node are discussed. Techniques of tree pruning and resolution reduction are applied to obtain improved coding performance at the same low encoding complexity. Application of an orthonormal transformation, such as the KLT or a subband transformation, to the source and the implications of defining the subvectors on orthogonal subspaces are also discussed. Finally, simulation results on still images and an AR(1) source are presented to confirm our propositions.
Keywords: subvector, coding, vector quantization, tree structure, nonlinear interpolation

1 Introduction

Vector quantization is a popular block coding method which can exploit the statistical dependencies between neighboring source samples. One of its major drawbacks is complexity, often expressed in terms of the encoding search time and the memory space required for storing the encoder and decoder codebooks. Encoding search complexity increases as $O(2^{kr})$ for full-search vector quantization, where $k$ is the vector dimension and $r$ is the bit rate per source sample. Structurally constrained vector quantizers have been investigated by many researchers to reduce the encoding search complexity. Among them, the Tree-Structured Vector Quantizer (TSVQ) [1, 2, 3, 4, 5] has gained popularity due to its low encoding complexity: at the expense of some coding loss, TSVQ encoding complexity varies linearly with rate. Unbalanced tree-structured vector quantizers with variable-length codewords have been found to possess a coding advantage over balanced TSVQ's with fixed-length codewords, and their use is more advantageous in applications where buffer overflow and storage complexity constraints are not critical. In the first variable-length TSVQ (VLTSVQ) codebook design method, introduced in [2], the greedy tree growing algorithm recursively splits the (terminal) node with the largest distortion contribution and pairwise designs its children nodes. In [3, 5] this algorithm was improved by integrating rate as well as distortion constraints into the tree-structured design process. A method of pruning proposed in [4] was also applied to balanced or unbalanced trees in [3, 5] to generate VLTSVQ's. Pruned VLTSVQ's generally have rate-distortion performance superior to that of greedily grown VLTSVQ's.

Encoding complexity reduction in tree-structured vector quantizers has been considered in [7] and [8]. In [7] a binary tree-structured color palette is designed with the objective of minimizing the mean squared distortion of the reconstructed color image. This algorithm splits a cluster of source vectors assigned to a tree node along the principal eigenvector of the cluster covariance, at the point of projection of the cluster centroid onto the principal eigenvector. The algorithm in [8] also splits along the principal eigenvector, but the partitioning of the cluster associated with the
parent node into the clusters associated with the children nodes is done by the LBG algorithm. Although in [7] the node splitting order is determined with the objective of obtaining the greatest distortion reduction at each step, the consequences of a split for the rate of the tree are not taken into account. Moreover, a split along the principal eigenvector does not even ensure the largest decrease in distortion. In [8] balanced tree structures are considered and grown without any rate constraints.

In this paper we grow a greedy variable-length tree-structured vector quantizer as in [5] and determine the splitting order by taking into account the effects of a split on both the overall distortion and the overall rate of the tree. The encoding decision at each node is based on only one of the subvectors of a source vector, where a subvector is defined as a subset of the source samples which constitute a source vector. To distinguish the design algorithm from VLTSVQ we call it Variable-Length Tree-Structured SubVector Quantization (VLTSSVQ). VLTSSVQ partitions the input space as shown in Figure 1 for two one-dimensional subvectors $(U_1, U_2)$ of a two-dimensional source vector $X$. The reconstruction at each cell is given by the centroid of the source vectors mapped to that cell.

A brief review of the VLTSVQ algorithm is presented in the next section. Section III outlines the VLTSSVQ design algorithm and discusses different ways of grouping the source samples of a source vector into subvectors; two criteria for selecting the best subvector to be used at each node for encoding are also discussed there. Section IV discusses two methods of generating improved lower resolution codebooks from an initial high resolution VLTSSVQ codebook. We address the implementation of VLTSSVQ on orthogonal subspaces in Section V. Simulation results, primarily comparing VLTSSVQ with VLTSVQ, and concluding remarks are presented in Sections VI and VII respectively.
2 Variable Length Tree-Structured Vector Quantization

The greedy VLTSVQ tree growing algorithm of [3, 5] designs a tree-structured codebook by splitting one node at a time. Let $S_t$ be the set of (training) source vectors mapped to node $t$ of the tree structure, let $p(t)$ denote the probability that a source vector is mapped to node $t$, and let $d(t)$ and $r(t)$ denote the contributions of that node to the overall distortion and rate. The reproduction vector $\hat{X}_t$ is associated with node $t$, and

$$p(t) = \Pr\{X \in S_t\} \qquad (1)$$
$$\hat{X}_t = E[X \mid X \in S_t] \qquad (2)$$
$$d(t) = p(t)\,E[\|X - \hat{X}_t\|^2 \mid X \in S_t] \qquad (3)$$
$$r(t) = p(t)\log(p(t)^{-1}) \qquad (4)$$
For a binary tree the two children of node $t$ are denoted by $t_{left}$ and $t_{right}$. The set of terminal nodes of the tree at a generic step of the design algorithm is denoted by $T$. The root node reproduction vector $\hat{X}_0$ is found by computing the centroid of the entire set of (training) source vectors. At a generic step of the algorithm the best terminal node $t_{max}$ is split to yield two new (terminal) nodes, where

$$t_{max} = \arg\max_{t:\, t \in T} \lambda_t \qquad (5)$$

and

$$\lambda_t = \frac{\Delta D(t)}{\Delta R(t)} \qquad (6)$$

is the marginal return of the split. $\Delta D(t)$ and $\Delta R(t)$ here are the decrease in overall distortion $D$ and the increase in overall rate $R$ as a result of the split of $t$. They are defined as

$$\Delta D(t) = d(t) - d(t_{left}) - d(t_{right}), \qquad \Delta R(t) = r(t_{left}) + r(t_{right}) - r(t) \qquad (7)$$
The children node reproduction vectors $\hat{X}_{t_{left}}, \hat{X}_{t_{right}}$ are designed by running the LBG algorithm on $S_t$. After each generic step, $T$ and $R$ are updated as

$$T \leftarrow T \cup \{t_{max,left},\, t_{max,right}\} \setminus \{t_{max}\}, \qquad R \leftarrow R + \Delta R(t_{max}) \qquad (8)$$
and the design algorithm terminates when R exceeds the desired design rate. Since the above algorithm requires the precomputation of the marginal returns for each terminal node (and therefore the design of the children nodes of all terminal nodes), it is a one-step lookahead design algorithm.
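To make the growth step concrete, the loop defined by Eqns. (1)-(8) can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the authors' implementation: leaves are represented directly by their training sets, the two-codeword LBG design is a bare-bones version, and rates are measured in bits (base-2 logarithms).

```python
import numpy as np

def lbg2(vecs, iters=10):
    """Bare-bones two-codeword LBG design on the training vectors of a node."""
    c = vecs.mean(0)
    c0, c1 = c - 1e-3, c + 1e-3              # perturbed initial codewords
    left = np.ones(len(vecs), dtype=bool)
    for _ in range(iters):
        left = ((vecs - c0) ** 2).sum(1) <= ((vecs - c1) ** 2).sum(1)
        if left.all() or not left.any():
            break
        c0, c1 = vecs[left].mean(0), vecs[~left].mean(0)
    return vecs[left], vecs[~left]

def node_stats(vecs, n_total):
    """Training-data estimates of p(t), d(t), r(t) of Eqns. (1), (3), (4)."""
    p = len(vecs) / n_total
    d = p * ((vecs - vecs.mean(0)) ** 2).sum(1).mean()
    r = p * np.log2(1.0 / p)
    return p, d, r

def grow_greedy(train, rate_budget):
    """Split the leaf with the largest marginal return lambda_t = dD/dR
    (Eqns. 5-7), updating R as in Eqn. (8), until R exceeds the budget."""
    n, leaves, R = len(train), [np.asarray(train)], 0.0
    while R <= rate_budget:
        best = None
        for i, S in enumerate(leaves):       # one-step lookahead over leaves
            if len(S) < 2:
                continue
            Sl, Sr = lbg2(S)
            if len(Sl) == 0 or len(Sr) == 0:
                continue
            _, d, r = node_stats(S, n)
            _, dl, rl = node_stats(Sl, n)
            _, dr, rr = node_stats(Sr, n)
            dD, dR = d - dl - dr, rl + rr - r
            if dR > 0 and (best is None or dD / dR > best[0]):
                best = (dD / dR, i, Sl, Sr, dR)
        if best is None:
            break
        _, i, Sl, Sr, dR = best
        leaves[i:i + 1] = [Sl, Sr]           # replace t_max by its two children
        R += dR
    return leaves
```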
2.1 Tree Pruning

In cases where the marginal returns are not diminishing as a tree path is traversed from the root node to any one of the terminal nodes, smaller trees with improved coding performance can be obtained by pruning an initial large balanced or unbalanced tree. The marginal return of pruning the tree by removing the subtree rooted at node $t$ is defined slightly differently:

$$\tilde{\lambda}_t = \frac{d(t) - \sum_{u \in \tilde{B}_t} d(u)}{\sum_{u \in \tilde{B}_t} r(u) - r(t)} \qquad (9)$$

where $\tilde{B}_t$ is the set of terminal nodes of the subtree $B_t$ rooted at node $t$. The pruning algorithm considers all interior nodes of the tree and removes the subtree $B_{t_{min}}$ rooted at $t_{min}$, where

$$t_{min} = \arg\min_{t:\, t \notin T} \tilde{\lambda}_t \qquad (10)$$

Pruning is applied repeatedly until the overall rate of the pruned tree drops below the desired rate, where the rate of the tree is updated as

$$R \leftarrow R - \sum_{u \in \tilde{B}_{t_{min}}} r(u) + r(t_{min}) \qquad (11)$$
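A matching sketch of one pruning pass, with a hypothetical node representation (each node a dict carrying its d(t), r(t), and children), follows; it implements Eqns. (9)-(11) directly.

```python
def prune_once(root):
    """One pruning step of Eqns. (9)-(11): remove the subtree B_t whose
    removal costs the least distortion per bit saved. Nodes are assumed
    to be dicts with keys 'd', 'r', 'left', 'right' (hypothetical layout)."""
    def term(t):                 # terminal nodes of the subtree rooted at t
        if t['left'] is None:
            return [t]
        return term(t['left']) + term(t['right'])

    def interior(t):             # all non-terminal nodes
        if t['left'] is None:
            return []
        return [t] + interior(t['left']) + interior(t['right'])

    best, best_lam = None, float('inf')
    for t in interior(root):
        B = term(t)
        num = t['d'] - sum(u['d'] for u in B)   # distortion added, Eqn. (9)
        den = sum(u['r'] for u in B) - t['r']   # rate saved, Eqn. (9)
        if den > 0 and num / den < best_lam:
            best, best_lam = t, num / den
    if best is not None:
        best['left'] = best['right'] = None     # t_min becomes a terminal node
    return best_lam
```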
3 Encoding subvectors on a tree structure (VLTSSVQ)

Encoding complexity with VLTSVQ codebooks varies roughly linearly with rate. For applications where further encoding complexity reduction with very little sacrifice of coding performance is desired, a simple modification to the VLTSVQ algorithm yields the VLTSSVQ greedy tree growing algorithm. As in the previous section, the probability of occurrence of node $t$ of the tree and its contributions to the overall distortion and rate are given by Eqns. (1), (3), and (4). From here until Section V let us refer to subsets of source samples which partition a source vector as subvectors. Without loss of generality, for a stationary source we let the subvectors have equal dimensions. Let $U_j$, $j = 0, \ldots, \nu - 1$, denote the $\nu$ subvectors, each of dimension $\bar{k}$, of the $k$-dimensional source vector $X$. Extending the notation of the previous section, let the set of $j$'th subvector components of the source vectors assigned to node $t$ of the tree be denoted by $S_t^j$.
3.1 Encoding and Decoding

For VLTSSVQ a source vector is assigned to a terminal node as a result of a series of binary encoding decisions, as in VLTSVQ. The encoder codebook consists of subcodebooks of decision subvectors. The decision subvectors of the subcodebook rooted at node $t$ are $\hat{U}_{j_{max,t},\, t_{left}}$ and $\hat{U}_{j_{max,t},\, t_{right}}$, where $j_{max,t}$ is the predetermined index of the best subvector for the subcodebook rooted at node $t$; its determination is explained in the next subsection. Along the path from the root node to the terminal node, the encoding decision for $X$ at node $t$ is performed on $U_{j_{max,t}}$ by mapping $X$ to the child node $t_i$ which minimizes $d(U_{j_{max,t}}, \hat{U}_{j_{max,t},\, t_i})$ for $i \in \{left, right\}$. Since each subvector is of lower dimension, the encoding complexity of each binary decision is reduced by a factor of $k/\bar{k}$. The reconstruction $\hat{X}_t$ at node $t$ is given by Eqn. (2) and amounts to nonlinear interpolation ([6]).
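The encoding rule is a plain tree descent in which each decision costs a $\bar{k}$-dimensional rather than $k$-dimensional distance computation. A sketch, with hypothetical node fields ('j', 'u_left', 'u_right', 'recon') standing in for the quantities defined above:

```python
def encode(x, node, subvec_index):
    """Descend a VLTSSVQ tree: at each node only the designated subvector
    of x is compared against the two decision subvectors. subvec_index[j]
    lists the components of subvector U_j, e.g. [[0, 2], [1, 3]] for the
    subsampled 1-D case of Figure 2."""
    bits = []
    while node['left'] is not None:
        u = x[subvec_index[node['j']]]            # U_{j_max,t} of the input
        go_left = ((u - node['u_left']) ** 2).sum() <= \
                  ((u - node['u_right']) ** 2).sum()
        bits.append(0 if go_left else 1)
        node = node['left'] if go_left else node['right']
    return bits, node['recon']        # variable-length path + centroid X^_t
```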
3.2 Codebook Design

The decision subvectors of the subcodebook rooted at node $t$, $\hat{U}_{j_{max,t},\, t_{left}}$ and $\hat{U}_{j_{max,t},\, t_{right}}$, can be designed by running the LBG algorithm on $S_t^{j_{max,t}}$. The index of the best subvector $j_{max,t}$ for the subcodebook rooted at node $t$ maximizes the marginal return of the split:

$$j_{max,t} = \arg\max_{j:\, j \in \{0, \ldots, \nu-1\}} \lambda_{t,j} \qquad (12)$$

$$\lambda_{t,j} = \frac{\Delta D(t,j)}{\Delta R(t,j)} \qquad (13)$$

where

$$\Delta D(t,j) = d(t) - d(t_{left}, j) - d(t_{right}, j), \qquad \Delta R(t,j) = r(t_{left}, j) + r(t_{right}, j) - r(t) \qquad (14)$$

are the decrease in overall distortion and the increase in overall rate if the $j$'th subvector is used to design the encoder at node $t$; $d(t_i, j)$ and $r(t_i, j)$ carry the analogous meanings. One can make the correspondence with Eqn. (7) by letting $d(t_i, j_{max,t}) \equiv d(t_i)$, $r(t_i, j_{max,t}) \equiv r(t_i)$ for $i \in \{left, right\}$, and $\lambda_{t, j_{max,t}} \equiv \lambda_t$. Once the best subvector has been determined for the subcodebook rooted at node $t$ and $\lambda_t$ has been found, the splitting order for the terminal nodes is determined as in VLTSVQ, according to Eqn. (5). In this form of VLTSSVQ codebook design there are two nested lookahead steps instead of the one in VLTSVQ: the lookahead for the best subvector at each node is done prior to the lookahead for the best node to split.
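The inner lookahead of Eqns. (12)-(14) can be sketched as follows; this reuses the hypothetical lbg2 and node_stats helpers from the growth sketch of Section 2, and measures d(t_i, j) on the full vectors so that the nonlinear interpolative reconstruction is accounted for.

```python
def best_subvector(S, subvec_index, n_total):
    """Tentatively split node t on every subvector and keep the index with
    the largest marginal return lambda_{t,j} (Eqns. 12-14; a sketch)."""
    best = None
    for j, idx in enumerate(subvec_index):
        Ul, Ur = lbg2(S[:, idx])               # tentative decision subvectors
        if len(Ul) == 0 or len(Ur) == 0:
            continue
        cl, cr = Ul.mean(0), Ur.mean(0)
        # Partition the *full* vectors by the decision made on subvector j.
        left = ((S[:, idx] - cl) ** 2).sum(1) <= ((S[:, idx] - cr) ** 2).sum(1)
        if left.all() or not left.any():
            continue
        _, d, r = node_stats(S, n_total)
        _, dl, rl = node_stats(S[left], n_total)
        _, dr, rr = node_stats(S[~left], n_total)
        dD, dR = d - dl - dr, rl + rr - r
        if dR > 0 and (best is None or dD / dR > best[0]):
            best = (dD / dR, j, left)
    return best      # (lambda_{t,j_max}, j_max,t, left-child membership mask)
```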
3.3 Partitioning the source vector into subvectors

Several possible configurations for decomposing the source vector into its subvectors are shown in Figure 2, for a 1-D source block of dimension 4 and for a 2-D source block of dimension 4x4. Note that in both cases the second configuration has more mutual information between each one of its subvectors and the source vector than the first one if the source has memory. On the other hand, had no nonlinear interpolation been used for the reproduction at each node, the first configuration would have been favored over the second, since the mutual information between the samples within a subvector is greater in that case. A small sketch of the two index configurations follows.
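As an illustration (the paper does not prescribe this indexing code), the index sets of the two configurations for a raster-scanned n x n block can be generated as below; the returned lists can serve as the subvec_index argument of the earlier sketches.

```python
import numpy as np

def partitions_2d(n=4, g=2):
    """Index sets of the two configurations of Figure 2 for an n x n block
    cut into g*g subvectors: 'contingent' takes adjacent (n/g)x(n/g) blocks,
    'subsampled' takes the g*g interleaved sampling phases."""
    idx = np.arange(n * n).reshape(n, n)
    b = n // g
    contingent = [idx[i:i + b, j:j + b].ravel()
                  for i in range(0, n, b) for j in range(0, n, b)]
    subsampled = [idx[i::g, j::g].ravel() for i in range(g) for j in range(g)]
    return contingent, subsampled
```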
3.4 A Simpler Subvector Selection Criterion

In Eqn. (12) the determination of the best subvector requires the design of binary encoder codebooks and the computation of $\lambda_{t,j}$ in advance. A simpler criterion of interest, which does not require any lookahead, is to search over the distortion contributions of each subvector at node $t$:

$$j_{max,t} = \arg\max_{j:\, j \in \{0, \ldots, \nu-1\}} d_j(t) \qquad (15)$$

with $d_j(t) = p(t)\,E[\|U_j - \hat{X}_{j,t}\|^2 \mid X \in S_t]$, where $\hat{X}_{j,t}$ is the $j$'th subvector component of the reproduction vector $\hat{X}_t$ at node $t$.
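In code this criterion is nearly a one-liner compared with the nested lookahead; a sketch, under the same hypothetical conventions as above:

```python
def best_subvector_maxdist(S, subvec_index, n_total):
    """Lookahead-free criterion of Eqn. (15): pick the subvector with the
    largest distortion contribution d_j(t) at node t."""
    p = len(S) / n_total
    dj = [p * ((S[:, idx] - S[:, idx].mean(0)) ** 2).sum(1).mean()
          for idx in subvec_index]
    return int(np.argmax(dj))
```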
4 Pruning and Resolution Reduction of VLTSSVQ codebooks

Improved VLTSSVQ codebooks can be obtained by the straightforward application of the tree pruning method of Section 2.1 after the greedy VLTSSVQ growth. Since pruning is achieved by the removal of subtrees, the pruned trees have lower encoding complexity than the initial large tree.

A recent method of resolution reduction developed in [9] can also be applied to VLTSSVQ codebooks to obtain improved coding performance. The idea of codebook resolution reduction is to provide a mapping $U$ from the terminal node $t$ of the initial high resolution VLTSSVQ codebook to the codevector with index $u$ of a lower resolution codebook. The lower resolution codebook cells partition the input space in a better way than the suboptimal polytopal cells of the VLTSSVQ codebook. To obtain the lower resolution codebook, the VLTSSVQ terminal node reproduction vectors are treated as training vectors with possibly different occurrence probabilities and clustered by a weighted ECVQ algorithm [9]. The desired mapping $U$ satisfies

$$U(t) = u \iff u = \arg\min_j \left[ d(\hat{X}_j, \hat{X}_t) - \lambda \log p_U(j) \right] \qquad (16)$$

and partitions the input space into $M$ cells

$$R_u = \{X : U(T(X)) = u\} \qquad (17)$$

for $u = 1, 2, \ldots, M$, where $T$ is the overall encoder mapping of the VLTSSVQ codebook, which maps $X$ to one terminal node. The reproduction vector of the low resolution codebook cell with index $u$ is given by $\hat{X}_u = E[X \mid X \in R_u]$ and occurs with probability

$$p_U(u) = \sum_{t:\, U(t) = u} p(t) \qquad (18)$$

A desired coding rate can be achieved by using a sufficiently large Lagrange multiplier $\lambda$. Encoding takes place by mapping the source vector $X$ to a terminal node $t$ of the tree-structured codebook and further mapping $t$ to $u$ via $U$. Note that resolution reduction does not reduce encoding complexity as the coding rate is decreased. Nevertheless, since VLTSSVQ encoding is of sufficiently low complexity and the mapping $U$ can be realized by a fast table-lookup operation, the overall encoding operation is efficient.
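The mapping of Eqn. (16) is a table that can be built once, offline. A minimal sketch, with the low-resolution codebook and its probabilities taken as given (they would come from the weighted ECVQ design of [9]):

```python
def build_U_map(leaf_centroids, low_codebook, low_probs, lam):
    """Table U of Eqn. (16): map each terminal node t to the low-resolution
    codeword u minimizing d(X^_u, X^_t) - lam * log p_U(u)."""
    U = {}
    for t, c in enumerate(leaf_centroids):
        cost = ((low_codebook - c) ** 2).sum(1) - lam * np.log(low_probs)
        U[t] = int(np.argmin(cost))
    return U
```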
5 Subvectors defined on orthogonal subspaces

A possible direction to take is to apply a fixed orthonormal transformation, such as a subband or DCT transformation, or a signal dependent transformation, such as the KLT, to the waveform prior to coding. In general, since the signal energy is concentrated in a few of the orthogonal transform coefficients, coding the transform coefficients with optimal rate allocation yields high coding performance. VLTSSVQ allows rate allocation to transform coefficients if the subvectors are defined as subsets of transform coefficients and VLTSSVQ is used to encode these subvectors. With a large number of transform coefficients the subvectors $U_j$ will be highly uncorrelated and

$$E[U_j \mid I_k(U_k), I_j(U_j)] \simeq E[U_j \mid I_j(U_j)], \qquad k \neq j \qquad (19)$$

where $I_j$ is an enumerative mapping accounting for the encoding decisions made for the $j$'th subvector along the path from the root node of the tree to a terminal node. This suggests that the nonlinear interpolative reconstruction of Eqn. (2) is redundant; for the coding of test sources this type of reconstruction might even degrade performance. Decoder codebook storage requirements can be substantially reduced by storing only the encoder codebook decision subvectors for reconstruction at node $t$. The reconstruction for any subvector other than $U_{j_{max,t}}$ at nodes $t_{left}$, $t_{right}$ may be determined by traversing the tree in the direction from the leaves to the root node and using the first reconstruction (decision subvector) encountered for that subvector along this path. If no such reconstruction is found, the reconstruction vector for that subvector is the mean of the subvector (which can be taken to be the zero vector for all practical purposes).
The subspace selection criterion does not change as a result of this modification. However, its computation is simplified, since the decrease in distortion resulting from growing the tree from a terminal node is now given by the decrease in the distortion contribution of a single subspace. A sketch of the leaf-to-root reconstruction rule just described follows.
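This is a sketch under assumed bookkeeping: each non-root node is taken to store the subvector index its parent decided on ('j_from_parent') and the decision subvector that selected it ('u_from_parent'); these field names are hypothetical.

```python
def reconstruct_no_nli(leaf, subvec_index, k):
    """Decoder reconstruction without nonlinear interpolation: walk from the
    terminal node toward the root and, for each subvector, keep the first
    decision subvector met along the way; subvectors never decided on
    default to zero (the subvector mean)."""
    x_hat = np.zeros(k)
    filled = set()
    node = leaf
    while node is not None:
        j = node.get('j_from_parent')
        if j is not None and j not in filled:
            x_hat[subvec_index[j]] = node['u_from_parent']
            filled.add(j)
        node = node.get('parent')
    return x_hat
```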
6 Simulation Results

In the first part of the simulations a synthetic AR(1) source with correlation coefficient $\rho = 0.9$ was coded. The source vector dimension for all simulations was $k = 8$. The training and test sources each consisted of 100000 samples. In Figure 3 the operational rate-distortion points of VLTSVQ are plotted against the operational rate-distortion points of VLTSSVQ of dimension $\nu \times \bar{k} = k$, where $\nu$ denotes the number of subvectors and $\bar{k}$ the dimension of each subvector. The terms contingent and subsampled here refer to the first and second partitioning configurations in Figure 2, and the terms maximum distortion and marginal returns refer to the two subvector selection criteria. Also plotted is the distortion-rate curve of the first order Gauss-Markov source, parametrically expressed as
$$D = \frac{1}{8} \sum_{l=1}^{8} \min(\theta, \lambda_l) \qquad (20)$$

$$R = \frac{1}{8} \sum_{l=1}^{8} \max\!\left(0,\, \frac{1}{2}\log\frac{\lambda_l}{\theta}\right) \qquad (21)$$
where $\lambda_l$ are the eigenvalues of the source covariance matrix and $\theta$ is the water-filling parameter; a numerical sketch of this bound is given at the end of this section. It is seen that the mse gap between VLTSVQ and VLTSSVQ closes with rate. This is especially true for the source vector partitioning configuration obtained by subsampling: although at low rates partitioning by grouping contingent samples appears to be more advantageous, at higher rates the subsampled configuration closes the gap more efficiently. The two subvector selection criteria performed close to each other. Unlike in determining the splitting order for tree growth, the marginal returns criterion does not possess a significant advantage over the maximum distortion contribution criterion in determining the best subvector for encoding, except at very low rates. This, however, is not so for coding simulations inside the training set. The encoding complexity advantage of VLTSSVQ is shown in Figure 4: the ratio of the actual number of multiplications required in VLTSVQ encoding to that required in VLTSSVQ encoding of a source vector turns out to be approximately the ratio of the source vector dimension to the subvector dimension.

The results obtained with the two methods outlined above for generating improved low resolution codebooks from initial high resolution tree-structured codebooks are summarized in Figure 5. A comparison with Figure 3 indicates that while the method of tree pruning may not yield much performance improvement for an AR(1) source, entropy-constrained resolution reduction certainly does better at low rates. At high rates the weighted ECVQ algorithm results in clusters with too few members (terminal nodes).

Results for the coding of the Lenna512 image outside a training set of 10 images (180640 training vectors) are shown in Figures 6 and 7. Note that, as with the AR(1) source, the performance of VLTSSVQ departs slightly from the performance of VLTSVQ at high rates. Since the number of training vectors assigned to the terminal nodes of the tree structure is small, the performance of nonlinear interpolative reconstruction at these nodes is low. Partitioning the source vector into subvectors by subsampling again proves to be more advantageous. Comparison of the two subvector selection criteria in Figure 8 for several still images shows that the marginal returns criterion has an insignificant advantage over the maximum distortion contribution criterion.

Finally, we report some coding results obtained with the application of VLTSSVQ in the subband domain. In this case the image was uniformly decomposed into 16 subbands, and one sample from each subband constituted one subvector of dimension 1x1. The results are summarized in Table 1 and compared with spatial domain VLTSVQ (4x4 dim. source vectors) and spatial domain VLTSSVQ (2x2 dim. subvectors, 4x4 dim. source vectors). The corresponding reconstructed images are shown in Figure 9. Note that in the subband domain, VLTSSVQ with nonlinear interpolative (NLI) reconstruction at the terminal nodes overconstrains the quantizer and leads to performance loss for coding outside the training set.
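As promised above, the distortion-rate bound of Eqns. (20)-(21) can be evaluated by sweeping the water-filling parameter θ. A minimal sketch, assuming a unit-variance source and base-2 logarithms (so rates come out in bits):

```python
import numpy as np

def ar1_rd_point(theta, rho=0.9, k=8):
    """One (R, D) point of Eqns. (20)-(21) by reverse water-filling over
    the eigenvalues of the k x k AR(1) covariance matrix."""
    i = np.arange(k)
    C = rho ** np.abs(i[:, None] - i[None, :])   # C_ij = rho^{|i-j|}
    lam = np.linalg.eigvalsh(C)
    D = np.minimum(theta, lam).mean()
    R = np.maximum(0.0, 0.5 * np.log2(lam / theta)).mean()
    return R, D
```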
Coding method               Mse     Rate (bpp)   Comp. (no. mult.)
VLTSVQ (spatial)            57.36   0.4426       133.02
VLTSSVQ (spatial)           62.24   0.4448       33.335
VLTSSVQ (subband, NLI)      44.68   0.4365       12.206
VLTSSVQ (subband, no NLI)   37.33   0.4353       17.936

Table 1: Subband Domain Coding Results - Lenna512

[Figure 1: Example of partitioning of 2-D input space by VLTSSVQ]

Although the subband domain coding results are superior to the spatial domain results, we advocate the use of VLTSSVQ in the spatial domain, since we have determined that VLTSSVQ does not exploit the memory between the subbands (or orthogonal subspaces). For instance, the performance of independent coding of each subband with VLTSVQ and rate allocation with the BFOS algorithm turns out to be very close to that of subband domain VLTSSVQ, with much lower codebook complexity.
7 Conclusion

The encoding complexity of a tree-structured vector quantizer can be further reduced with very little performance loss. The VLTSSVQ algorithm presented in this paper yields performance very close to that of VLTSVQ, especially at high rates. We note that the performance of VLTSSVQ is more sensitive to the size of the training set, since reconstruction is achieved via nonlinear interpolation and high rates imply sparsely populated terminal nodes. Unlike [8], we also suggest that VLTSSVQ be used in the spatial domain, since we have not been able to demonstrate its usefulness in exploiting the statistical dependencies between orthogonal subspaces.
[Figure 2: Partitioning of source vector samples into subvectors. Case 1 (contingent) groups neighboring samples into subvectors; Case 2 (subsampled) interleaves the samples. For the 1-D block of dimension 4 the two subvectors are {x1, x2}, {x3, x4} in Case 1 and {x1, x3}, {x2, x4} in Case 2; for the 2-D 4x4 block the four subvectors (marked X, O, +, #) are the four 2x2 quadrant blocks in Case 1 and the four 2:1 subsampling phases in Case 2.]
[Figure 3: Performance of VLTSSVQ on the AR(1) test source (ρ = 0.9): Mse vs. rate (bpp) for 8x1 dim., marginal return; 4x2 dim. contingent, marginal return; 4x2 dim. subsampled, marginal return; 4x2 dim. contingent, max. distortion; VLTSVQ (1x8 dim.); and the R(D) curve for dimension 8.]
[Figure 4: Encoding complexity of VLTSSVQ on the AR(1) test source (ρ = 0.9): complexity (no. of multiplications per source vector) vs. rate (bpp) for 8x1 dim., marginal return; 4x2 dim. contingent, marginal return; 4x2 dim. subsampled, marginal return; 4x2 dim. contingent, max. distortion; and VLTSVQ (1x8 dim.).]
[Figure 5: Pruning and resolution reduction on the AR(1) test source (ρ = 0.9): Mse vs. rate (bpp) for VLTSVQ (1x8 dim.) pruning; 4x2 dim. contingent, pruning; 4x2 dim. contingent, resolution reduction; and the R(D) curve for dimension 8.]
[Figure 6: Performance of VLTSSVQ on Lenna512 (10-image training set): Mse vs. rate (bpp) for VLTSVQ (16 dim.); 4x4 dim., contingent; and 4x4 dim., subsampled.]
[Figure 7: Encoding complexity of VLTSSVQ on Lenna512 (10-image training set): complexity vs. rate (bpp) for VLTSVQ (16 dim.); 4x4 dim., contingent; and 4x4 dim., subsampled.]
[Figure 8: Comparison of the two subvector selection criteria (marginal returns vs. maximum distortion), Mse vs. rate (bpp), on the still images Lenna512, Tiffany, and Woman.]
REFERENCES

[1] A. Buzo, A. H. Gray, R. M. Gray, and J. D. Markel, "Speech Coding Based Upon Vector Quantization," IEEE Trans. on ASSP, vol. ASSP-28, pp. 562-574, Oct. 1980.
[2] J. Makhoul, S. Roucos, and H. Gish, "Vector Quantization in Speech Coding," Proc. of the IEEE, vol. 73, pp. 1551-1587, Nov. 1985.
[3] E. A. Riskin and R. M. Gray, "A Greedy Tree Growing Algorithm for the Design of Variable Rate Vector Quantizers," IEEE Trans. on Signal Proc., vol. 39, pp. 2500-2507, Nov. 1991.
[4] P. A. Chou, T. Lookabaugh, and R. M. Gray, "Optimal Pruning with Applications to Tree-Structured Source Coding and Modeling," IEEE Trans. on Info. Theory, vol. IT-35, pp. 299-315, March 1989.
[5] B. Mahesh, W. A. Pearlman, and L. Lu, "Variable-Rate Tree-Structured Vector Quantizers," IEEE Trans. on Info. Theory, vol. 41, pp. 917-930, July 1995.
[6] A. Gersho, "Optimal Nonlinear Interpolative Vector Quantization," IEEE Trans. on Comm., vol. COM-38, pp. 1285-1287, 1990.
[7] M. T. Orchard and C. A. Bouman, "Color Quantization of Images," IEEE Trans. on Signal Proc., vol. 39, pp. 2677-2690, Dec. 1991.
[8] L. Po and C. Chan, "Adaptive Dimensionality Reduction Techniques for Tree-Structured Vector Quantization," IEEE Trans. on Comm., pp. 565-580, March 1993.
[9] U. Bayazit and W. A. Pearlman, "Improving the Performance of Optimal Joint Decoding," Proc. ICIP '95, October 1995.
[10] P. A. Chou, T. Lookabaugh, and R. M. Gray, "Entropy-Constrained Vector Quantization," IEEE Trans. on ASSP, vol. ASSP-36, pp. 31-42, Jan. 1988.
Figure 9: Lenna512. Top-left: original; top-right: fullband VLTSVQ; bottom-left: VLTSSVQ (spatial, four 2x2-dim. subvectors); bottom-right: VLTSSVQ (subband, sixteen 1x1-dim. subvectors, no NLI).