A Comparison of Different Optimized Structures of Filter Banks for Video Coding
Ilangko Balasingham, Arild Fuldseth and Tor A. Ramstad
Department of Telecommunications, Norwegian University of Science and Technology (NTNU), N-7034 Trondheim, Norway
E-mail:
[email protected]
ABSTRACT
In this paper, optimized perfect reconstruction octave-band two-stage and three-stage tree-structured filter banks and optimized uniform four-channel and eight-channel filter banks are compared for hybrid video coding. For the nonuniform filter banks, the filter coefficients are optimized at each stage for subband coding gain. Six types of frequency partitionings are presented to compare how efficiently they exploit the spatial redundancies in the prediction error image. The performance of the proposed filter banks is compared in terms of PSNR and visual quality. The overall performance indicates that the optimized perfect reconstruction octave-band two-stage tree-structured filter bank (FB 22 22 8) is a good candidate for video coding.
1. INTRODUCTION
Unitary and nonunitary filter banks [1, 2] with uniform frequency separation are widely employed in subband coding of speech and image signals. In such systems, the input signal, x(n), is split into subband signals with equal bandwidths. Nonuniform filter banks [3], if correctly optimized, can alleviate some of the typical artifacts experienced in subband coding, notably ringing when the filters' unit sample responses are long, and blocking when they are short. High frequency resolution at low frequencies and lower frequency resolution at higher frequencies represent a good compromise in terms of the coder's ability to cope both with large areas of constant spectral content, i.e., visually stable areas, and with transients. On the other hand, systems constructed on these premises may sometimes lead to reduced average coding gain.
One possible candidate for nonuniform filter banks is a tree-structured system. To guarantee perfect reconstruction (PR) through the analysis-synthesis system, constraints among the filter coefficients have to be enforced. The system is then completely free of aliasing, amplitude and phase distortions. The filter coefficients are optimized for subband coding gain at each stage.
In the case of still image compression, optimized perfect reconstruction octave-band three-stage tree-structured filter banks give rise to subjectively and objectively better image quality [3] than traditional octave-band tree-structured and uniform filter banks. In the case of video compression, we compare optimized perfect reconstruction octave-band two-stage and three-stage tree-structured filter banks when a traditional video coding scheme, shown in Figure 1, is used.
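As a rough illustration of what an octave-band tree structure implies, the sketch below collapses a cascade of two-channel stages into one equivalent analysis filter per channel, using the standard identity that a filter placed after decimation by 2 acts like the same filter with zeros inserted between its taps placed before the decimation. The helper names and the all-ones placeholder taps are assumptions made for illustration; only the resulting filter lengths are meaningful, and they can be compared with the tap totals listed in Table 1.

```python
import numpy as np

def upsample_taps(h, factor):
    """Insert factor-1 zeros between taps, i.e. the coefficients of H(z**factor)."""
    h = np.asarray(h, dtype=float)
    out = np.zeros((len(h) - 1) * factor + 1)
    out[::factor] = h
    return out

def octave_tree_equivalents(h_lp, h_hp, stages):
    """Equivalent analysis filters of an octave-band tree, lowest band first.

    Every stage splits only the lowpass branch of the previous stage.
    """
    channels = []
    lowpass_cascade = np.ones(1)  # product of the lowpass stages applied so far
    for stage in range(stages):
        factor = 2 ** stage
        channels.append(np.convolve(lowpass_cascade, upsample_taps(h_hp, factor)))
        lowpass_cascade = np.convolve(lowpass_cascade, upsample_taps(h_lp, factor))
    channels.append(lowpass_cascade)
    return channels[::-1]

# Placeholder taps (all ones): only the tap counts matter in this check.
two_stage_9_3 = [len(c) for c in octave_tree_equivalents(np.ones(9), np.ones(3), 2)]
three_stage_6_6 = [len(c) for c in octave_tree_equivalents(np.ones(6), np.ones(6), 3)]
print(two_stage_9_3, three_stage_6_6)
```

The lengths depend only on the number of taps per stage, so the same helper can be reused with real coefficients once they are available.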
2. POLYPHASE REPRESENTATION
The invention of the polyphase filter representation by Bellanger et al. [4] introduced an efficient method of implementing filters, especially filter banks, in multirate systems. The perfect reconstruction property of a decimated filter bank can be guaranteed by using the polyphase matrices [2]. Let $\mathbf{P}(z)$ and $\mathbf{Q}(z)$ denote the decimated analysis and synthesis polyphase matrices, respectively. Perfect reconstruction is obtained if
$$\mathbf{P}(z)\,\mathbf{Q}(z) = c\,z^{-k}\,\mathbf{I}, \qquad (1)$$
where $\mathbf{I}$ is an identity matrix, $c$ is an arbitrary constant and $k$ is an integer. The reconstructed signal is then a scaled and delayed version of the input signal. With FIR analysis filters, FIR synthesis filters are obtained by setting appropriate terms to zero in the determinant of $\mathbf{P}(z)$. In the two-channel case, $G_{LP}(z) = H_{HP}(-z)$ and $G_{HP}(z) = -H_{LP}(-z)$, where $H(z)$ and $G(z)$ denote analysis and synthesis filters, respectively.
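The two-channel relations can be checked numerically. The following Python sketch derives the synthesis filters from an analysis pair and evaluates the alias and distortion terms of the analysis-synthesis system; for perfect reconstruction the alias term must vanish and the distortion term must be a single scaled delay, which is the two-channel counterpart of Equation 1. The Haar pair and the helper names are illustrative assumptions, not the optimized filters of this paper.

```python
import numpy as np

def modulate(h):
    """Coefficients of H(-z): flip the sign of every odd-indexed tap."""
    h = np.asarray(h, dtype=float)
    return h * (-1.0) ** np.arange(len(h))

def synthesis_from_analysis(h_lp, h_hp):
    """G_LP(z) = H_HP(-z) and G_HP(z) = -H_LP(-z)."""
    return modulate(h_hp), -modulate(h_lp)

def alias_and_distortion(h_lp, h_hp):
    """Alias and distortion transfer functions of the two-channel system."""
    g_lp, g_hp = synthesis_from_analysis(h_lp, h_hp)
    alias = 0.5 * (np.convolve(modulate(h_lp), g_lp)
                   + np.convolve(modulate(h_hp), g_hp))
    distortion = 0.5 * (np.convolve(h_lp, g_lp) + np.convolve(h_hp, g_hp))
    return alias, distortion

# Haar analysis pair, used here only as a simple known PR example.
s = 1.0 / np.sqrt(2.0)
haar_lp = np.array([s, s])
haar_hp = np.array([s, -s])
alias, distortion = alias_and_distortion(haar_lp, haar_hp)
print(alias, distortion)
```

With these synthesis filters the alias term cancels for any analysis pair; whether the distortion term reduces to a pure delay depends on the analysis filters, which is where the constraints on the coefficients come in.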
3. OPTIMIZATION OF THE FILTER COEFFICIENTS BASED ON CODING GAIN
Coding gain is used as a measure of data compression [5]. Katto and Yasuda [6] have derived a compact formula for the generalized subband coding gain of nonuniform filter banks:
$$G_{SBC} = \frac{\sigma_x^2}{\prod_{k=0}^{M-1} \left( A_k B_k \alpha_k \right)^{1/\alpha_k}}, \qquad (2)$$
where
$$A_k = \mathbf{h}_k^T \mathbf{R}_{xx} \mathbf{h}_k \qquad (3)$$
and
$$B_k = \frac{1}{\alpha_k}\, \mathbf{g}_k^T \mathbf{g}_k. \qquad (4)$$
Here $\{\mathbf{h}_k\}$ and $\{\mathbf{g}_k\}$ are the impulse responses of the analysis and the synthesis filters, respectively, $\{\alpha_k\}$ are the decimation/interpolation factors, $M$ is the number of channels, and $\sigma_x^2$ and $\mathbf{R}_{xx}$ are the variance and the autocorrelation matrix of the input signal, respectively. Our experimental results indicate that the prediction error image is approximately an AR(1) process with nearest sample correlation $\rho = 0.60$. Hence, the autocorrelation matrix, $\mathbf{R}_{xx}$, has elements
$$R_{xx}(i,j) = \rho^{|i-j|}\,\sigma_x^2, \quad i, j \in [0, N_k - 1], \qquad (5)$$
where $\rho$ denotes the correlation coefficient of the input signal and $N_k$ is the analysis filter length. The one-dimensional filter coefficients are obtained by maximizing the subband coding gain given in Equation 2. The maximum theoretical one-dimensional coding gain equals 1.94 dB when $\rho = 0.60$. The two-dimensional filter banks are constructed from the one-dimensional filter banks by applying them along the horizontal and vertical image coordinates, respectively.
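As a minimal sketch of how Equations 2 to 5 can be evaluated, the Python code below computes the subband coding gain of a two-channel Haar filter bank for an AR(1) input with ρ = 0.60 and compares it with the AR(1) bound 1/(1 − ρ²), which reproduces the 1.94 dB figure quoted above. The Haar filters and the function names are assumptions chosen for illustration; the optimized filters of this paper are not reproduced here.

```python
import numpy as np

def ar1_autocorrelation(n, rho, variance=1.0):
    """Autocorrelation matrix of Equation 5: R(i, j) = rho**|i - j| * variance."""
    idx = np.arange(n)
    return variance * rho ** np.abs(idx[:, None] - idx[None, :])

def subband_coding_gain(analysis, synthesis, alphas, rho, variance=1.0):
    """Generalized subband coding gain of Equation 2 (linear scale)."""
    denominator = 1.0
    for h, g, alpha in zip(analysis, synthesis, alphas):
        h = np.asarray(h, dtype=float)
        g = np.asarray(g, dtype=float)
        a_k = h @ ar1_autocorrelation(len(h), rho, variance) @ h  # Equation 3
        b_k = (g @ g) / alpha                                     # Equation 4
        denominator *= (a_k * b_k * alpha) ** (1.0 / alpha)
    return variance / denominator

def to_db(x):
    return 10.0 * np.log10(x)

# Illustrative two-channel Haar filter bank (not one of the paper's filters).
s = 1.0 / np.sqrt(2.0)
haar_analysis = [np.array([s, s]), np.array([s, -s])]
haar_synthesis = [np.array([s, s]), np.array([-s, s])]

gain_db = to_db(subband_coding_gain(haar_analysis, haar_synthesis, [2, 2], rho=0.60))
bound_db = to_db(1.0 / (1.0 - 0.60 ** 2))
print(f"Haar gain: {gain_db:.3f} dB, AR(1) bound: {bound_db:.3f} dB")
```

An optimizer would treat the free filter coefficients as variables and maximize `subband_coding_gain` subject to the perfect reconstruction constraints.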
4. CODING SCHEME
[Figure 1 (block diagram): Input, Analysis FB, Q, Channel, Q^{-1}, Synthesis FB, Motion comp. prediction, Memory, Mot. est.]
Figure 1. Block diagram of motion compensated predictive coder.
Figure 1 depicts a hybrid video coding scheme where the spatial redundancies in the prediction error image are removed by a still image frequency domain coder. This approach is used in several video coding standards, e.g. [7]. The main difference here is that we employ a filter bank to decorrelate the prediction error image rather than a transform such as a DCT. In frequency domain coders the number of bands to be used is a compromise between the frequency
and spatial resolutions, according to the Heisenberg uncertainty relation for Fourier transform pairs. With nonvarying estimated local statistics in an area, high frequency resolution will lead to high coding gain. For images with changing estimated local statistics, the filter length should be comparable to typical object sizes for which the statistical models remain constant. Short filter lengths require a small number of channels. (For a more accurate assessment of the significance of frequency versus spatial resolution, visual perception comes into play.)
Prediction error images in the hybrid video coder, which we compress using subband coding, have a flatter spectrum than typical still images. Due to motion compensation, only relatively small "objects" plus noise-like artifacts appear. This leads us to believe that filter banks with few channels and short filters should be applied. Furthermore, small quantizer block sizes are preferable under the bit allocation scheme, at the cost of more side information.
Our objective here is to compare different types of filter banks for video coding purposes. Hence, motion estimation and compensation, as well as the quantization and coding schemes, should be identical for all types of filter banks to make the comparison fair. A motion block size of 16 × 16 and a search window of 15.5 pels with half-pel accuracy are used in the block matching algorithm that finds the motion vectors. A global motion vector is assigned to each frame to extend the effective range of the motion vectors and to code the local motion vectors efficiently. Each subband block of 4 × 4 subband samples is classified into one of six classes according to its mean squared value. The subband samples are then quantized by a pyramid vector quantizer, as described in [8], with a different number of bits for each class. The side information consists of the standard deviations of the classes, the bit allocation tables, and the motion vectors.
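The block classification step can be sketched as follows. The code computes the mean squared value of each 4 × 4 block of a subband and maps it to one of six classes. The threshold values and the function name are hypothetical: the text does not state how the class boundaries are chosen, so the five thresholds below are placeholders only.

```python
import numpy as np

# Hypothetical class boundaries (five thresholds give six classes).
HYPOTHETICAL_THRESHOLDS = np.array([0.5, 2.0, 8.0, 32.0, 128.0])

def classify_blocks(subband, thresholds=HYPOTHETICAL_THRESHOLDS, block=4):
    """Return one class index (0..len(thresholds)) per block x block tile."""
    subband = np.asarray(subband, dtype=float)
    rows = subband.shape[0] - subband.shape[0] % block  # drop incomplete tiles
    cols = subband.shape[1] - subband.shape[1] % block
    tiles = subband[:rows, :cols].reshape(rows // block, block, cols // block, block)
    mean_squared = (tiles ** 2).mean(axis=(1, 3))
    return np.digitize(mean_squared, thresholds)

# Toy 8 x 8 subband: four tiles with mean squared values 0, 1, 400 and 0.
toy = np.zeros((8, 8))
toy[:4, 4:] = 1.0
toy[4:, :4] = 20.0
classes = classify_blocks(toy)
print(classes)
```

Each class would then be assigned its own bit budget for the vector quantizer, with the class statistics sent as side information.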
The codec operates as a fixed rate coder, i.e., the total number of bits per pixel (bpp) is constant for each frame. The comparisons are done with six sets of filter bank structures, as depicted in Figures 2 and 3. The maximum decimation factor is four in all systems in Figure 2, while in Figure 3 it is eight. The left systems in both figures (Types 1 and 4) represent uniform parallel filter banks: 16I 60 is a four-channel, sixteen-tap filter bank, while 32I 60 is an eight-channel, thirty-two-tap filter bank. Their filter coefficients are optimized as described in [9]. The Type 3 and Type 6 frequency partitionings are obtained by extending the filtering process of Types 2 and 5, respectively, to the adjacent high frequency subbands. The nonuniform filter banks shown in Figure 2 are constructed from two sets of two-channel filter banks, whereas those in Figure 3 are constructed from three sets of two-channel filter banks, as given in Table 1. For FB 25 13 3 and FB 57 33 13 3, Table 1 lists the number of taps of the analysis filter banks only.
For nonuniform filter banks, the perfect reconstruction property is guaranteed by Equation 1 at each stage. The remaining degrees of freedom are used to optimize the filter coefficients by maximizing the coding gain given in Equation 2. The filter coefficients of the nonuniform filter banks are optimized after calculating the total unit sample response of each channel by convolving the unit sample responses from each stage. However, for odd length filter banks such as FB 25 13 3 and FB 57 33 13 3, the filter coefficients are optimized at each stage. We assume that the input signal at each stage can be approximated as an AR(1) process, where the nearest sample correlation factor, ρ, is estimated accordingly. This was done only because the optimization algorithm otherwise failed to provide a "good" frequency response for the filter banks.

Table 1. Building blocks of nonuniform filter banks. Columns S-1 to S-3 give the number of LP taps / number of HP taps at each stage; the remaining columns give the total number of taps per channel.

Two-stage filter banks
FB               S-1    S-2    LP    BP    HP
FB 16 16 6       6/6    6/6    16    16    6
FB 22 22 8       8/8    8/8    22    22    8
FB 25 13 3       9/3    9/3    25    13    3

Three-stage filter banks
FB               S-1    S-2    S-3    LP    BP1   BP2   HP
FB 36 36 16 6    6/6    6/6    6/6    36    36    16    6
FB 50 50 22 8    8/8    8/8    8/8    50    50    22    8
FB 57 33 13 3    9/3    9/3    9/3    57    33    13    3

[Figures 2 and 3: frequency partitioning diagrams with subbands labelled from LL to HH.]
Figure 2. Left: Type 1, Middle: Type 2 and Right: Type 3.
Figure 3. Left: Type 4, Middle: Type 5 and Right: Type 6.
5. RESULTS
For the computer simulation, we chose the CIF sequences Foreman (luminance, 300 frames) and Calendar (luminance, 125 frames) as input signals. The frame rate is 30 Hz. Two types of assessment were carried out to compare the performance of the filter banks for the frequency partitionings shown in Figures 2 and 3. Tables 2 and 3 show the performance in terms of PSNR. Furthermore, the visual quality was assessed by a paired comparison subjective test. The purpose of the paired comparison test is to cross-check whether the picture quality measured in terms of PSNR coincides with the visual quality.
The significance of a small informal subjective test must not be overrated. Assume that the probability distribution of choosing one out of two filter banks is binomial with eight viewers. As an example, a 95 % confidence level requires at least seven votes. However, in this experiment some candidates are favored simply because they received a majority of the votes.
From our comparison test, done only for the Calendar sequence at 0.30 and 0.75 bpp, the following conclusions can be drawn. For two-stage filter banks, FB 22 22 8 seems to perform better than FB 16 16 6, with Type 3 frequency partitioning being preferred. Two-stage filter banks are better than their three-stage counterparts. However, for low bit rates, say 0.30 bpp, FB 36 36 16 6 with Type 5 partitioning is preferred. Comparing the uniform filter banks with the DCT, 16I 60 is a clear winner. Odd length nonuniform filter banks perform poorly, which can be attributed to the fact that the analysis highpass filter of the two-channel filter bank (see Table 1) has neither of its two zeros at ω = 0. This gives poor dc leakage suppression, and the lowpass synthesis filter fails to attenuate the high frequency components.
From this preliminary test we conclude that nonuniform filter banks are better than uniform filter banks, and that FB 22 22 8 with Type 3 frequency partitioning is a good choice for coding of video sequences. The reason may be two-fold: 1) the frequency selectivity of FB 22 22 8 is better than that of FB 16 16 6; 2) the spatial resolution of the high frequency subbands is better than for uniform filter banks and is enhanced by using a block size of 4 × 4 under bit allocation.
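The vote threshold quoted for the paired comparison test can be reproduced with a short calculation. The sketch below assumes a one-sided test of the hypothesis that both filter banks are equally likely to be chosen (p = 0.5) at the 5 % level; this interpretation and the function names are assumptions, since the text does not spell out the test.

```python
from math import comb

def tail_probability(votes, viewers=8, p=0.5):
    """P(X >= votes) for X ~ Binomial(viewers, p)."""
    return sum(comb(viewers, k) * p ** k * (1 - p) ** (viewers - k)
               for k in range(votes, viewers + 1))

def min_votes_for_significance(viewers=8, level=0.05):
    """Smallest vote count whose upper-tail probability is at most `level`."""
    for votes in range(viewers + 1):
        if tail_probability(votes, viewers) <= level:
            return votes
    return None  # no vote count reaches the requested level

required = min_votes_for_significance()
print(required, tail_probability(6), tail_probability(7))
```

With eight viewers, six votes out of eight still occur by chance in roughly 14 % of cases, which is why a simple majority is weak evidence.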
Table 2. Simulation results for two-stage nonuniform and 4 × 4 uniform filter banks. PSNR in dB as a function of bit rate in bpp.

                       Calendar (bpp)                              Foreman (bpp)
Type  FB               0.20    0.30    0.40    0.50    0.75        0.20    0.30    0.40    0.50
T 1   16-tap FB        25.802  27.307  28.554  29.637  31.986      34.919  36.460  37.675  38.689
      DCT4             25.763  27.291  28.547  29.642  32.002      34.714  36.263  37.486  38.507
T 2   FB 16 16 6       25.711  27.205  28.480  29.602  32.027      34.792  36.341  37.551  38.580
      FB 22 22 8       25.772  27.278  28.554  29.676  32.122      34.826  36.382  37.605  38.628
      FB 25 13 3       25.430  26.950  28.224  29.364  31.809      34.461  36.036  37.267  38.323
T 3   FB 16 16 6       25.667  27.166  28.423  29.538  31.954      34.742  36.270  37.484  38.509
      FB 22 22 8       25.732  27.239  28.503  29.619  32.052      34.782  36.325  37.546  38.570
      FB 25 13 3       25.334  26.841  28.105  29.243  31.684      34.310  35.899  37.131  38.190

Table 3. Simulation results for three-stage nonuniform and 8 × 8 uniform filter banks. PSNR in dB as a function of bit rate in bpp.

                       Calendar (bpp)                              Foreman (bpp)
Type  FB               0.20    0.30    0.40    0.50    0.75        0.20    0.30    0.40    0.50
T 4   32-tap FB        25.426  26.876  28.092  29.157  31.459      34.665  36.157  37.329  38.353
      DCT8             25.451  26.906  28.127  29.195  31.504      34.585  36.104  37.287  38.319
T 5   FB 36 36 16 6    25.729  27.234  28.485  29.599  32.018      34.788  36.328  37.543  38.571
      FB 50 50 22 8    25.770  27.275  28.550  29.656  32.093      34.828  36.369  37.596  38.616
      FB 57 33 13 3    25.257  26.767  28.028  29.165  31.609      34.236  35.825  37.066  38.130
T 6   FB 36 36 16 6    25.729  27.234  28.485  29.599  32.018      34.794  36.330  37.547  38.573
      FB 50 50 22 8    25.767  27.273  28.550  29.659  32.094      34.828  36.369  37.596  38.616
      FB 57 33 13 3    25.249  26.766  28.034  29.166  31.608      34.236  35.825  37.066  38.130

6. CONCLUSION
Optimized perfect reconstruction octave-band two-stage and three-stage tree-structured filter banks and optimized uniform sixteen-tap four-channel and thirty-two-tap eight-channel filter banks are presented. The filter coefficients of the filter banks are optimized for subband coding gain. Six types of frequency partitionings are proposed to utilize the spatial redundancies in the prediction error image. From the simulation results in terms of PSNR and our preliminary subjective comparison test, the following conclusions can be drawn. If uniform filter banks are used, the maximum decimation/interpolation factor should be four rather than eight, as it is, for example, in 32I 60 and DCT8. Constructing odd length nonuniform filter banks from odd length two-channel uniform filter banks whose analysis highpass filters have none of their zeros at ω = 0 should be avoided. Two-stage and three-stage nonuniform filter banks are better than all types of uniform filter banks. Overall, the two-stage FB 22 22 8 with Type 3 frequency partitioning is preferred over all other types of filter banks. For further adaptation to the prediction error image content, it might be interesting to allow different block sizes in each subband under bit allocation.
REFERENCES
[1] T. A. Ramstad, S. O. Aase, and J. H. Husøy, Subband Compression of Images: Principles and Examples. North Holland: Elsevier Science Publishers BV, 1995.
[2] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs: Prentice Hall, 1993.
[3] I. Balasingham and T. A. Ramstad, "Optimized perfect reconstruction tree-structured filter banks for image coding," in Proc. Int. Conf. on Image Processing, (Lausanne, Switzerland), Sept. 1996.
[4] M. Bellanger, G. Bonnerot, and M. Coudreuse, "Digital filtering by polyphase network: application to sample rate alteration and filter banks," IEEE Trans. Acoust., Speech, Signal Processing, vol. 24, pp. 109–114, 1976.
[5] N. S. Jayant and P. Noll, Digital Coding of Waveforms, Principles and Applications to Speech and Video. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1984.
[6] J. Katto and Y. Yasuda, "Performance evaluation of subband coding and optimization of its filter coefficients," in Proc. SPIE's Visual Communications and Image Processing, pp. 95–106, Nov. 1991.
[7] ITU-T (CCITT), Video Codec for Audiovisual Services at p × 64 kbit/s. Geneva, Switzerland, Aug. 1990. Recommendation H.261.
[8] T. R. Fischer, "A pyramid vector quantizer," IEEE Trans. Inform. Theory, vol. IT-32, pp. 568–583, July 1986.
[9] S. O. Aase, Image Subband Coding Artifacts: Analysis and Remedies. PhD thesis, The Norwegian Institute of Technology, Norway, Mar. 1993.