SCALABLE VIDEO CODING WITH MULTISCALE MOTION COMPENSATION AND UNEQUAL ERROR PROTECTION

Bernd Girod (1), Uwe Horn (1), and Ben Belzer (2)

(1) Telecommunications Institute, University of Erlangen-Nuremberg, Cauerstrasse 7/NT, D-91058 Erlangen, Germany
(2) UCLA Electrical Engineering Department, 405 Hilgard Avenue, Los Angeles, CA 90024-1594, USA
1 INTRODUCTION
Scalable video codecs should support variation in image resolution, partial decodability of the bit-stream, and computation-limited coding and decoding. Scalability can also be utilized within wireless video systems or for digital broadcasting, where the variable bit error rate of the radio channel is a major obstacle for many conventional, non-scalable video compression schemes. In this paper, we present some of our recent results obtained for a scalable video codec based on a spatiotemporal resolution pyramid combined with E8-lattice vector quantization. We first introduce spatiotemporal pyramids and appropriate coding schemes. We then discuss the problems of optimum bit-allocation and multiscale motion compensation. In the second part we present simulation results concerning coding performance, software-only decoding, and digital video broadcasting.
2 SPATIOTEMPORAL RESOLUTION PYRAMIDS
Encoding of spatial resolution pyramids can be done by predictive coding schemes as shown in Fig. 1(a) and (b). Without loss of generality, we initially consider only two-layer pyramids. Fig. 1(a) shows an open-loop coder, whereas Fig. 1(b) shows a closed-loop coder, which results from Fig. 1(a) by including noise feedback. Common to both approaches are filters for downsampling and interpolation. The decoder for both coder structures is identical (Fig. 1(c)).

Critically sampled subband coding schemes need carefully designed filters to achieve perfect reconstruction and aliasing cancellation [1, 2, 3]. In pyramid decompositions, filters can be designed more freely according to aspects like subjective image quality and complexity. Another advantage is that multiscale motion compensation can be easily incorporated, as described in section 2.2. The filters we use in our simulations have impulse responses [1/2 1/2] prior to downsampling and [1/4 3/4 3/4 1/4] after upsampling. For two-dimensional filtering they are applied separately in the horizontal and vertical direction. Both filters can be implemented without multiplications. In particular, interpolation at the decoder is very inexpensive.

Figure 1: Two different pyramid encoders (a), (b) with corresponding decoder (c). [Block diagram: (a) open-loop coder, (b) closed-loop coder, (c) decoder. Blocks: lowpass filter and downsampling, upsampling and interpolation filter, quantization Q. Signals: input image, coarse resolution layer, fine resolution layer.]

For encoding, the input signal is first filtered and then downsampled. The coarse resolution layer is quantized and transmitted. The open-loop coder (Fig. 1(a)) uses the coarse resolution signal prior to quantization to generate a prediction for the fine resolution layer. With noise feedback (Fig. 1(b)), the signal after quantization is interpolated and used as prediction. In both cases the resulting prediction error is quantized and transmitted to the decoder, where it is used to reconstruct the fine resolution layer. The bitrate in each layer can be controlled independently by the corresponding quantizer. In the given example two bit-streams are transmitted. At a low bitrate only the coarse resolution layer is decoded. Decoding of the fine resolution layer needs a higher bitrate for the additional transmission of the interpolation error.
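The two coder structures can be illustrated with a minimal Python sketch. It uses the [1/2 1/2] and [1/4 3/4 3/4 1/4] filters described above. Several points are assumptions made only for illustration: a uniform scalar quantizer stands in for the quantizer Q, image borders are handled by edge replication, image dimensions are even, and all function names are hypothetical.

```python
import numpy as np

def downsample(x):
    """Filter with [1/2 1/2] and keep every second sample, rows then columns."""
    x = 0.5 * (x[0::2, :] + x[1::2, :])
    return 0.5 * (x[:, 0::2] + x[:, 1::2])

def _interp_rows(c):
    """Double the number of rows with the [1/4 3/4 3/4 1/4] filter."""
    p = np.pad(c, ((1, 1), (0, 0)), mode="edge")  # edge replication: assumption
    out = np.empty((2 * c.shape[0], c.shape[1]))
    out[0::2] = 0.75 * p[1:-1] + 0.25 * p[:-2]
    out[1::2] = 0.75 * p[1:-1] + 0.25 * p[2:]
    return out

def interpolate(c):
    """Separable interpolation: rows first, then columns."""
    return _interp_rows(_interp_rows(c).T).T

def quantize(x, step):
    """Uniform scalar quantizer, a stand-in for Q (assumption)."""
    return step * np.round(x / step)

def encode(image, step_coarse, step_fine, closed_loop=True):
    """Return the quantized coarse layer and the quantized interpolation error."""
    coarse = downsample(image)
    coarse_q = quantize(coarse, step_coarse)
    # Closed loop predicts from the quantized coarse layer (noise feedback),
    # open loop predicts from the coarse layer prior to quantization.
    basis = coarse_q if closed_loop else coarse
    error_q = quantize(image - interpolate(basis), step_fine)
    return coarse_q, error_q

def decode(coarse_q, error_q):
    """Identical decoder for both coder structures."""
    return interpolate(coarse_q) + error_q
```

With the closed-loop coder, the reconstruction error of the fine layer in this sketch is bounded by half the fine quantizer step, because the decoder forms exactly the prediction the coder used. With the open-loop coder, the quantization noise of the coarse layer additionally leaks into the fine layer.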
2.1 Optimum Bit-Allocation
Given a certain number of bits to be spent for one frame, how does one allocate these bits among the various layers? By using a Lagrange approach, it can be shown that the allocation is optimal if it meets the equal distortion-rate slope condition [4, 5]. The bit-allocation found by this approach often leads to an unacceptably low quality in the low resolution layer. Quality in coarse layers can then be improved only by leaving the overall optimum. As shown by Ramchandran et al. [5], and also verified by our own experiments, the closed-loop approach is less sensitive to suboptimal solutions. Additional constraints like comparable picture qualities in all layers can be added without noticeable penalty. Therefore, we focus on closed-loop coders for the remaining part of the paper.
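The equal-slope condition can be sketched in Python as follows. This is a simplified illustration, not the allocation procedure used for the codec: it assumes that each layer has a small set of operational (rate, distortion) points and that the layers can be chosen independently, which does not hold exactly for a closed-loop pyramid where the quantizers depend on each other. The function name and the data layout are hypothetical.

```python
def allocate(rd_points, budget, iterations=50):
    """Pick one (rate, distortion) point per layer under a total rate budget.

    rd_points: list with one entry per layer, each a list of (rate, distortion).
    For a multiplier lam, every layer minimizes distortion + lam * rate, so all
    layers operate at the same distortion-rate slope. Bisection then finds the
    smallest lam whose total rate fits the budget.
    """
    def pick(lam):
        choice = [min(points, key=lambda p: p[1] + lam * p[0]) for points in rd_points]
        return choice, sum(p[0] for p in choice)

    lo, hi = 0.0, 1e6           # search interval for lam (assumption)
    best, _ = pick(hi)          # lowest-rate choice; may exceed a tiny budget
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        choice, rate = pick(mid)
        if rate <= budget:
            best, hi = choice, mid
        else:
            lo = mid
    return best
```

For example, with the made-up points `[(1, 100), (2, 40), (3, 25), (4, 20)]` for one layer and `[(1, 50), (2, 30), (3, 22), (4, 18)]` for the other, a budget of 5 rate units selects `(3, 25)` and `(2, 30)`.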
2.2 Multiscale motion compensation
The predictive coders described above efficiently exploit redundancy in the spatial domain. Further improvement can be gained by applying motion compensation (MC), which also exploits redundancy in the temporal domain. For partial bit-stream decodability and efficient compression at all bitrates it is essential that MC is used within all spatial resolution layers, which is commonly referred to as multiscale motion compensation (MSMC). Motivated by results from a theoretical model for single-layer motion compensation [6], we investigate two different block-based MSMC approaches.

For the first approach (Fig. 2) we add motion compensation loops to the spatial pyramid decomposition resulting from the closed-loop coder shown in Fig. 1(b). In pyramids with more than two layers it can be seen that, except for the lowest resolution layer, MC applies to interpolation error signals. Since those signals have bandpass character, we refer to this method as bandpass compensation. The second approach can be seen in Fig. 3. In contrast to bandpass compensation, MC is applied to the original frame and all of its lowpass filtered and downsampled versions. We refer to this method as lowpass compensation. Both approaches can switch between an inter- and an intra-mode. In intra-mode, where MC is not used, both schemes work like the coder of Fig. 1(b). Selection of coding modes and estimation of motion vectors are explained later in this section.
Figure 2: Two-layer pyramid codec with bandpass MC. [Block diagram: coder and decoder, with multiscale motion compensation and spatial pyramid synthesis. Blocks: quantizers Q and MC loops with inter/intra switches in each layer. Signals: s0 (frame), s1, i0, e0, e1, r0, r1. Motion vectors and coding mode are transmitted for each layer.]

Figure 3: Two-layer pyramid codec with lowpass MC. [Block diagram: coder and decoder, with multiscale motion compensation and spatial pyramid synthesis. Blocks: quantizers Q and MC loops with inter/intra switches in each layer. Signals: s0 (frame), s1, i0, e0, e1, r0, r1. Motion vectors and coding mode are transmitted for each layer.]

It can be seen that the decoder of the approach shown in Fig. 3 is less complex. Interpolation from the lower resolution layer is employed only in intra-mode, whereas in Fig. 2 it is always needed, independent of the selected coding mode.
2.2.1 Comparison of lowpass and bandpass MC. To get a first insight into the performance of both MC schemes, we compare the codecs corresponding to Figs. 2 and 3. We use frames generated by gradient-based spatial interpolation [7] from the odd fields of an interlaced test sequence. Image sizes are 704 x 480 for the fine resolution layer and 352 x 240 for the coarse layer. For block-based motion compensation, block sizes are set to 16 x 16 and 8 x 8, respectively. Motion vectors with half-pel accuracy are computed independently for each layer, based on a mean-squared error (MSE) criterion. The same set of motion vectors is used for both MC schemes. Pixels at half-pel positions are obtained by bilinear interpolation. The coding mode is selected for each block based on a minimum MSE criterion.

Since, for a two-layer pyramid, motion compensation in the coarse layer is applied to identical signals in both approaches, we can focus on the signal e0 to be quantized and coded in the fine resolution layer. In intra-mode, e0 depends in both approaches on the signals s0 and i0, where i0 furthermore depends on r1. Therefore e0 contains quantization noise introduced by both quantizers. A different situation results from inter-mode. With lowpass MC, e0 depends only on s0, whereas with bandpass MC it still depends on s0 and i0. One would expect that lowpass MC is less sensitive to quantization noise in the lower resolution layer, since this noise influences e0 only in intra-mode.
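The half-pel prediction and the per-block mode decision described above can be sketched in Python. This is an illustration under assumptions: motion vectors are given in half-pel units, the referenced area lies completely inside the reference frame, and the function names are hypothetical.

```python
import numpy as np

def predict_block(ref, y, x, size, mv_half):
    """Motion-compensated prediction of one block with half-pel accuracy.

    mv_half = (dy, dx) in half-pel units. Half-pel positions are obtained by
    bilinear interpolation, i.e. by averaging the neighbouring full-pel pixels.
    """
    iy, fy = divmod(mv_half[0], 2)  # full-pel part and half-pel flag (0 or 1)
    ix, fx = divmod(mv_half[1], 2)
    y0, x0 = y + iy, x + ix
    a = ref[y0:y0 + size, x0:x0 + size]
    b = ref[y0:y0 + size, x0 + fx:x0 + fx + size]
    c = ref[y0 + fy:y0 + fy + size, x0:x0 + size]
    d = ref[y0 + fy:y0 + fy + size, x0 + fx:x0 + fx + size]
    return 0.25 * (a + b + c + d)

def select_mode(block, inter_prediction, intra_prediction):
    """Choose the coding mode with the smaller mean-squared prediction error."""
    mse_inter = float(np.mean((block - inter_prediction) ** 2))
    mse_intra = float(np.mean((block - intra_prediction) ** 2))
    if mse_inter <= mse_intra:
        return "inter", mse_inter
    return "intra", mse_intra
```

In the lowpass scheme of Fig. 3, `inter_prediction` would come from the previously decoded frame of the same layer and `intra_prediction` from the interpolated lower resolution layer.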
Figure 4: Multiscale motion compensation performance of lowpass and bandpass MC. [Four plots, one each for Tabletennis, Football, Mobile & Calendar, and Flowergarden. Horizontal axis: quantization noise variance in coarse and fine resolution layer [MSE], 0 to 140. Vertical axis: normalized prediction error variance [dB], 25 to 35. Curves: bandpass MC and lowpass MC.]

Fig. 4 yields some insight into this influence of quantization noise for a two-layer closed-loop codec extended by lowpass and bandpass MC, respectively. The horizontal axis denotes the variance of added white noise, which models equal quantization errors within both layers. The vertical axis denotes the prediction error variance (e0), normalized by the square of the amplitude range 0...255 and expressed in dB. It can be seen that for higher noise variances lowpass MC results in a lower prediction error variance. In terms of bitrate, this means that for low bitrate applications the less complex lowpass compensation scheme is appropriate, whereas at higher bitrates it is advantageous to apply bandpass compensation.
2.2.2 Motion estimation techniques. The performance of motion compensation depends on how motion vectors for each resolution layer are estimated. Three different methods are considered. The first approach estimates motion vectors only for the fine resolution layer. Motion vectors for coarser resolution layers are obtained by appropriate downscaling. The second approach estimates motion vectors for each layer independently. As a third and most natural technique for resolution pyramids, hierarchical estimation can be applied [8, 9].

Besides the quality of the motion vector field, it is important to consider how efficiently motion vectors can be encoded. Simulations show that predictive coding using upscaled motion vectors from the lower resolution layer as a predictor gives better results than using the preceding motion vectors of the same resolution layer.

As expected, the best compensation results are achieved with independent estimation, which has the highest complexity. Hierarchical estimation performs nearly as well at a much lower complexity. Furthermore, motion vectors obtained by hierarchical estimation can be encoded with the smallest number of bits. In terms of compensation performance, motion vector encoding costs, and encoder complexity, hierarchical estimation offers the best compromise.
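Hierarchical estimation can be illustrated with the following Python sketch. It is a simplified stand-in rather than the estimator used in the simulations: it works with full-pel vectors only, uses full search with an MSE criterion, and assumes that the block size doubles from layer to layer (8 x 8 in the coarse layer, 16 x 16 in the next finer one) so that blocks correspond one to one. Function names, search radii, and the pyramid ordering (coarsest layer first) are assumptions.

```python
import numpy as np

def block_search(cur, ref, y, x, size, center, radius):
    """Full search around `center`; returns the (dy, dx) with minimum MSE."""
    block = cur[y:y + size, x:x + size]
    height, width = ref.shape
    best, best_err = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + size > height or xx + size > width:
                continue  # candidate block leaves the reference frame
            err = np.mean((block - ref[yy:yy + size, xx:xx + size]) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def hierarchical_me(cur_pyramid, ref_pyramid, coarse_block=8,
                    coarse_radius=4, refine_radius=1):
    """Estimate one vector field per layer, coarsest layer first."""
    fields, previous = [], None
    for level, (cur, ref) in enumerate(zip(cur_pyramid, ref_pyramid)):
        size = coarse_block * 2 ** level
        rows, cols = cur.shape[0] // size, cur.shape[1] // size
        field = np.zeros((rows, cols, 2), dtype=int)
        for r in range(rows):
            for c in range(cols):
                if previous is None:
                    center, radius = (0, 0), coarse_radius
                else:
                    # Upscaled vector of the lower layer serves as start point.
                    center = (2 * int(previous[r, c, 0]), 2 * int(previous[r, c, 1]))
                    radius = refine_radius
                field[r, c] = block_search(cur, ref, r * size, c * size,
                                           size, center, radius)
        fields.append(field)
        previous = field
    return fields
```

Because each finer vector differs from the upscaled coarser vector by at most `refine_radius`, the upscaled vector is also a natural predictor when the vectors are encoded, in line with the simulation result quoted above.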
2.3 E8-Lattice Vector Quantization
Vector quantizers are known to have a low decoder complexity, since decoding is carried out by a simple table-lookup. Discussions of VQ, including codebook training by the LBG algorithm and entropy-constrained VQ, can be found in various publications [10, 11, 12, 13]. However, for higher bitrates, the computational burden of an unstructured codebook search at the coder becomes prohibitive. This search can be avoided by lattice VQ [11, 14], where a highly structured codebook is used. With lattice VQ, the representative for a given input vector can be computed with much less effort compared to an unstructured codebook search.

It is reported [15] that for image subbands entropy-coded E8-lattice VQ performs as well as unstructured entropy-constrained VQ, even at bitrates below 0.5 bits/sample. Another interesting property of the E8-lattice is its correspondence to the densest sphere packing for eight dimensions [16]. Furthermore, fast quantization and decoding algorithms can be used [17, 18].

We improved lattice VQ performance by using a hybrid quantization approach [15], where a codebook of feasible size is used for the 'most popular' code vectors on a small region of the lattice. Instead of using the lattice point as the representative, a trained codebook vector is used. Less popular code vectors are quantized to the nearest lattice points and coded by a Voronoi code [18].
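The low search effort of lattice VQ can be seen from a short Python sketch of nearest-point quantization in the E8 lattice. It follows the well-known construction of E8 as the union of the lattice D8 (integer vectors with even coordinate sum) and its translate by (1/2, ..., 1/2). The sketch covers only the search for the nearest lattice point. The hybrid codebook, any scaling of the input, and the Voronoi coding are not shown, and the function names are hypothetical.

```python
import numpy as np

def nearest_d8(x):
    """Nearest point of D8: integer vectors whose coordinates sum to an even number."""
    f = np.round(x)
    if int(f.sum()) % 2 != 0:
        # Parity is wrong: re-round the coordinate with the largest rounding
        # error in the other direction, which is the cheapest correction.
        k = int(np.argmax(np.abs(x - f)))
        f[k] += 1.0 if x[k] > f[k] else -1.0
    return f

def nearest_e8(x):
    """Nearest point of E8 = D8 united with D8 + (1/2, ..., 1/2)."""
    x = np.asarray(x, dtype=float)
    a = nearest_d8(x)
    b = nearest_d8(x - 0.5) + 0.5
    return a if np.sum((x - a) ** 2) <= np.sum((x - b) ** 2) else b
```

Quantizing one eight-dimensional vector therefore needs two rounding passes and one distance comparison, independent of the bitrate, whereas an unstructured codebook search grows with the codebook size.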
3 SIMULATION RESULTS
Fig. 5 shows the spatiotemporal resolution pyramid [19] we used as the basis for the applications described in the following sections. As can be seen, layers 2 and 1 roughly correspond to QCIF and CIF resolution, respectively. The highest resolution layer contains an interlaced video signal in its full spatial and temporal resolution according to the ITU-R 601-4 standard [20]. On the right, the spatial and temporal resolutions of all layers are listed. Typical bit rates with increasing spatiotemporal layer resolution are 64 kbit/s, 330 kbit/s, 2.0 Mbit/s and 4.7 Mbit/s.

Figure 5: Spatiotemporal pyramid setup. [Diagram: four resolution layers with the group of pictures structure built from I-, P- and B-pictures. Layer 3: 88x60, 3.33 Hz. Layer 2: 176x120, 10 Hz. Layer 1: 352x240, 60 Hz. Layer 0: 704x240, 60 Hz.]

Similar to MPEG [21, 22, 23], our coding scheme allows three different picture types, namely I-, P- and B-pictures. I-pictures are coded without reference to other frames. P-pictures can reference frames in the past, whereas B-pictures can reference frames in the past as well as in the future. Within B-pictures it is also possible to form a temporal prediction by bi-directional interpolation between a past and a future reference frame. The picture type always remains fixed through all spatial resolution layers. The group of pictures structure used for temporal prediction is shown at the bottom of Fig. 5.

Motion vectors for multiscale motion compensation are obtained by hierarchical motion estimation. Motion compensation is applied according to the closed-loop codec shown in Fig. 3 (extended to four layers). No automatic bitrate control is used. Quantizers are adjusted manually for the different picture types at the start of the sequence to yield comparable picture quality within each resolution layer.
3.1 Coding performance and software-only decoding
Fig. 6 shows the performance of the codec for four test sequences. For each sequence, PSNR values for decoding at four different bitrates corresponding to the four resolution layers are shown. PSNR for the original image resolution was measured after interpolation of the decoded layer to the finest resolution layer by the [1/4 3/4 3/4 1/4] filter.

Figure 6: Performance of Codec. [Plot: PSNR [dB], 15 to 30, versus bitrate (luminance) [MBit/s], 0 to 3.5, for the sequences Football, Flowergarden, Tabletennis, and Mobile & Calendar.]
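The PSNR measurement described above can be sketched in Python as follows. This is a sketch under stated assumptions: each interpolation step doubles both image dimensions, borders are handled by edge replication, the peak value is 255, and the function names are hypothetical.

```python
import numpy as np

def interpolate2x(c):
    """Double both dimensions with the separable [1/4 3/4 3/4 1/4] filter."""
    def rows(a):
        p = np.pad(a, ((1, 1), (0, 0)), mode="edge")  # edge replication: assumption
        out = np.empty((2 * a.shape[0], a.shape[1]))
        out[0::2] = 0.75 * p[1:-1] + 0.25 * p[:-2]
        out[1::2] = 0.75 * p[1:-1] + 0.25 * p[2:]
        return out
    return rows(rows(c).T).T

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    diff = np.asarray(original, dtype=float) - np.asarray(decoded, dtype=float)
    mse = float(np.mean(diff ** 2))
    return float("inf") if mse == 0.0 else 10.0 * np.log10(peak ** 2 / mse)

def layer_psnr(original, decoded_layer, levels):
    """Interpolate a decoded layer `levels` times, then compare to the original."""
    up = np.asarray(decoded_layer, dtype=float)
    for _ in range(levels):
        up = interpolate2x(up)
    return psnr(original, up)
```

Measured this way, the PSNR of a lower layer includes the resolution loss of the pyramid as well as the quantization noise.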
The software-only implementation of the decoder, which is not fully optimized, is able to decode and display layers 3 and 2 in real-time on a Sun SparcStation5 as well as on an SGI Indigo2 (Fig. 7). Decoding of the full resolution layer would need a CPU that is ten times faster. It can be seen that the required CPU power is roughly proportional to the decoded bitrate.

Figure 7: Real-time performance of the decoder. [Plot: ratio of decoded frame rate to real-time frame rate on a logarithmic axis (0.1, 1 = real-time, 10) for Sparc 5 and Indigo2, for layers L3 (base, 64 KBit/s), L2 (QCIF, 330 KBit/s), L1 (CIF, 2.0 MBit/s), and L0 (CCIR 601, 4.7 MBit/s).]
3.2 Digital TV broadcasting
To achieve graceful degradation in high bit error rate environments like digital TV broadcasting, we combined the scalable source coder with an unequal error protection (UEP) scheme by using rate-compatible punctured convolutional codes (RCPC codes) [24] of different code rates. We first encoded 5 seconds of the Flowergarden test sequence with our scalable coder. According to the importance of the data, coderates as shown in Table 1 were chosen. The entry K/N means that K input bits are coded with N output bits. As more output bits are generated for one input bit, error protection increases. We compare this unequal error protection scheme with an equal error protection (EEP) scheme where all packets are protected with a coderate of 4/5. In both cases the increase in bit rate due to error protection is 26%. Simulation results for five different bit error rates on a binary symmetric channel (BSC) can be seen in Fig. 8.

Table 1: RCPC coderates used for unequal error protection
Columns: Picture type; coderates for Header, Layer 3, Layer 2, Layer 1, Layer 0
I-Picture: 1/2 1/2 4/7 2/3 1/1
P-Picture: 2/3 2/3 4/5 1/1
B-Picture: 2/3 2/3 1/1

Figure 8: Comparison between UEP and EEP. [Plot: PSNR (dB), 0 to 30, versus bit error rate (10^-4, 10^-3, 5*10^-3, 10^-2) for the Flowergarden sequence. Curves: unequal error protection and equal error protection, with the point where decoding becomes impossible marked.]

Average PSNR over the whole sequence is compared to bit error rate. As expected, unequal error protection performs better as the bit error rate increases. At a bit error rate of 10^-2, decoding of the equal error protected bit-stream becomes impossible, while unequal error protection still results in a PSNR of 20 dB.

The bit-stream at the output of the channel decoder may still contain corrupted bits which cannot easily be detected. Residual bit errors within a slice are detected by illegal code words or at synchronization marks. Error concealment replaces the whole slice by decoded information taken either from the lower resolution layer or, if in the base layer, from the corresponding slice of the preceding frame.
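The relation between coderate K/N and the increase in bit rate can be made concrete with a small Python sketch. The coderates are those listed for I-pictures in Table 1. The payload sizes are made up for illustration only and are not taken from the simulations, and the function name is hypothetical. Any overhead beyond the code rate itself is not modeled.

```python
from fractions import Fraction

def overhead(payload_bits, rates):
    """Relative increase in bit rate when each part is coded with its rate K/N.

    payload_bits: dict mapping part name to number of source bits.
    rates: dict mapping part name to coderate K/N as a Fraction.
    """
    total_in = sum(payload_bits.values())
    total_out = sum(Fraction(bits) / rates[name] for name, bits in payload_bits.items())
    return total_out / total_in - 1

# I-picture coderates from Table 1
uep_rates = {
    "header": Fraction(1, 2),
    "layer3": Fraction(1, 2),
    "layer2": Fraction(4, 7),
    "layer1": Fraction(2, 3),
    "layer0": Fraction(1, 1),
}
# Hypothetical payload sizes in bits (assumption, for illustration only)
payload = {"header": 1000, "layer3": 2000, "layer2": 8000,
           "layer1": 30000, "layer0": 60000}
eep_rates = {name: Fraction(4, 5) for name in payload}
```

With these made-up sizes, `overhead(payload, uep_rates)` is 24/101, or about 24%, and `overhead(payload, eep_rates)` is exactly 1/4. The small and important parts of the bit-stream can thus receive strong protection at roughly the same total cost as a uniform rate-4/5 code.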
4 CONCLUSION
We presented a scalable video codec based on a spatiotemporal pyramid. Low decoder complexity is achieved by simple filters and E8-lattice vector quantization. It was verified that closed-loop pyramid coders are less sensitive to sub-optimal bit allocations, which makes them more suitable for real applications than open-loop coders. Two different multiscale motion compensation approaches have been investigated. Interestingly, at lower bitrates the less complex lowpass compensation performs better than bandpass compensation. Hierarchical motion estimation offers the best compromise in terms of compensation gain, motion vector encoding costs, and encoder complexity. Our experiments concerning computation-limited decoding showed, for the chosen simulation model, that the CPU power required for decoding is roughly proportional to bitrate. The digital TV broadcast scenario showed how scalability can be efficiently combined with unequal error protection to yield graceful degradation over channels with varying quality.
5 ACKNOWLEDGEMENT
The work of Niko Farber in implementing the RCPC encoding and decoding routines for the digital TV broadcasting simulations is gratefully acknowledged.
REFERENCES
1. A.N. Akansu and R.A. Haddad. Multiresolution Signal Decomposition. Academic Press, San Diego, 1992.
2. J.W. Woods (ed.). Subband Image Coding. Kluwer Academic Publishers, Boston, 1991.
3. J. Katto and Y. Yasuda. Performance evaluation of subband coding and optimization of its filter coefficients. Journal of Visual Communication and Image Representation, 2(4):303-313, December 1991.
4. Y. Shoham and A. Gersho. Efficient bit allocation for an arbitrary set of quantizers. IEEE Trans. on Acoustics, Speech, and Signal Processing, 36(9):1445-1453, Sep. 1988.
5. K. Ramchandran, A. Ortega, and M. Vetterli. Bit allocation for dependent quantization with applications to multiresolution and MPEG video coders. IEEE Trans. on Signal Processing, 3(5):533-545, Sep. 1994.
6. B. Girod. Motion-compensating prediction with fractional-pel accuracy. IEEE Trans. on Communications, 41(4):604-612, Apr. 1993.
7. K. Jensen and D. Anastassiou. Spatial resolution enhancement of images using nonlinear interpolation. In Proc. ICASSP 90, pages 2045-2048. IEEE, 1990.
8. B. Chupeau. Estimation and distribution of motion information. Signal Processing: Image Communication, 5(5-6):539-552, Dec. 1993.
9. T. Hanamura, W. Kemeyama, and H. Tominaga. Hierarchical coding scheme of video signal with scalability and compatibility. Signal Processing: Image Communication, 5(1-2):159-184, Feb. 1993.
10. P.A. Chou, T. Lookabaugh, and R.M. Gray. Entropy-constrained vector quantization. IEEE Trans. on Acoustics, Speech, and Signal Processing, 37(1):31-42, Jan. 1989.
11. A. Gersho and R. Gray. Vector Quantization and Signal Compression. Kluwer, 1991.
12. Y. Linde, A. Buzo, and R.M. Gray. An algorithm for vector quantizer design. IEEE Trans. on Communications, COM-28(1):84-95, Jan. 1980.
13. N.M. Nasrabadi and R.A. King. Image coding using vector quantization: A review. IEEE Trans. on Communications, 36(8):957-971, Aug. 1988.
14. D.G. Jeong and J.D. Gibson. Lattice vector quantization for image coding. In Proc. ICASSP'89. IEEE, 1989.
15. T. Senoo and B. Girod. Vector quantization for entropy coding of image subbands. IEEE Trans. on Image Processing, 1(4):526-532, Oct. 1992.
16. J.H. Conway and N.J.A. Sloane. Sphere Packings, Lattices and Groups. Springer, 1988.
17. J.H. Conway and N.J.A. Sloane. Fast quantizing and decoding algorithms for lattice quantizers and codes. IEEE Trans. on Information Theory, IT-28(2):227-232, Mar. 1982.
18. J.H. Conway and N.J.A. Sloane. A fast encoding method for lattice codes and quantizers. IEEE Trans. on Information Theory, IT-29(6):820-824, Nov. 1983.
19. U. Horn and B. Girod. Pyramid coding using lattice vector quantization for scalable video applications. In Proc. PCS '94, Sacramento, CA, 1994.
20. ITU-R. Encoding parameters of digital television for studios. Recommendation 601-4, 1982.
21. ISO/IEC. Document 13818-2, Generic Coding of Moving Pictures and Associated Audio, Part 2: Video, Recommendation H.262, Draft International Standard, March 1994.
22. Didier J. LeGall. MPEG: A video compression standard for multimedia applications. Comm. ACM, (34):46-58, 1991.
23. Didier J. LeGall. The MPEG video compression algorithm. Signal Processing: Image Communication, 4(2):129-140, Apr. 1992.
24. J. Hagenauer. Rate-compatible punctured convolutional codes (RCPC codes) and their applications. IEEE Trans. on Communications, COM-36:389-400, Apr. 1988.