OPTIMIZATION OF LOW-DELAY WAVELET VIDEO CODECS

Andrzej Popławski (1), Marek Domański (2)

(1) University of Zielona Góra, Institute of Computer Engineering and Electronics, Poland
(2) Poznań University of Technology, Chair of Multimedia Telecommunications and Microelectronics, Poland

ABSTRACT

The paper deals with lifting implementations of motion-compensated temporal filter banks in (t+2D) wavelet codecs. It focuses on the optimum setup of the analysis and synthesis filters, i.e. the setup that achieves the maximum possible coding efficiency under a given delay constraint. Systematic, exhaustive experiments have been performed to find the optimum setup, as well as the resulting coding efficiency, for each delay constraint.

Index Terms – Wavelet video coding, low-delay video coding, motion compensation, lifting

1. INTRODUCTION

Contemporary wavelet video coders mostly exploit three-dimensional (3-D) wavelet (subband) analysis, i.e. two-dimensional spatial analysis combined with temporal analysis. The latter usually employs motion-compensated temporal filtering (MCTF) [1-5], which introduces significant delays related to wavelet analysis and synthesis in the time domain [6]. The delay grows with the number of temporal decomposition levels as well as with the order of the temporal filters. In wavelet video coding, dyadic temporal decomposition is mostly applied, and several levels, usually 3 or 4, are used in order to obtain high compression efficiency. It has been demonstrated that the longer LeGall 5/3 filters also yield higher compression efficiency than the short Haar filters [7].

On the other hand, low coding delay is of great importance in all interactive applications such as videotelephony, videoconferencing, remote surveillance systems, online education, etc. This raises the important question of the tradeoff between high compression efficiency and low end-to-end delay [10,15]. Lifting implementations of motion-compensated temporal wavelet analysis and synthesis [8,9] have been widely adopted in video coding. In that context, it has been proposed to use LeGall 5/3 filters without the update step in the temporal analysis and synthesis filter banks in order to reduce the coding delay.
It has been demonstrated that such a simplification implies losses of coding efficiency that are often acceptable [11,12]. Nevertheless, the question of the optimum setup of filters at the individual levels of decomposition remains open. For a given decomposition level, we have a choice between Haar filters, 5/3 filters and filters without the update or prediction steps. Unfortunately, an exhaustive study of the problem is lacking in the literature.

Here, we deal with the problem of the optimum setup of the analysis and synthesis motion-compensated lifting filter banks in (t+2D) wavelet video codecs, i.e. wavelet encoders that perform the temporal analysis before the two-dimensional spatial analysis. We seek the optimum in the sense of the maximum compression efficiency possible for a given delay constraint, and we answer the question of which filters to choose for the individual levels of decomposition. This question is answered for systems with adaptive filter banks, where the number of decomposition levels is subject to local rate-distortion (R-D) optimization. Therefore, we assume a fixed structure of filters at each decomposition level, bearing in mind that the actual number of decomposition levels depends on the results of the local R-D optimization.

In this paper, we focus on the dominant delay resulting from filtering in the time domain, thus omitting processing and transmission delays. We consider the delays that appear at the encoder side (encoding delay) and at the decoder side (decoding delay) due to prediction and update between the images of a video sequence, and we call their sum the total coding/decoding delay (TCDD). Examples of TCDD calculations are given later in the paper.

2. MULTILEVEL TEMPORAL ANALYSIS

Let us denote:
$x_t$ – input frame at time $t$,
$h_t$ – highpass frame at time $t$,
$l_t$ – lowpass frame at time $t$,
$MC_t(x_{2t})$ – motion compensation operator between frames $x_{2t+1}$ and $x_{2t}$,
$MC_t(x_{2t+2})$ – motion compensation operator between frames $x_{2t+1}$ and $x_{2t+2}$.
Then, for one level of temporal filtering with Haar filters, the lifting steps are (see Fig. 1):

$$h_t = x_{2t+1} - MC_t(x_{2t}), \qquad l_t = x_{2t} + \tfrac{1}{2}\,MC_t^{-1}(h_t). \tag{1}$$
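A minimal sketch of one level of Eq. (1) follows, assuming zero motion (both $MC_t$ and $MC_t^{-1}$ are the identity) and frames stored as flat lists of luminance samples; the names `haar_analysis`, `haar_synthesis` and `zero_motion` are hypothetical. The `use_update` flag drops the update step, which corresponds to scheme U introduced later in the paper.

```python
from typing import Callable, List, Tuple

Frame = List[float]
MCOperator = Callable[[Frame], Frame]


def zero_motion(frame: Frame) -> Frame:
    """Placeholder for MC_t and MC_t^{-1}: assumes zero motion (identity)."""
    return list(frame)


def haar_analysis(x_even: Frame, x_odd: Frame,
                  mc: MCOperator = zero_motion,
                  mc_inv: MCOperator = zero_motion,
                  use_update: bool = True) -> Tuple[Frame, Frame]:
    """One level of motion-compensated Haar lifting, Eq. (1)."""
    # Prediction step: h_t = x_{2t+1} - MC_t(x_{2t})
    h = [o - p for o, p in zip(x_odd, mc(x_even))]
    if not use_update:
        # No update step: l_t = x_{2t} is available as soon as x_{2t} arrives.
        return list(x_even), h
    # Update step: l_t = x_{2t} + 1/2 * MC_t^{-1}(h_t)
    l = [e + 0.5 * u for e, u in zip(x_even, mc_inv(h))]
    return l, h


def haar_synthesis(l: Frame, h: Frame,
                   mc: MCOperator = zero_motion,
                   mc_inv: MCOperator = zero_motion,
                   use_update: bool = True) -> Tuple[Frame, Frame]:
    """Inverse lifting: undo the update step first, then the prediction step."""
    if use_update:
        x_even = [v - 0.5 * u for v, u in zip(l, mc_inv(h))]
    else:
        x_even = list(l)
    x_odd = [v + p for v, p in zip(h, mc(x_even))]
    return x_even, x_odd
```

For example, `haar_analysis([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])` returns `l = [11.0, 19.0, 31.5]` and `h = [2.0, -2.0, 3.0]`, and `haar_synthesis` recovers both input frames. Because synthesis reapplies the same operators in reverse order, reconstruction is exact (up to floating-point rounding) even when `mc_inv` is not a true inverse of `mc`.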

For three levels of temporal decomposition, it follows from Fig. 1 that the encoding delay equals 7 sampling periods in the time domain: the lowpass frame at the third level depends on all frames $x_0, \dots, x_7$, so frame $x_0$ cannot be fully encoded until $x_7$ arrives. In the decoder, the whole group of 8 frames may be reconstructed instantaneously. Therefore, TCDD = 7. Similarly, for 4 decomposition levels, TCDD = 15.
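The encoding-delay argument for Fig. 1 can be checked by tracking which input frames each lowpass frame depends on. This sketch ignores motion and only follows frame indices; the helper names are hypothetical, and the decoding delay is taken as zero, as stated above.

```python
from typing import List, Set


def haar_dependencies(num_levels: int) -> List[Set[int]]:
    """Input-frame indices each lowpass frame depends on after num_levels levels."""
    # Before any filtering, frame i depends only on itself.
    deps: List[Set[int]] = [{i} for i in range(2 ** num_levels)]
    for _ in range(num_levels):
        # Haar with update: l_t uses x_{2t} directly and x_{2t+1} through h_t.
        deps = [deps[2 * t] | deps[2 * t + 1] for t in range(len(deps) // 2)]
    return deps


def haar_encoding_delay(num_levels: int) -> int:
    """Frames x_0 must wait until the top-level lowpass frame can be formed."""
    (top,) = haar_dependencies(num_levels)  # one group of frames -> one top-level l
    # x_0 arrives at time 0; the encoder waits for the latest frame it depends on.
    return max(top) - min(top)


for k in (3, 4):
    print(k, haar_encoding_delay(k))  # prints "3 7", then "4 15"
```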

[Figure: input video sequence x0, ..., x7 filtered over levels k = 1, 2, 3 into highpass frames h and lowpass frames l; legend: prediction step, update step]
Fig. 1. Three levels of temporal wavelet analysis using Haar filters.

Similarly, for a single level of temporal analysis with LeGall 5/3 filters (see Fig. 2):

$$h_t = x_{2t+1} - \tfrac{1}{2}\big(MC_t(x_{2t}) + MC_t(x_{2t+2})\big), \qquad l_t = x_{2t} + \tfrac{1}{4}\big(MC_{t-1}^{-1}(h_{t-1}) + MC_t^{-1}(h_t)\big). \tag{2}$$

[Figure: input video sequence x0, ..., x8 filtered over levels k = 1, 2, 3 into highpass frames h and lowpass frames l; legend: prediction step, update step]
Fig. 2. Three levels of temporal wavelet analysis using LeGall 5/3 filters.

Consequently, the total coding/decoding delay (TCDD), expressed as a multiple of the sampling period in the temporal domain [6,12], is

$$\mathrm{TCDD} = \sum_{k=1}^{K} 2^{k-1} = 2^K - 1 \tag{3}$$

for Haar filter banks and

$$\mathrm{TCDD} = 3\,(2^K - 1) \tag{4}$$

for 5/3 filter banks, where K is the number of temporal decomposition levels.

3. DESCRIPTION OF EXPERIMENTS

Multi-level, high-order temporal filtering is efficient but yields large coding delays that are often unacceptable. Delay reduction is a subject of wide research interest, and several solutions have already been proposed [10,14], mainly based on removing prediction and update steps in the lifting implementations of dyadic filter banks. Obviously, such reductions lead to some decrease in coding efficiency. These effects have been studied in the literature, but the results are not fully comparable, as they were obtained with different test video sequences and different versions of codecs. Here, we study the problem systematically using extensive, carefully planned experiments.
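The lifting-step reductions mentioned above can be illustrated on one level of Eq. (2). This is a sketch, not the codec's implementation: it assumes zero motion (MC is the identity), an even number of frames, and symmetric extension at the group boundaries (a missing $x_{2t+2}$ is replaced by $x_{2t}$, and $h_{-1}$ by $h_0$). The function names are hypothetical; `use_update=False` gives the 5/3 filter bank without the update step.

```python
from typing import List, Tuple

Frame = List[float]


def _comb(*terms: Tuple[float, Frame]) -> Frame:
    """Pixel-wise linear combination of (coefficient, frame) pairs."""
    size = len(terms[0][1])
    return [sum(c * f[i] for c, f in terms) for i in range(size)]


def legall53_analysis(x: List[Frame], use_update: bool = True):
    """One level of 5/3 lifting, Eq. (2), with zero motion and symmetric extension."""
    n = len(x) // 2
    h = []
    for t in range(n):
        nxt = x[2 * t + 2] if 2 * t + 2 < len(x) else x[2 * t]
        # Prediction uses the future frame x_{2t+2}: h_t must wait for it.
        h.append(_comb((1.0, x[2 * t + 1]), (-0.5, x[2 * t]), (-0.5, nxt)))
    if not use_update:
        # Update step removed: l_t = x_{2t} no longer waits for h_t.
        return [list(x[2 * t]) for t in range(n)], h
    l = []
    for t in range(n):
        prev_h = h[t - 1] if t > 0 else h[0]
        l.append(_comb((1.0, x[2 * t]), (0.25, prev_h), (0.25, h[t])))
    return l, h


def legall53_synthesis(l: List[Frame], h: List[Frame], use_update: bool = True) -> List[Frame]:
    """Inverse of legall53_analysis: undo the update, then the prediction."""
    n = len(l)
    if use_update:
        even = [_comb((1.0, l[t]), (-0.25, h[t - 1] if t > 0 else h[0]), (-0.25, h[t]))
                for t in range(n)]
    else:
        even = [list(f) for f in l]
    x: List[Frame] = []
    for t in range(n):
        nxt = even[t + 1] if t + 1 < n else even[t]
        x.append(even[t])
        x.append(_comb((1.0, h[t]), (0.5, even[t]), (0.5, nxt)))
    return x
```

With the update step, $l_t$ depends on $h_t$ and therefore on the future frame $x_{2t+2}$; without it, only the highpass frames still wait for a future frame, which is where the delay saving comes from. Both variants remain perfectly invertible because they keep the lifting structure.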

In the literature, modified temporal filtering schemes have been proposed in which the coding delay is reduced by skipping the prediction or update steps and by breaking references to future frames of the sequence. Some of these schemes never lead to competitive solutions. Others (like 5/3 filters with unidirectional reference to previous frames) may be considered equivalent to other schemes considered here. The remaining schemes were included in the optimization, and we use the following notation for them:
• H – original Haar filter bank,
• U – modified Haar filter bank without update step,
• B – modified Haar filter bank with update step related to a past frame,
• 5 – original 5/3 filter bank,
• 3 – modified 5/3 filter bank without update step related to a future frame.
Note that modified LeGall 5/3 filters without both prediction and update from future frames are equivalent to Haar filters with update related to past frames.

In this paper, 4 levels of dyadic temporal wavelet analysis are assumed, as they provide the best compression efficiency for most of the test sequences, spatial and temporal resolutions, and transmission bitrates examined. We use four-symbol names to denote the schemes used at the individual levels of temporal analysis and synthesis; e.g. the name 53HU means that the following schemes are used for temporal filtering:
• 5 – at the first temporal decomposition level,
• 3 – at the second temporal decomposition level,
• H – at the third temporal decomposition level,
• U – at the fourth temporal decomposition level.

Nine test sequences were used in the experiments (City, Crew, Harbour, Ice, Soccer, Football, Silent, Mobile, Foreman), in the formats CIF 15Hz (352×288), CIF 30Hz (352×288) and 4CIF 30Hz (704×576), each 6.4 seconds long. All experiments were performed for 10 bitrates equally spaced in the range 768–4224 kbps for 4CIF 30Hz, 256–1408 kbps for CIF 30Hz, and 128–704 kbps for CIF 15Hz. Motion vectors were estimated with quarter-pixel accuracy. The experiments used the 2005 version of the MC-EZBC codec with the improvements of [15,16]. The authors sincerely thank Professor John Woods for giving us the opportunity to use the software.

The delays examined in this paper are 0, 1, 3, 5, 7, 9, 13 and 15 sampling intervals (frames) in the time domain. For a given delay, different variants of temporal wavelet filtering may be used; for example, for a delay of 9 pictures, the variants 533U, 53HU and 55UU can be applied. The main goal of the research was to find the scheme of temporal wavelet analysis and synthesis that is the most efficient in terms of compression performance for a given total coding/decoding delay (TCDD). A further aim was to determine the unavoidable losses in compression performance due to various TCDD constraints.
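The delay of a four-symbol scheme name can be estimated with a simple additive model: each level k contributes a weight times $2^{k-1}$ frames, generalizing Eqs. (3) and (4). The per-scheme weights below (U and B: 0, H and 3: 1, 5: 3) are an assumption, not stated in the paper; they were chosen because they reproduce Eqs. (3) and (4) and every delay listed in Table 1. The function `tcdd` and dictionary names are hypothetical.

```python
# ASSUMPTION: per-level delay weights in units of that level's sampling period.
# They are not given in the paper; they reproduce Eqs. (3), (4) and Table 1.
LEVEL_WEIGHT = {"U": 0, "B": 0, "H": 1, "3": 1, "5": 3}


def tcdd(scheme: str) -> int:
    """TCDD in frames for a scheme name such as '53HU' (one symbol per level)."""
    # At level k (1-based), one sampling period spans 2**(k-1) input frames.
    return sum(LEVEL_WEIGHT[s] * 2 ** (k - 1) for k, s in enumerate(scheme, start=1))


# Delays and variants as listed in Table 1.
TABLE_1 = {
    0: ["UUUU", "UBBB", "BBBB"],
    1: ["3UUU", "HUUU"],
    3: ["33UU", "5UUU", "HHUU"],
    5: ["53UU", "5HUU"],
    7: ["333U", "33HU", "HHHU"],
    9: ["533U", "53HU", "55UU"],
    13: ["553U", "55HU"],
    15: ["3333", "33HH", "HHHH"],
}

if __name__ == "__main__":
    for delay, variants in TABLE_1.items():
        for name in variants:
            assert tcdd(name) == delay, (name, tcdd(name), delay)
    # Uniform schemes reduce to Eq. (3) and Eq. (4) with K = 4.
    print(tcdd("HHHH"), tcdd("5555"))  # prints "15 45"
```

Under this model the unconstrained reference 5555 has a TCDD of 45 frames, three times that of HHHH.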

4. EXPERIMENTAL RESULTS

In the experiments [17], the test sequences were encoded and then decoded using all specified variants of temporal filtering, for the whole assumed set of bitrates. Each decoded sequence was compared to the original, and the average luminance PSNR was calculated over all decoded frames. Table 1 compares the compression efficiency of the temporal filtering schemes mentioned above. The most efficient variants are collected in the second column of Table 1 and the least efficient in the last column (variants were compared in terms of average PSNR over all sequences used). From the experimental results (Table 1), a general rule may be deduced: update-step reductions should be made, where possible, at the higher levels of decomposition. Such a rule is to be expected, as the first decomposition levels operate on all pixels, whereas the higher levels are locally switched off by the R-D optimization. Only small differences in compression efficiency were observed between the variants 53UU and 5HUU (delay of 5 frames), 333U and 33HU (7 frames), 533U and 53HU (9 frames), and 553U and 55HU (13 frames). The variants with shorter delays and lower coding efficiency also exhibit lower computational complexity (there are fewer motion vectors to estimate). The smallest average PSNR decrease (over the ten bitrates) was observed for the Football sequence (CIF 30Hz): 0.79 dB under the zero-delay constraint. The largest average decrease, 4.33 dB, also under the zero-delay constraint, was observed for the Mobile sequence (Figs. 3-5).

Table 1. The optimum filter bank variants for the assumed delays.

| TCDD [frames] | Most efficient schemes | Schemes with lower efficiency | Least efficient schemes |
|---|---|---|---|
| 0 | UUUU | UBBB | BBBB |
| 1 | 3UUU | HUUU | – |
| 3 | 33UU | 5UUU | HHUU |
| 5 | 53UU | 5HUU | – |
| 7 | 333U | 33HU | HHHU |
| 9 | 533U | 53HU | 55UU |
| 13 | 553U | 55HU | – |
| 15 | 3333 | 33HH | HHHH |

Application of the modified filtering schemes reduces the coding delay but also significantly affects compression efficiency. Figs. 3-5 present the luminance PSNR decrease, averaged over the 10 examined bitrates, for the best filter structure variant at a given total coding/decoding delay (TCDD), compared to the 5555 filter banks without coding-delay restrictions.

[Figure: PSNR [dB] (0.0 to -4.5) versus delay [number of pictures] (15, 13, 9, 7, 5, 3, 1, 0) for CIF 15Hz; sequences City, Crew, Harbour, Ice, Soccer, Football, Silent, Mobile, Foreman]
Fig. 3. Average PSNR decrease in comparison with the scheme with no delay constraint (5555) for resolution CIF 15 Hz.

[Figure: PSNR [dB] (0.0 to -4.5) versus delay [number of pictures] (15, 13, 9, 7, 5, 3, 1, 0) for CIF 30Hz; sequences City, Crew, Harbour, Ice, Soccer, Football, Silent, Mobile, Foreman]
Fig. 4. Average PSNR decrease in comparison with the scheme with no delay constraint (5555) for resolution CIF 30 Hz.
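The quantities plotted in Figs. 3-5 can be sketched as follows. The paper does not state whether the average luminance PSNR is the mean of per-frame PSNR values or is derived from the sequence-wide MSE; averaging per-frame values in dB, and 8-bit samples (peak value 255), are assumptions here. The function names are hypothetical, and the numbers in the usage note are synthetic.

```python
import math
from typing import Sequence


def frame_psnr(orig: Sequence[float], dec: Sequence[float], peak: float = 255.0) -> float:
    """Luminance PSNR of one frame in dB (assumes 8-bit samples, peak = 255)."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, dec)) / len(orig)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def average_psnr(orig_frames, dec_frames, peak: float = 255.0) -> float:
    """Mean of per-frame PSNR over all decoded frames (assumed averaging rule)."""
    values = [frame_psnr(o, d, peak) for o, d in zip(orig_frames, dec_frames)]
    return sum(values) / len(values)


def mean_psnr_decrease(variant: Sequence[float], reference: Sequence[float]) -> float:
    """Average PSNR difference (variant - reference) over the tested bitrates, in dB."""
    return sum(v - r for v, r in zip(variant, reference)) / len(variant)
```

For instance, if a delay-constrained variant reached 30.0 dB and 32.0 dB at two bitrates where 5555 reached 31.0 dB and 34.0 dB, `mean_psnr_decrease([30.0, 32.0], [31.0, 34.0])` gives -1.5 dB, which is the kind of negative value plotted in the figures.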

[Figure: PSNR [dB] (0.0 to -4.0) versus delay [number of pictures] (15, 13, 9, 7, 5, 3, 1, 0) for 4CIF 30Hz; sequences City, Crew, Harbour, Ice, Soccer]
Fig. 5. Average PSNR decrease in comparison with the scheme with no delay constraint (5555) for resolution 4CIF 30 Hz.

The results may also be presented from another point of view. For a constant luminance PSNR, we calculated the average bitrate increase necessary to maintain the same PSNR, as a function of the coding-delay constraint (Fig. 6). The reference was the variant 5555, which has the highest compression performance but long delays.

[Figure: relative bitrate increase (0% to 110%) versus delay [number of pictures] (15, 13, 9, 7, 5, 3, 1, 0) for CIF 15Hz, CIF 30Hz and 4CIF 30Hz]
Fig. 6. Relative increase of bitrate due to the imposed delay constraint, calculated for each TCDD constraint for the respective most efficient filter setup, relative to the reference scheme without delay constraint (5555 variant).

Obviously, for constant video quality (PSNR), constraining the coding delay leads to an increase in bitrate. For the most critical delay constraint of 0, the most efficient filtering scheme UUUU (with degraded Haar filters) increases the bitrate by about 70% for CIF resolution and by more than 100% for 4CIF resolution. Allowing TCDD = 1 already reduces the bitrate penalty to 50–70% on average. For individual video sequences, the bitrate penalty varies with the video content. It is higher for sequences with significant translational motion of moderate velocity (see the City and Harbour sequences). For video sequences with fast and complex motion (e.g. Football and Crew), the bitrate penalty is lower because motion compensation is less efficient anyway.

5. CONCLUSIONS

The paper presents a systematic comparison of various low-delay temporal filtering schemes under the same test conditions. An extensive set of experimental results is available on the web page [18]. The paper identifies the schemes that exhibit the highest compression efficiency for the assumed total coding/decoding delay constraints. Based on exhaustive experiments with 9 test video sequences, the most efficient filtering schemes have been selected for the given coding-delay constraints (Table 1). These schemes exhibit the highest average compression efficiency for the 9 test sequences and 10 bitrates considered.

6. REFERENCES

[1] J.-R. Ohm, “Temporal domain subband video coding with motion compensation”, Proc. IEEE Int. Conf. Acoustics, Speech Signal Proc., vol. 3, San Francisco, pp. 229–232, 1992.
[2] J.-R. Ohm, “Three-dimensional subband coding with motion compensation”, IEEE Trans. Image Proc., vol. 3, pp. 559–571, 1994.
[3] D. Taubman, A. Zakhor, “Multirate 3-D subband coding of video”, IEEE Trans. Image Proc., vol. 3, pp. 572–588, 1994.
[4] S. J. Choi, J. W. Woods, “Motion-compensated 3-D subband coding of video”, IEEE Trans. Image Proc., vol. 8, pp. 155–167, 1999.
[5] S.-T. Hsiang, J. W. Woods, “Embedded video coding using invertible motion compensated 3-D subband/wavelet filter bank”, Signal Processing: Image Communication, vol. 16, pp. 705–724, 2001.
[6] J.-R. Ohm, “Complexity and delay analysis of MCTF interframe wavelet structures”, ISO/IEC JTC1/SC29/WG11, Doc. M8520, MPEG 2002.
[7] G. Pau, Ch. Tillier, B. Pesquet-Popescu, H. Heijmans, “Motion compensation and scalability in lifting-based video coding”, Signal Processing: Image Communication, vol. 19, pp. 577–600, 2004.
[8] A. Secker, D. Taubman, “Motion-compensated highly scalable video compression using an adaptive 3D wavelet transform based on lifting”, IEEE Int. Conf. Image Proc., Thessaloniki, Greece, pp. 1029–1032, 2001.
[9] B. Pesquet-Popescu, V. Bottreau, “Three-dimensional lifting schemes for motion compensated video compression”, IEEE Int. Conf. Acoustics, Speech Signal Proc., Salt Lake City, 2001.
[10] L. M. Huang, S. S. Mei, Y. Honda, “Results on Scalable Video Coding in Low Delay Mode”, ISO/IEC JTC1/SC29/WG11, Doc. M9843, MPEG 2003.
[11] G. Pau, B. Pesquet-Popescu, M. van der Schaar, J. Viéron, “Delay-Performance Trade-Offs in Motion-Compensated Scalable Subband Video Compression”, Proc. of Advanced Concepts for Intelligent Vision Systems (ACIVS), 2004.
[12] G. Pau, J. Viéron, B. Pesquet-Popescu, “Video coding with flexible MCTF structures for low end-to-end delay”, IEEE Int. Conf. Image Proc., Genova, vol. 3, pp. 241–244, 2005.
[13] V. Seran, L. P. Kondi, “3D based video coding in the overcomplete discrete wavelet transform domain with reduced delay requirements”, IEEE Int. Conf. Image Proc., Genova, vol. 3, pp. 233–236, 2005.
[14] Z. G. Li, K. P. Lim, X. Lin, S. Rahardja, “Customer Oriented Low Delay Scalable Video Coding”, ISO/IEC JTC1/SC29/WG11, Doc. M11544, MPEG 2004.
[15] T. Rusert, K. Hanke, M. Wien, “Optimization for locally adaptive MCTF based on 5/3 lifting”, Proc. Picture Coding Symposium, 2004.
[16] T. Rusert, K. Hanke, P. Chen, J. Woods, “Recent improvements to MC-EZBC”, ISO/IEC JTC1/SC29/WG11, Doc. M9232, MPEG 2002.
[17] A. Popławski, “Three-dimensional wavelet video compression with low delays”, PhD dissertation, Poznań, 2006.
[18] http://www.multimedia.edu.pl/publications/doctoral/index.htm