technique using the Hamming window, the coding efficiency can be further improved by a maximum of 0.17dB. Index Termsâ Motion estimation, video coding,.
WINDOWING TECHNIQUE FOR THE DCT BASED RETINEX ALGORITHM TO HANDLE VIDEOS WITH BRIGHTNESS VARIATIONS CODED USING THE H.264 Hoi-Kok Cheung+, Wan-Chi Siu*, Dagan Feng*+ and Zhiyong Wang+ +
*Centre for Multimedia Signal Processing Department of Electronic and Information Engineering The Hong Kong Polytechnic University Hung Hom, Kowloon, Hong Kong
School of Information Technologies, J12 The University of Sydney NSW 2006 Australia
ABSTRACT Conventional block based motion estimators assume constant inter-frame object brightness. Pixel discrepancy is resulted primarily from motion factor without considering the influence of brightness changes. In this paper, we propose a simple and efficient windowing technique using the Hamming window and integrate it to our previously proposed algorithm. The algorithm is based on retinex approach using the DCT technique and designed to handle brightness variations. The new technique manages to greatly reduce the influence of ripple effect and further increase the compression efficiency without adding any extra overhead bits to the bit-stream. We applied the scheme to H.264 for testing. Experimental results show that the retinex based approach is an effective technique to handle inter-frame brightness variations and outperforms the H.264 system with weighted prediction enabled for sequence involving brightness variations. With our proposed windowing technique using the Hamming window, the coding efficiency can be further improved by a maximum of 0.17dB. Index Terms— Motion estimation, video coding, motion compensation, discrete cosine transform, H.264 1. INTRODUCTION Block based motion estimation and compensation[1] plays an indispensable role in the latest video compression standard, the H.264, for its attractive high compression efficiency. A number of features are added to the H.264 contributing greatly to its success including variable blocksize, weighted prediction[2], rate distortion optimization[3] and multiple reference frame motion compensation (MRMC)[4,5]. To handle video sequences having interframe brightness variations, weighted prediction technique is introduced to raise the prediction quality. However, this feature is only effective for varying brightness of spatially uniform light source. The primary target application is the fade in/out and gradual scene change effects. In [6], we proposed an illumination independent block division motion estimator and applied it to our proposed sprite coding application, which can efficiently handle brightness variability of the background scene. Barrow and
978-1-4244-1764-3/08/$25.00 ©2008 IEEE
2860
Tenenbaum[7] proposed to model the observed image I(x,t) as a product of two components: a reflectance image R(x,t) and an illumination image L(x,t), which describe the physical surface reflectance properties of the objects and the lighting condition respectively. I(x,t) = R(x,t)L(x,t) (1) The decomposition is a classical ill-posed problem. Further assumption has to be imposed to make it practical. Land et al.[8] proposed the Retinex Theory which imposes the spatial smoothness assumption on the illumination image L(x,t). In this paper, we enhance our previous work[9], which uses DCT techniques to estimate L(x,t) for video compression application, and propose a simple and efficient new windowing technique to further improve the compression efficiency without adding extra overhead bits to the bitstream. 2. REVIEW OF OUR PREVIOUS WORK In this section, we briefly describe our previous work[9]. Instead of using the conventional Gaussian based retinex system, we use the DCT techniques to estimate the illumination image L(x,t). Ld(x,t) = IDCT( Clip( DCT(I(x,t) ) ) (2) where Clip() is a function to preserve the first (NL+1)NL/2 low frequency coefficients in zigzag scan order with NL standing for the number of levels. This method is entitled as the rectangular window based retinex approach. The DCT coefficients are quantized and de-quantized with a quantization factor DCTQ before inverse DCT and the quantized coefficients are coded and sent to the decoder as the overhead bits for coding Ld(x,t). The corresponding retinex output is (3) Rd(x,t) = log I(x,t) – log Ld(x,t) Subsequently, the floating point values of Rd(x,t) from –K to K are then mapped (with upper and lower clipping) and quantized to integers ranging from 0 to 255. We refer the mapped image as the scaled retinex image. Fig. 1(a) and (c) show the block diagrams for transforming the image to and from the scaled retinex image respectively. We implemented the scheme and applied it to the MRMC environment of H.264. Figs. 1 (b) and (d) show the encoding and decoding
ICIP 2008
system diagrams for P slices respectively. The current frame and the list of reference frames are stored in both the pixel domain and the scaled retinex domain. Each macroblock can be coded either in the conventional pixel domain or the scaled retinex domain with independent motion estimation. Mode decision is made based on the mode giving the lowest rate-distortion cost. During the decoding stage, each macroblock is motion and error compensated in either the pixel domain or the scaled retinex domain decided upon the pixel/retinex mode selection bit. If the macroblock is coded in the scaled retinex domain, the inverse transformation procedure is applied to restore to the pixel domain. In summary, in addition to the conventional data, we need to transmit the bits for coding Ld(x,t) for each frame and the pixel/retinex mode selection bits for each macroblock. Current image
Low passed filtering Image encoding & decoding Decoded Ld(x,t) Log(I(x,t)/Ld(x,t))
Scaled Retinex domain block based ME, MC and prediction error coding
Ref 1
Ref 1 in scaled Retinex domain
Ref 2
Ref 2 in scaled Retinex domain
Ref 3
Ref 3 in scaled Retinex domain
no
Transform to pixel domain
Pixel mode?
Ld(x,t) bitstream for transmission
Yes Decoded current image
(a) Forward transformation Ld(x,t) bitstream
Inverse process
Image decoding
MB multiplication
Decoded cur Ld(x,t)
(b) Encoder Motion vectors, mode selection and error coding bitstream
Block based motion compensation and error compensation
Decoded MB in pixel domain
Pixel mode?
(c) Inverse transformation
Motion vectors, mode selection and error coding bitstream
Cur Ld(x,t) bitstream Decoded Reference List 0 Ref 1
Ref 1 in scaled Retinex domain
Ref 2
Ref 2 in scaled Retinex domain
Ref 3
Ref 3 in scaled Retinex domain ...
Decoded MB in scaled retinex domain
no
(c) Scaled retinex (b) Scaled retinex image – Hamming image - rectangular window (NL=9 EL=5) window (NL=9)
Fig. 2 Image of a sharp shadow illustrating the ripple effect and alleviation of the effect using Hamming window
Mode selection
Mapping Scaled cur retinex image
Pixel domain block based ME, MC and prediction error coding
(a) Pixel domain
Decoded Reference List 0
Cur image in scaled Retinex domain
...
Current image
consideration[9], we may assume the rear case and consider a smooth transition in that area for the estimated illumination image Ld(x,t). This decision gives rise to the halo artifact[10]. In addition, since the estimated illumination image Ld(x,t) is generated by a truncation of the DCT coefficients (see equation (2)), ripples arise close to the discontinuity. Fig. 2(b) shows an image with a sharp shadow in the scaled retinex domain having some white spots resulting from the ripple effect.
Transform to pixel domain
Yes Decoded current image
(d) Decoder
Fig. 1 Block diagrams of the image transformation and the overview video coding system 3. PROPOSED WINDOWING TECHNIQUE The rectangular window based retinex algorithm is very efficient in removing the inter-frame differences resulting from brightness variations. However, the algorithm suffers from an adverse ripple effect for areas close to abrupt brightness discontinuity like a sharp shadow. Indeed, images having brightness discontinuity in spatial domain is an infringement of the assumption that illumination image L(x,t) is spatially low frequency in nature. The retinex system cannot differentiate the cases if the abrupt change is resulted by (i)brightness variations or (ii)difference in surface reflectance of two objects. For a simple
The ripple effect is particularly significant in the dark area of the image as the sensitivity of the ratio of two small numbers is particularly high. To alleviate this ripple effect, we propose to apply a 2D window to weigh the DCT coefficients so as to smooth out the discontinuity in the DCT domain. Note that the windowing process does not introduce extra overhead bit to the bitstream. Fig. 2(c) shows the resultant images in the scaled retinex domain after applying windowing using the Hamming window. Note that the influence of the ripple effect is significantly reduced. We tested the proposed technique with a number of commonly used windows, including Blackman, Hamming and Hanning windows. We found that the windowing approach manages to reduce the ripple effect. In addition, we found that if we use a larger window size (higher order of window), the advantage obtained is even higher. In summary, instead of using equation (2) to generate the estimated illumination image Ld(x,t) (rectangular window retinex approach), we propose to weigh the DCT coefficients with a window function w(x). (4) Ld(x,t) = IDCT( Clip( w(x)•DCT( I(x,t) ) ) ) where w(x) is a 2D window of M=2(NL+EL)th order and EL 0 is defined as the extra window level as shown in fig.3. For example, if w(x) is a Hamming window, we may write w(x) = hamming(x1)•hamming(x2) (5) (6) where hamming(n) = 0.54 - 0.46cos[2ʌ(n+NL+EL)/M] and x=(x1,x2)T and nЩ= NL+EL-1?. Fig. 3 shows an example with an image in DCT domain and we sample 4 levels of DCT coefficients for coding Ld(x,t). (I.e. NL=4 and (4+1)4/2=10 DCT coefficients are coded and sent as overhead bits.) We use a window of order 2(NL+EL) with EL=2 to weigh the DCT coefficients. Note that there are still 10 DCT coefficients needed to be coded in spite of the larger window size. Compared to the case using a window of order 2NL, the DCT coefficients in level 4
2861
(L4Cx for x=1 to 4) can avoid being scaled to values of too small, which is essentially reducing the equivalent value of NL and reduces the illumination normalization effect. Therefore, with a larger weighing window, i.e. EL > 0, we can reduce the coefficients discontinuity in the DCT domain to remove the ripples and as well avoid weighing the high frequency DCT coefficients to values of too small (avoid reducing the normalization effect). Experimental results showed that this can further increase the coding efficiency and the Hamming window generally gives the highest compression efficiency gain among the windows tested. Therefore, in this paper, we recommend to use the Hamming window. The Hamming window coefficients of order 2(NL+EL)=12
NL=4
EL=2
L1C1
L2C1
L3C3
L2C2
L3C2
L4C2
L3C1
L4C3
L4C1
L4C4
Fig. 3 Top left corner of an image in DCT domain. Setting: NL=4 EL=2 with the Hamming window 4. ILLUSTRATION USING 1D DATA In this section, we use 1D data to illustrate the formation of the ripple effect and the advantage of our proposed windowing technique. Suppose a high frequency signal is partially illuminated and a sharp transition of brightness level is formed as shown in Fig. 4(a).
ripple effect due to discontinuity in the DCT domain while the one using the Hamming window is generally free of ripple. Figs. 4(b) and (c) show the corresponding signals in the retinex domain, Rd(x,t). It shows that Rd(x,t) generated using the Hamming window does not expose to the adverse influence of the ripple effect while the one using rectangular window does (i.e. a low frequency ripple exists). Therefore, if the shadow is moving in time domain, the ripples will influence the motion estimation accuracy (in the scaled retinex domain) and give lower compression efficiency. 5. EXPERIMENTAL RESULTS To test the efficiency of the proposed technique, we implemented the scaled retinex system in H.264(JM12.2) with the proposed Hamming windowing feature. A number of real and synthetic sequences having various forms of inter-frame brightness variations are used for testing. In this paper, we just quote some typical results using two real sequences, SpotLightPanning and Moving Shadow in the CIF format (both have 40 frames). Generally, the two sequences have no inter-frame motion so that we can rule out the possibility of motion factor. Sequence SpotLightPanning is a sequence of a close object in front of a far background so that when a continuously-lit and moving torch is used to further illuminate a particular area, the close object is significantly brightened while the background objects are only slightly brightened. Thus, it is characterized by significant brightness variations in both spatial and time domains. For sequence MovingShadow, it is a sequence of a poster portion having a sharp moving shadow under sunshine (uniform light source). In addition, there is an effect of brightness change due to the camera automatic brightness adjustment. I.e. the dark area of Fig. 5(d) is brighter than that of Fig. 5(c).
I(x,t) Ld(x,t) for Rectangular Ld(x,t) for Hamming
x
(a) pixel domain Rectangular Rd(x,t)
Rd(x,t)
Hamming
x
(b) retinex domain
(a) SpotLightPanning #13
(b) SpotLightPanning #18
(c) MovingShadow #18
(d) MovingShadow #32
x
(c) retinex domain
Fig. 4 Testing signal in pixel domain and retinex domain To remove the illumination effect, we transform the signal to the DCT domain and preserve the first 12 low frequency DCT coefficients, i.e. NL=12 for 1D case. We use two windows to separately weigh the 12 coefficients, (Rectangular and Hamming windows with EL=3). Then, the two low-pass signals Ld(x,t) are created with inverse DCT techniques. Fig. 4(a) shows that the low-pass signal Ld(x,t) created using rectangular window is exposed to significant
Fig. 5 Selected original images From our previous experience[9], we arbitrarily set K=1.8 which gives the best coding efficiency. We test the improvement in compression efficiency for our proposed windowing technique compared to the rectangular window approach for different values of NL. All the coding systems
2862
were configured for using 1 reference frame for MRMC. Note that the scheme for using multiple reference frames feature has an extra advantage in coding as it allows the system to choose the frame having similar lighting condition as the current frame to be coded. However, this feature cannot help revealing the advantage of our proposed windowing technique. Fig.6 shows the rate-distortion(RD) graphs of our proposed system and the H.264 system with and without (denoted as H.264 WP and H.264 respectively) weighted prediction enabled are also included for comparison. Experimental results show that different sequences have different optimal choice of NL values. This means that the choice is “contents dependent” and cannot be known in prior. With an appropriate NL value chosen from a favorable range as shown in fig.6, significant improvement in compression efficiency can be obtained compared to H.264 system with weighted prediction enabled. The results in Fig. 6 also show that our proposed windowing technique manages to further improve the compression efficiency by a maximum increase of 0.17dB and 0.11dB (sequences SpotLightPanning and MovingShadow respectively). SpotLightPaning
PSNR (dB)
27.5
80
130
(a)
[5] S-E.Kim, J-K.Han and J-G.Kim, “An efficient scheme for motion estimation using multireference frames in H.264/AVC”, IEEE Transactions on Multimedia, Vol. 8, No.3, June 2006, pp.457-466.
MovingShadow
PSNR (dB)
NL=4 rect NL=4 EL=2 ham NL=6 rect NL=6 EL=3 ham N =9 rect L N =9 E =5 ham L L N =12 rect L N =12 E =6 ham L L N =15 rect L NL=15 EL=8 ham N =18 rect L NL=18 EL=9 ham
26.5
[6] H.K.Cheung, W.C.Siu, D.Feng and T.Cai, “New block-based motion estimation for sequences with brightness variation and its application to static sprite generation for video compression”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 18, No. 4, April 2008, pp.522-527
H.264 H.264 WP
26 130
180
[1] K.C.Hui, W.C.Siu and Y.L.Chan, “New adaptive partial distortion search using clustered pixel matching error characteristic”, IEEE Transactions on Image Processing, Vol. 14, No. 5, May 2005, pp.597-607.
[4] T.Wiegand, E.Steinbach and B.Girod, “Affine multipicture motion-compensated prediction”, IEEE Transactions On Circuits and Systems for Video Technology, Vol. 15, No. 2, Feb 2005, pp.197-209.
Bitrate (kbps)
27
8. REFERENCES
[3] E-H.Yang and X.Yu, “Rate distortion optimization for H.264 interframe coding: a general framework and algorithms”, IEEE Transactions on Image Processing, Vol. 16, No. 7, July 2007, pp.1774-1784.
H.264 H.264 WP
26.5 30
7. ACKNOWLEDGEMENT This research is supported by the NICTA (via The University of Sydney), ARC grants, and the Centre for Multimedia Signal Processing, (via project RGC-PolyU5268/06E), Hong Kong Polytechnic University.
[2] P.Yin, A.M.Tourapis and J.Boyce, “Localized weighted prediction for video coding”, Proceedings, IEEE Int. Symposium on Circuits and Systems (ISCAS 2005), Vol. 5, May 2005, pp.4365-4368.
NL=4 rect NL=4 EL=2 ham NL=6 rect NL=6 EL=3 ham N =9 rect L N =9 E =5 ham L L N =12 rect L N =12 E =6 ham L L N =15 rect L NL=15 EL=8 ham NL=18 rect NL=18 EL=9 ham
27
efficiency for sequences with inter-frame brightness variations. We have implemented the window based algorithm to the H.264 to test the advantage of the algorithm. Experimental results show that retinex based approach is an effective technique to handle inter-frame brightness variations and outperforms the H.264 system with weighted prediction enabled. With our proposed windowing technique using Hamming window (without extra overhead bit), the coding efficiency can be further improved by a maximum of 0.17dB.
[7] H.G.Barrow and J.M.Tenenbaum, “Recovering intrinsic scene characteristics from images”, Computer Vision Systems, Academic Press, 1978.
230
Bitrate (kbps)
(b)
[8] E.H.Land and J.J.McCann, “Lightness and retinex theory”, J. Opt. Soc. Am., 61(1), 1971, pp.1-11.
Fig. 6 RD graphs for sequences 6. CONCLUSION In this paper, we propose a simple and efficient windowing technique using the Hamming window and integrate it to the DCT based retinex algorithm. Without adding extra overhead to the bit-stream, our proposed technique manages to further improve the compression
[9] H.K.Cheung, W.C.Siu, D.Feng and Z.Wang, “Retinex based motion estimation for sequences with brightness variations and its application to H.264”, Proceedings, IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2008), 30 March-4 April 2008, pp.1161-1164. [10] D.J.Jobson, Z.Rahman and G.A.Woodell, “Properties and performance of a center/surround retinex”, IEEE Transactions on Image Processing, Vol. 6, No. 3, March 1997, pp.451-462.
2863