A DIRECTION-ADAPTIVE IN-LOOP DEARTIFACTING FILTER FOR VIDEO CODING

Camilo C. Dorea, Oscar Divorra Escoda, Peng Yin and Cristina Gomila
Thomson Corporate Research, Princeton, New Jersey, 08540, USA
Email: {camilo.dorea,oscar.divorra,peng.yin,cristina.gomila}@thomson.net

ABSTRACT

Recent video coding strategies, such as H.264/AVC [1], incorporate an in-loop deblocking filter in order to reduce the effects of quantization noise. These techniques are limited to treating blocky artifacts on smooth regions. To address this, sparsity-based filtering techniques have recently been proposed for efficient filtering of edge and textured areas. More recently, direction-adaptive sparsity-based filtering has also made it possible to exploit directional features on Intra encoded pictures. This paper proposes a high-performance in-loop filter for deartifacting Intra and Inter encoded video data. This work extends the use of direction-adaptive sparsity-based filtering techniques [2] to Inter frames, and it improves performance by adaptively selecting filtering thresholds consistent with quantization noise statistics, local encoding conditions, compression requirements and the original signal. Thresholds are both spatially and temporally adapted to optimize video quality and/or coding cost. Selected thresholds are encoded and transmitted as side information to the decoder. Experimental results show significant bit rate savings and visual quality enhancement when compared to the state-of-the-art H.264/AVC codec using in-loop deblocking filtering.

Index Terms— Video Coding, In-Loop Filtering, H.264/AVC, Sparse Approximations, Redundant Transforms, Quincunx Sampling.

1. INTRODUCTION

Many video coding strategies employ block-based transforms (DCTs) and motion compensation among their compression tools. Coarse quantization of transform coefficients, together with the use of different reference locations or different reference pictures by neighboring blocks in motion-compensated prediction, can give rise to visual artifacts such as distortion around edges, textures or block discontinuities. Within the H.264/AVC video coding standard [1], an in-loop deblocking filter [3] has been adopted to attenuate artifacts arising along block boundaries. By adaptively applying low-pass filters to block edges, the deblocking filter can improve both subjective and objective video quality. The filter operates by analyzing the samples around a block boundary and adapts its filtering strength to attenuate small intensity differences attributable to quantization noise while preserving the generally larger intensity differences pertaining to the actual image content.

Blocking artifacts removed by the H.264/AVC deblocking filter are not the only ones present in compressed video. Coarse quantization is also responsible for other artifacts such as ringing, edge distortion or texture corruption. The low-pass filter employed for deblocking assumes a smooth image model and is not suited for denoising image singularities such as edges or textures. To overcome this limitation, a denoising sparsity-based nonlinear in-loop filter has recently been proposed in the literature [4]. This filter adapts to different non-stationary image statistics by exploiting a sparse image model that uses an overcomplete set of transforms and a thresholding operation [5]. The filter performs a weighted averaging of denoised estimates obtained from all possible translations of a given 2D orthonormal transform.

In spite of its broad applicability, [4] presents some limitations. First, it uses a set of translated versions of the DCT, which constrains the directions of analysis to the vertical and horizontal components. Second, the use of highly correlated transforms for both deartifacting and residual coding purposes makes the denoising algorithm less robust to quantization noise: those translations of the DCT in [4] that are significantly aligned with the residual coding transform in H.264/AVC can alter the sparseness measurements used to compute the combination weights of denoised estimates. In [2], we showed how these two aspects may permit a significant amount of artifacts to remain after filtering despite improvements in PSNR. Last, the temporal variability of quantization noise and signal statistics is not taken into account in [4].

The direction-adaptive deartifacting filter that we proposed in [2] addresses some of the above problems for Intra encoded pictures. By means of multiple sub-lattice samplings of the picture, it is able to exploit directional picture features for filtering purposes. Moreover, it analyzes the problems arising from the alignment between filtering and residual coding transforms. However, [2] does not address the use of direction-adaptive deartifacting on Inter encoded pictures. Indeed, it cannot accommodate the different spatial characteristics of quantization noise arising from varying types of coding/prediction modes and signal content. Furthermore, neither [2] nor [4] accounts for the joint spatio-temporal variability of quantization noise statistics in the filtering process.

This paper proposes an in-loop deartifacting filter using spatio-temporally adaptive thresholds for sparsity-based filtering on multiple sub-lattice picture samplings. Since the adaptation of filtering thresholds is responsive to signal characteristics, besides compression requirements and encoding modes, thresholds need to be transmitted to the decoder as side information. The use of side information greatly enhances the ability of the filter to act on Intra, Inter-P or Inter-B encoded content, as it aids in adapting to the quantization noise statistics of each coding strategy. In this work, a specific implementation of our in-loop filter is provided for the H.264/AVC video codec. Results show coding efficiency gains of up to 10% (or 0.9 dB) in sequences of high motion content with difficult prediction.

In the following, a review of multi-lattice sparsity-based filtering is presented in Section 2. These techniques are then extended to in-loop Inter frame deartifacting in Section 3. Results and conclusions are provided in Sections 4 and 5, respectively.

2. DIRECTION-ADAPTIVE DEARTIFACTING

Sparsity-based filtering exploits the capacity of some transforms to provide sparse signal decompositions. Based on this assumption,
additive noise may be removed from the signal by thresholding those transform coefficients of small magnitude. The denoising approach of [4] proposes the use of an overcomplete set of transforms $H_i$ generated by using all possible translations $i \in \{0, 1, \dots, M-1\}$ of a given 2D orthonormal transform $H$, such as the DCT. Then, for every $H_i$ a projected version of the decoded picture $I'$ is obtained as $c_{I'}^{i} = H_i I'$. The denoising operation sets all coefficients in $c_{I'}^{i}$ lying below a certain threshold to zero in $\hat{c}_{I'}^{i}$. Denoised estimates are given by $H_i^{-1} \hat{c}_{I'}^{i}$.

In overcomplete settings, it is expected that some of the denoised estimates will provide better performance than others. The final filtered version $\hat{I}$ can benefit from a combination of such denoised estimates. In [5], a weighted averaging approach is presented such that:

$$\hat{I} = \sum_{i=0}^{M-1} W_i \cdot H_i^{-1} \hat{c}_{I'}^{i}. \qquad (1)$$

The weights $W_i$ are based on a measure of local sparsity and emphasize the best denoised estimates. Weights are inversely proportional to the number of non-zero denoised coefficients involved in the representation of pixel $(x, y)$ [5]:

$$W_i(x, y) = C(x, y) \, / \, \big\| \hat{c}_{I'}^{i}(x, y) \big\|_0, \qquad (2)$$

where $C(x, y)$ is a location-dependent constant chosen to guarantee that $\sum_{i=0}^{M-1} W_i(x, y) = 1 \;\; \forall (x, y)$, and $\hat{c}_{I'}^{i}(x, y)$ represents the coefficients in the transform domain that contribute to pixel $(x, y)$.
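To make Eqs. (1) and (2) concrete, the following Python sketch illustrates weighted overcomplete denoising with all translations of a block DCT. It is an illustrative simplification rather than the implementation used in our codec: weights are computed per translated transform instead of per pixel as in Eq. (2), image dimensions are assumed to be multiples of the block size, and the helper names (`blockwise`, `denoise_overcomplete`) are ours.

```python
import numpy as np
from scipy.fft import dctn, idctn

def blockwise(img, B, fn):
    """Apply fn to each non-overlapping B x B block of img."""
    out = np.empty_like(img)
    for y in range(0, img.shape[0], B):
        for x in range(0, img.shape[1], B):
            out[y:y+B, x:x+B] = fn(img[y:y+B, x:x+B])
    return out

def denoise_overcomplete(img, B=8, thr=10.0):
    """Weighted average of denoised estimates over all B*B translations of a block DCT."""
    img = img.astype(np.float64)
    acc, wsum = np.zeros_like(img), 0.0
    for dy in range(B):
        for dx in range(B):
            shifted = np.roll(img, (-dy, -dx), axis=(0, 1))           # translate the block grid
            coef = blockwise(shifted, B, lambda b: dctn(b, norm='ortho'))
            coef[np.abs(coef) < thr] = 0.0                            # hard thresholding
            w = 1.0 / (1.0 + np.count_nonzero(coef))                  # sparser estimate -> larger weight
            est = blockwise(coef, B, lambda b: idctn(b, norm='ortho'))
            acc += w * np.roll(est, (dy, dx), axis=(0, 1))            # undo the translation
            wsum += w
    return acc / wsum                                                 # weighted averaging, cf. Eq. (1)
```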
Direction-adaptiveness of the sparsity-based filter in [2] is achieved by applying translations $H_i$ of the transform $H$ on oriented sub-samplings of the image. The filtering on the quincunx sampling lattice, depicted in Fig. 1, can extend the directions of analysis beyond the vertical and horizontal components. Denoised estimates resulting from thresholding operations on the samples of the complementary quincunx sets can be combined with denoised estimates from the original sampling grid.

Fig. 1. Quincunx sampling of a rectangular grid. Black and white dots represent each of the diagonally arranged complementary cosets.

Several of the translated transforms used over an original sampling grid may overlap or nearly overlap with the transforms used in residue coding. In this case, it may occur that both quantization noise/artifact and signal fall within the same sub-space of basis functions, leading to an artificially large sparseness measure. In order to avoid these pitfalls, [2] proposes the exclusion of those transforms aligned or mostly aligned (those with 1 pixel of misalignment or less in the horizontal or vertical direction) with the transforms used in residue coding. Thus, at most $M/2$ translations of $H_i$ are used over the original grid. Denoised estimates originating from each of the multiple lattices are then combined through a weighted combination [2]:

$$\hat{I} = \sum_{j=0}^{M/2-1} W_j \cdot H_j^{-1} \hat{c}_{I'}^{j} \;+\; \sum_{k=0}^{M-1} W_k^{qx} \cdot \left( H_k^{-1} \hat{c}_{I'_k}^{qx_1} \oplus H_k^{-1} \hat{c}_{I'_k}^{qx_2} \right),$$
where $\hat{c}_{I'_k}^{qx_1}$ and $\hat{c}_{I'_k}^{qx_2}$ represent the denoised transform coefficients in each of the quincunx cosets, and $W_k^{qx} = W_k^{qx_1} \oplus W_k^{qx_2}$ are the weights obtained from the sparsity measures on the quincunx grid. In the equation, $\oplus$ represents the merging operation of both cosets (i.e., rotation and upsampling), and the weights must satisfy $\sum_{j=0}^{M/2-1} W_j(x, y) + \sum_{k=0}^{M-1} W_k^{qx}(x, y) = 1 \;\; \forall (x, y)$.
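The coset split and merge underlying the quincunx lattice can be illustrated as follows. This sketch packs each coset row-wise into a half-width array instead of rotating and upsampling as described above; both conventions are equivalent for demonstrating the split/merge round trip, the image width is assumed even, and the helper names are ours.

```python
import numpy as np

def quincunx_split(img):
    """Pack the two quincunx cosets ((x+y) even / odd) into H x W/2 arrays."""
    H, W = img.shape                                        # W assumed even
    even = np.stack([img[i, (i % 2)::2] for i in range(H)])
    odd  = np.stack([img[i, ((i + 1) % 2)::2] for i in range(H)])
    return even, odd

def quincunx_merge(even, odd, shape):
    """Interleave two (possibly filtered) cosets back onto the original grid."""
    out = np.empty(shape, dtype=even.dtype)
    for i in range(shape[0]):
        out[i, (i % 2)::2] = even[i]
        out[i, ((i + 1) % 2)::2] = odd[i]
    return out

# Round trip: img == quincunx_merge(*quincunx_split(img), img.shape)
```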
3. DIRECTION-ADAPTIVE IN-LOOP FILTER FOR INTRA & INTER FRAMES

One of the main advantages of in-loop filtering is the possibility to use filtered reference frames for motion estimation and compensation. Nevertheless, one needs to take into account that picture areas previously filtered in reference frames may not need to be further filtered. In order to avoid possible overfiltering, an in-loop filter must be locally adaptive, taking into account the quantization noise statistics associated with the different coding modes at the block level as well as the pixel level [1]. Furthermore, temporal variations in signal characteristics may further influence the statistics of quantization noise. In the context of sparsity-based filtering, these considerations can be reflected in a threshold selection and combination strategy which is both spatially and temporally adaptable to signal characteristics, compression requirements and prediction modes.

In the in-loop filter described herein, direction-adaptive deartifacting [2] is extended by using spatio-temporally adaptive sets of thresholds for localized filtering. Thresholds are generated and separately selected in order to best fit the statistics of quantization noise of each image and/or image area. As outlined in the block diagram of Fig. 2, a Filtering Map Creation module is responsible for identifying and isolating image areas which call for dedicated filtering strengths. This module and the components related to threshold generation, selection and result combination are detailed next.

Fig. 2. Block diagram of the direction-adaptive deartifacting in-loop filter at the encoder side.

3.1. Filtering Map Creation

Visual artifacts are closely related to the amount of local quantization noise present in the encoded picture. In turn, quantization noise is dependent upon the applied quantization step and the amount of residual error resulting from block-based prediction. Depending on the prediction mode used, and the source signal to encode, the amount of residual error varies. Hence, akin to H.264/AVC, one can estimate which prediction modes typically present higher or lower amounts of quantization noise.
For instance, conditions for estimating the amount of noise include INTRA or INTER modes of prediction and the presence or absence of coded residuals for each block (or macroblock). Additionally, the boundaries of blocks associated with large amounts of quantization noise are also subject to blocking artifacts of varying severity. In this work, unlike in H.264/AVC, the boundaries of a block are understood as the set of pixels exclusively outside of the block which lie within a distance $d$ of the block edges. Based on the filtering strength observations of the H.264/AVC deblocking filter [1], boundaries between two INTER predicted blocks with no coded residual which present motion-related differences (block motion differences of more than 1 pixel or motion compensation from different reference frames) are also considered susceptible to presenting artifacts. In this case, the boundary between blocks is taken as the set of pixels within a distance $d$ that overlap either side of the common block edge.

These conditions provide a localized discrimination of image areas intended for filtering operations of varying intensity. Each pixel of a luminance image is grouped into a particular class in accordance with the local encoding conditions summarized in Table 1 (a minimal sketch of the classification rule follows the table). Note that a pixel may satisfy multiple conditions; in this case, the class assignment of lowest number takes precedence. This classification gives rise to a Filtering Map, such as the one illustrated in Fig. 3. Filtering Maps for the chroma components of an image are obtained via sub-sampling of the luminance maps. A different filtering threshold per frame can then be applied for each class.

Fig. 3. Example of Filtering Map for the Foreman sequence. Blue, red, light blue, yellow, green and gray correspond to classes 1 through 6, respectively.

Class | Block and boundary conditions
1 | INTRA encoded macroblock
2 | INTER encoded block with coding of residual
3 | outer boundaries of INTRA encoded macroblock
4 | outer boundaries of INTER encoded block with coding of residual
5 | boundary between INTER encoded blocks with no coded residual but with significant motion differences
6 | none of the above

Table 1. Encoding modes and conditions for pixel class determination.
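The precedence rule of Table 1 amounts to taking the lowest-numbered class whose condition holds, as in the following sketch; the boolean condition flags are assumed to have been derived beforehand from decoded block modes, coded-residual indicators and motion data, as described above.

```python
def pixel_class(intra_mb, inter_res, intra_boundary, inter_res_boundary,
                motion_diff_boundary):
    """Return the Table 1 class of a pixel; lower class numbers take precedence."""
    if intra_mb:              return 1  # inside an INTRA encoded macroblock
    if inter_res:             return 2  # inside an INTER block with coded residual
    if intra_boundary:        return 3  # outer boundary (within d) of an INTRA MB
    if inter_res_boundary:    return 4  # outer boundary of INTER block with residual
    if motion_diff_boundary:  return 5  # edge between INTER blocks with motion differences
    return 6                            # none of the above
```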
3.2. Threshold Optimization and Filtered Picture Construction

Filtered pixel values for each class are computed in an independent filtering step using our direction-adaptive sparsity-based filtering with a selected threshold. All thresholds within a predetermined range are tested. For each class, the Threshold Selection module (Fig. 2) uses a distortion minimization procedure in order to determine the best filtered data and threshold per class of pixels. With the aid of the Filtering Map, the Threshold Selection module chooses for each class the threshold which maximizes the PSNR between filtered and original pixels. A composite image containing the optimally filtered data for each class is constructed and made available to the remainder of the coding modules.
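The search can be sketched as follows; `deartifact` stands in for the direction-adaptive filter applied with a given threshold (an assumed callable, not part of this paper's notation), the threshold range 0 through 25 matches Section 4, and maximizing PSNR over a class is implemented as minimizing MSE over the class mask of the Filtering Map.

```python
import numpy as np

def select_thresholds(decoded, original, fmap, deartifact, thr_range=range(26)):
    """Pick, per Filtering Map class, the threshold minimizing MSE against the
    original picture, and assemble the composite optimally filtered picture."""
    filtered = {t: deartifact(decoded, t) for t in thr_range}   # one filtering pass per threshold
    composite = decoded.astype(np.float64).copy()
    best = {}
    for cls in np.unique(fmap):
        mask = fmap == cls
        best[cls] = min(thr_range,
                        key=lambda t: np.mean((filtered[t][mask] - original[mask]) ** 2))
        composite[mask] = filtered[best[cls]][mask]             # optimally filtered data per class
    return best, composite
```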
3.3. Threshold Transmission and Decoding

Optimal threshold selection requires the availability of original data. Hence, the selected thresholds for each class need to be transmitted in the bit-stream of the coding scheme. Threshold transmission represents a minor overhead: considering that each threshold in our implementation may take at most 32 possible values, only an average of 30 bits per frame (6 thresholds of 5 bits each) is required if fixed-length codes are used. In the H.264/AVC framework, threshold values can be inserted in the bit-stream in the slice and/or picture headers or as SEI (Supplemental Enhancement Information) data.

The decoder must also construct a Filtering Map based on the decoded data. Together with the optimal threshold information extracted from the bit-stream, it can proceed to deartifact the pixels within each class by using the direction-adaptive deartifacting filter. The block diagram of the in-loop filter at the decoder side is pictured in Fig. 4.

Fig. 4. Block diagram of the direction-adaptive deartifacting in-loop filter at the decoder side.
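As a toy illustration of this overhead (not actual H.264/AVC bit-stream syntax), six thresholds of at most 32 values each can be serialized with fixed-length 5-bit codes into a 30-bit field:

```python
def pack_thresholds(thresholds):
    """Pack six 5-bit thresholds (values 0..31) into one 30-bit integer."""
    bits = 0
    for t in thresholds:
        assert 0 <= t < 32
        bits = (bits << 5) | t          # 5 bits per threshold
    return bits

def unpack_thresholds(bits, n=6):
    """Recover the n thresholds, most-significant first."""
    return [(bits >> (5 * (n - 1 - i))) & 31 for i in range(n)]
```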
4. EXPERIMENTAL RESULTS

The proposed in-loop filter has been embedded within an H.264/AVC encoder/decoder. Results presented within this section use the 8x8 integer H.264/AVC transforms for denoising operations and a distance of $d = 2$ for the boundaries in Filtering Maps. Average efficiency gains are computed according to [6] using QP = {22, 27, 32, 37}, High Profile and IBBP encoding. Thresholds take integer values from 0 through 25. All reported simulations are generated with the JM-KTA reference software [7].
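For reference, the averaging of [6] can be reproduced with a short script; the sketch below follows the standard Bjøntegaard computation (cubic fit of PSNR against log-rate, with the gap averaged over the overlapping rate interval) and is an illustration, not code from our test setup.

```python
import numpy as np

def bd_psnr(rate_a, psnr_a, rate_b, psnr_b):
    """Bjontegaard average PSNR difference [6] of curve b over curve a,
    each curve given as four (rate, PSNR) operating points."""
    la, lb = np.log10(rate_a), np.log10(rate_b)
    pa, pb = np.polyfit(la, psnr_a, 3), np.polyfit(lb, psnr_b, 3)  # cubic fit vs. log-rate
    lo, hi = max(la.min(), lb.min()), min(la.max(), lb.max())      # overlapping interval
    ia, ib = np.polyint(pa), np.polyint(pb)
    va = np.polyval(ia, hi) - np.polyval(ia, lo)
    vb = np.polyval(ib, hi) - np.polyval(ib, lo)
    return (vb - va) / (hi - lo)                                   # mean PSNR gap in dB
```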
Table 2 presents the average PSNR gains and bitrate savings of the studied in-loop filter relative to the standard H.264/AVC deblocking filter for several SD and HD test sequences.

Sequence | Avg. PSNR gain | Avg. rate savings
Alphabet SD | 0.425 dB | 5.36%
Cheerleaders SD | 0.819 dB | 11.36%
Football SD | 0.415 dB | 8.47%
Stefan SD | 0.774 dB | 14.04%
Big Ships HD | 0.071 dB | 2.12%
City HD | 0.174 dB | 5.21%
Crew HD | 0.222 dB | 7.24%
Night HD | 0.185 dB | 5.30%
Shuttle HD | 0.162 dB | 4.77%

Table 2. Average PSNR gains and bitrate savings of the proposed filter relative to H.264/AVC deblocking for various test sequences.

Results indicate a significant improvement in PSNR and bitrate savings of up to 0.81 dB and 14%, respectively. Filtering performance is particularly enhanced in sequences with poorer predictability; more precisely, in images with moving edges and transparencies (Alphabet), moving
textures, or very complex motion with many objects and textures giving rise to a significant amount of coded residual (Cheerleaders, Football, Stefan). Rate-Distortion curves for Stefan and Crew are presented in Fig. 5. PSNR differences are most noticeable at higher bitrates, where images have a higher content of texture and detail. Visual quality is also improved by the proposed filter; to illustrate this, crops of the Stefan sequence are shown in Fig. 6. Artifact reduction by the proposed filter can be appreciated particularly around the many edges of the image, such as those formed by Stefan's legs and arms. Moreover, such reduction is achieved while preserving oriented edges and texture; note, for example, the textures of the skin and clothing areas. For proper comparison, the images should be viewed on a screen.
Fig. 5. R-D performance comparisons between the proposed in-loop filter and the H.264/AVC deblocking filter for Stefan (top) and Crew (bottom).
Fig. 6. Visual comparison of detail crops for Stefan, QP=27. Results from H.264/AVC deblocking (left) and the proposed in-loop filter (right).

5. CONCLUSIONS

This paper presents a high-performance sparsity-based in-loop deartifacting filter for Intra and Inter encoded video data. The filter includes spatio-temporal adaptation of denoising thresholds in order to best fit the quantization noise statistics of each picture and/or picture area. Experimental results show that our direction-adaptive in-loop deartifacting filter can provide superior coding efficiency and enhanced visual quality when compared to H.264/AVC with its in-loop deblocking filter. Future work includes an extensive formal analysis of visual quality improvements and a study of fast implementations of the technique.

6. REFERENCES

[1] ITU-T Recommendation and International Standard of Joint Video Specification (ITU-T Rec. H.264 / ISO/IEC 14496-10 AVC), March 2005.

[2] O. Divorra Escoda, P. Yin, and C. Gomila, "A multi-lattice direction-adaptive deartifacting filter for image & video coding," in Picture Coding Symposium, Lisbon, Portugal, Nov. 2007.

[3] P. List, A. Joch, J. Lainema, G. Bjøntegaard, and M. Karczewicz, "Adaptive deblocking filter," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 614–619, July 2003.

[4] O. G. Guleryuz, "A nonlinear loop filter for quantization noise removal in hybrid video compression," in Proc. IEEE ICIP, vol. 2, September 2005.

[5] O. G. Guleryuz, "Weighted overcomplete denoising," in Proc. Asilomar Conference on Signals and Systems, Pacific Grove, CA, Nov. 2003.

[6] G. Bjøntegaard, "Calculation of average PSNR differences between RD-curves," Tech. Rep. VCEG-M33, ITU-T Q.6/SG16, Apr. 2001.

[7] ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JM reference software, version 11.0, KTA version 1.4.