IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 10, NO. 5, AUGUST 2008


Efficient Deblocking With Coefficient Regularization, Shape-Adaptive Filtering, and Quantization Constraint

Guangtao Zhai, Wenjun Zhang, Xiaokang Yang, Senior Member, IEEE, Weisi Lin, Senior Member, IEEE, and Yi Xu

Abstract—We propose an effective deblocking scheme with extremely low computational complexity. The algorithm involves three parts: local ac coefficient regularization (ACR) of shifted blocks in the discrete cosine transform (DCT) domain, block-wise shape adaptive filtering (BSAF) in the spatial domain, and quantization constraint (QC) in the DCT domain. The DCT domain ACR suppresses the grid noise (blockiness) in monotone areas. The spatial-domain BSAF alleviates the staircase noise along the edge, and the ringing near the edge and the corner outliers. The narrow quantization constraint set is imposed to prevent possible oversmoothing and improve PSNR performance. Extensive simulation results and comparative studies are provided to justify the effectiveness and efficiency of the proposed deblocking algorithm.

I. INTRODUCTION

BLOCKINESS is the most prevalent artifact in block discrete cosine transform (DCT) coded images [1] and video [2]–[6] under low bit-rate conditions, due to the independent transformation and quantization of image blocks. To improve the perceptual picture quality, numerous blockiness reduction algorithms have been proposed during the last two decades. Analytically, the blockiness artifacts can be divided into two categories [7]–[9]: the grid noise in monotone areas and the edge-related artifacts, such as staircase noise along edges, ringing around strong edges, and corner outliers. The grid noise in monotone areas, which is often referred to as “blockiness” in the general sense, is a kind of structural artifact. It attracts much human attention and is thus the most annoying artifact [10], which is why most deblocking algorithms deal with grid noise only. However, since edges are fundamental cognitive clues in images [11], [12], the edge-related artifacts should also be appropriately addressed for superior perceptual quality [7]–[9], [13]–[19].

Manuscript received January 11, 2007; revised January 13, 2008. First published June 13, 2008; last published July 9, 2008 (projected). This work was supported by the National Natural Science Foundation of China (60332030, 60502034, 60625103), Shanghai Rising-Star Program (05QMX1435), the Hi-Tech Research and Development Program of China 863 (2006AA01Z124), NCET-06-0409, and the 111 Project. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Horace Ho-Shing Ip.
G. Zhai is with the Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai 200240, China, and also with the School of Computer Engineering, Nanyang Technological University, 639798 Singapore (e-mail: [email protected]; [email protected]).
W. Zhang, X. Yang, and Y. Xu are with the Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]; [email protected]; [email protected]).
W. Lin is with the School of Computer Engineering, Nanyang Technological University, 639798 Singapore (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TMM.2008.922849

Fig. 1. Example of the half-block-size shifted image block.

The grid noise can be suppressed in the spatial domain [7]–[9], [13], [18], [20], in a transform domain (e.g., DCT [21]–[24], DWT [25], [26], OWE [15], [16], [19], [27], [28]), or in a combination of both [14], [29], [30]. It is often desirable to perform the deblocking in the DCT domain, for two major reasons: 1) low computation: the quantized DCT coefficient matrix is often quite sparse, so much computation can be saved; and 2) direct manipulation: if the algorithm works on the DCT coefficients, the blockiness can be alleviated before the image is fully decoded. In the first part of this paper, we propose a DCT-domain grid noise reduction algorithm that processes half-block-size shifted blocks, an idea first proposed by Zeng [21] and shown in Fig. 1. Suppose a and b are two adjacent image blocks. Taking the horizontal layout as an example, the right four columns of a and the left four columns of b form a new image block c. Should there be any blockiness between a and b, it would be captured in the middle of the half-block-size shifted image block c. Noting that the blockiness is a kind of local discontinuity, Zeng [21] modeled the blockiness as a 2-D step edge. He observed that this step edge is controlled by four nonzero ac coefficients in the first row of the DCT matrix of c, and applied a zero-mask to the higher three of these ac coefficients so as to suppress the blockiness. Luo and Ward’s [22] algorithm followed a similar idea. However, they proposed to use adaptively weighted results from the image blocks a and b to replace the four ac coefficients of c accordingly, to prevent oversmoothing.
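The shifted-block idea can be illustrated with a minimal sketch. It assumes two flat (dc-only) neighboring blocks and an orthonormal 8 × 8 DCT; the helper names (`dct2`, `shifted_block`, `max_jump`) and the pixel values are hypothetical and chosen only for illustration. The sketch forms the shifted block c, applies a Zeng-style zero-mask to the coefficients at (0, 3), (0, 5), and (0, 7), and compares the largest horizontal jump before and after.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix T, so that C = T c T^T."""
    rows = []
    for k in range(n):
        alpha = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([alpha * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(col) for col in zip(*x)]

T = dct_matrix()
Tt = transpose(T)

def dct2(block):
    return matmul(matmul(T, block), Tt)

def idct2(coef):
    return matmul(matmul(Tt, coef), T)

def shifted_block(a, b):
    """Right four columns of a followed by the left four columns of b."""
    return [ra[4:] + rb[:4] for ra, rb in zip(a, b)]

def max_jump(block):
    """Largest absolute difference between horizontally adjacent pixels."""
    return max(abs(row[j + 1] - row[j]) for row in block for j in range(N - 1))

# Two flat neighbors with different means (illustrative values).
a = [[100.0] * N for _ in range(N)]
b = [[108.0] * N for _ in range(N)]
c = shifted_block(a, b)
C = dct2(c)

# Zeng-style zero-mask: drop the three higher odd coefficients of the first row.
C_masked = [row[:] for row in C]
for v in (3, 5, 7):
    C_masked[0][v] = 0.0
c_masked = idct2(C_masked)

jump_before = max_jump(c)
jump_after = max_jump(c_masked)
```

In this toy case the step in the middle of c shrinks once only the coefficient at (0, 1) is kept, which matches the smoothing effect described above.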
Liu and Bovic [23] later modeled the shifted block more explicitly as the sum of three parts: the mean image, a step edge signal, and the residual noise, and they showed how to estimate the parameters of each part. The blockiness is suppressed by changing the step edge into a slope edge through modifying the associated four ac coefficients. Noting that some new artificial discontinuities arise in the middle of the image blocks, Liu and Bovic adopted a DCT-domain postfiltering method proposed by Chen et al. [24] to enhance the result. Based on these works, we derive a more precise coefficient modification scheme, which regulates fewer coefficients and avoids the new artificial discontinuity encountered by previous algorithms [21]–[23]. The DCT-domain postfiltering scheme is applied on flat image blocks to guarantee a smooth appearance of the processed image. Since only a few ac DCT coefficients are modified, the computation is quite efficient.

The edge-related artifacts are generally processed with some spatially adaptive filtering [7]–[9], [13]–[19]. The idea of removing staircase noise is to apply a 1-D FIR filter along the direction parallel to the edges [7], [9], [13], but this directional filtering is often complicated to implement. The ringing artifact is characterized as oscillation around strong edges. The most effective way to eliminate ringing is to remove the spurious edge structure around main edges [19]. The algorithm in [7] uses a pixel-fitting procedure and the algorithm in [17] uses a signal-adaptive filtering scheme to reduce ringing. The corner outliers are pixels of exceptionally high or low intensity in comparison with their neighborhood at the corners of an image block, and they can be removed through adaptive local averaging [7], [9], [17]. The edge-related artifacts can also be attacked in the DCT domain; the above-mentioned DCT-domain filtering method proposed by Chen et al. [24] is based on postfiltering of the DCT coefficients in shifted blocks. The image blocks are classified into high- and low-activity blocks according to the dc-normalized ac energy computed from the DCT coefficients weighted by the human visual system (HVS) model [31]. The postfiltering is applied to each image block with 3 × 3 and 5 × 5 shift ranges for high- and low-activity blocks, respectively. The result of the algorithm is quite encouraging; however, since many block DCT/IDCT operations are involved, the complexity of the algorithm is high. Although some fast alternative schemes have been introduced [32], [33], the heavy computational load remains the bottleneck of this DCT-domain filtering scheme [23], [24]. We will show that this block-shift filtering scheme can be performed equivalently in the spatial domain, thus saving much of the computation. Furthermore, we derive a similarity measure between the image block and its shifted counterpart, and use only the shifted blocks that resemble the centered block in the low-pass filtering, so as to form a spatial-domain block-wise shape adaptive filtering (BSAF) scheme. This BSAF is used for nonflat image blocks to suppress the edge-related artifacts.

The rest of this paper is organized as follows: Sections II and III introduce the DCT-domain and spatial-domain filtering, respectively. The simulation results and the comparative studies with existing relevant methods are provided in Section IV. Finally, Section V summarizes and concludes the paper.

1520-9210/$25.00 © 2008 IEEE

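II. GRID NOISE SUPPRESSION IN DCT DOMAIN

A. Formulation of the Problem

We use A, B, and C to denote the DCT matrices of the above-mentioned 8 × 8 image blocks a, b, and c shown in Fig. 1. The adjacent image blocks a and b are modeled as their mean plus a variational image noised by artifacts

$$a = \mu_a + \varepsilon_a, \qquad b = \mu_b + \varepsilon_b \tag{1}$$

where $\mu$ is the mean image and $\varepsilon$ is the variational image noised by artifacts. The shifted image block c can thus be modeled as

$$c = \mu_c + \beta s + \varepsilon_c \tag{2}$$

where $s$ denotes the 2-D step function, with $\beta$ being the disparity between the two sides of the step. Taking the horizontal layout as an example

$$s(i,j) = \begin{cases} -1/2, & j = 0, \ldots, 3 \\ \phantom{-}1/2, & j = 4, \ldots, 7 \end{cases} \qquad i = 0, \ldots, 7 \tag{3}$$

The DCT of the step edge is

$$S(u,v) = \frac{1}{4}\alpha(u)\alpha(v)\sum_{i=0}^{7}\sum_{j=0}^{7} s(i,j)\cos\frac{(2i+1)u\pi}{16}\cos\frac{(2j+1)v\pi}{16} \tag{4}$$

with $\alpha(0) = 1/\sqrt{2}$ and $\alpha(k) = 1$ for $k \neq 0$. Since all eight rows of $s$ are the same, according to Luo and Ward’s analysis [22], the term depending on $i$ only can be extracted, and (4) becomes

$$S(u,v) = \frac{1}{4}\alpha(u)\alpha(v)\left[\sum_{i=0}^{7}\cos\frac{(2i+1)u\pi}{16}\right]\left[\sum_{j=0}^{7} s(0,j)\cos\frac{(2j+1)v\pi}{16}\right] \tag{5}$$

It is easy to prove that $\sum_{i=0}^{7}\cos\frac{(2i+1)u\pi}{16} = 0$ for $u \neq 0$, so we can only find the nonzero elements on the first row of $S$

$$S(0,v) = \sqrt{2}\,\alpha(v)\sum_{j=0}^{7} s(0,j)\cos\frac{(2j+1)v\pi}{16} \tag{6}$$

Using the property of $s$ that $s(0,j) = -1/2$ when $j \le 3$ and $s(0,j) = 1/2$ when $j \ge 4$, (5) becomes, for $u = 0$,

$$S(0,v) = \frac{\sqrt{2}}{2}\,\alpha(v)\sum_{j=0}^{3}\left[\cos\frac{(15-2j)v\pi}{16} - \cos\frac{(2j+1)v\pi}{16}\right] \tag{7}$$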
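The sparsity of the step edge in the DCT domain can be checked numerically in the 1-D case. The sketch below is only an illustration, assuming an orthonormal length-8 DCT-II and a step of height one centered on zero; the names `dct1`, `step`, `even`, and `odd` are hypothetical.

```python
import math

N = 8

def dct1(x):
    """Orthonormal 1-D DCT-II of a length-8 sequence."""
    out = []
    for k in range(N):
        alpha = math.sqrt((1.0 if k == 0 else 2.0) / N)
        out.append(alpha * sum(x[j] * math.cos((2 * j + 1) * k * math.pi / (2 * N))
                               for j in range(N)))
    return out

# Anti-symmetric 1-D step: -1/2 on the left half, +1/2 on the right half.
step = [-0.5] * 4 + [0.5] * 4
S = dct1(step)

# Coefficients at even and odd frequency indices.
even = [S[k] for k in (0, 2, 4, 6)]
odd = [S[k] for k in (1, 3, 5, 7)]
```

Under these assumptions the even-indexed coefficients vanish up to rounding error, while the four odd-indexed ones do not; for a 2-D step with identical rows the same pattern appears on the first row of the 2-D DCT.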

ZHAI et al.: EFFICIENT DEBLOCKING WITH COEFFICIENT REGULARIZATION

Fig. 2. 1-D DCT basis functions W(k) of length 8, k ∈ [0, 7]. First row: k = 0, 1, 2, 3. Second row: k = 4, 5, 6, 7.

It is easy to prove that when $v = 0, 2, 4, 6$, the four terms in (7) cancel out, so the four nonzero DCT coefficients of the step edge are those at $(0, v)$, $v = 1, 3, 5, 7$. The above result can also be concluded straightforwardly from the properties of the DCT basis functions. We use a 1-D step function as an example, and the 1-D DCT basis functions $W(k)$ of length 8 are shown in Fig. 2. Obviously, $W(k)$, $k = 0, 2, 4, 6$ are symmetric while $W(k)$, $k = 1, 3, 5, 7$ are anti-symmetric with respect to the center of the block. The symmetric basis functions do not contribute to the formation of the anti-symmetric step function, so the corresponding DCT coefficients of the step function are all zero.

B. Analysis of the Existing Algorithms

Zeng [21] proposed to directly apply a zero-mask to nullify the three ac coefficients $C(0,v)$, $v = 3, 5, 7$ so as to suppress the blockiness. He conjectured that the coefficient $C(0,1)$ is of relatively low frequency and thus should not be dropped; otherwise the algorithm would cause over-smoothed images and thus more visual distortion. The process is

$$\hat{C}(u,v) = \begin{cases} 0, & u = 0,\ v \in \{3, 5, 7\} \\ C(u,v), & \text{otherwise} \end{cases} \tag{8}$$

We now give another explanation of this phenomenon from the perspective of the DCT bases. We only analyze the 1-D case for simplicity. In the 1-D case, suppressing the blockiness amounts to changing the step function into a slope or ramp function. Recalling the 1-D DCT bases in Fig. 2, the 1-D step function is a combination of the four bases $W(k)$, $k = 1, 3, 5, 7$. It is important to notice that since the sinusoid $W(1)$ has a period of 16, $W(1)$ itself is a good approximation to a slope function. So if $C(0,v)$, $v = 3, 5, 7$ are dropped, the reconstructed signal will be a scaled version of $W(1)$, which is much smoother than a step function. This explains why Zeng’s algorithm reduces the blockiness and why the term $C(0,1)$ cannot be zeroed out. However, the magnitude of $C(0,1)$ is not regularized in Zeng’s algorithm, which causes the reconstructed block to be inconsistent with the neighboring blocks a and b around their intersections, with an impulse at the borders of the shifted block, as can be readily observed in Fig. 3 (b1, b2) and Fig. 4. This discrepancy can seriously affect the visual quality of the processed image.

Fig. 3. Demonstration of blockiness reduction algorithms with synthesized signal. The x-y plane indicates the image dimensions and the z-axis shows the image intensity. From left to right, from top to bottom: (a1) Ideal 2-D step edge. (a2) 2-D step edge with noise. (b1) Zeng’s result of (a1). (b2) Zeng’s result of (a2). (c1) Liu and Bovic’s result of (a1). (c2) Liu and Bovic’s result of (a2). (d1) Luo and Ward’s result of (a1). (d2) Luo and Ward’s result of (a2). (e1) Proposed result of (a1). (e2) Proposed result of (a2).

Fig. 4. 1-D plot of the deblocking results on ideal step edge with x- and z-axis taken from (a1), (b1), (c1), (d1), and (e1) of Fig. 3.

Liu and Bovic [23] proposed to use a 2-D linear slope to replace the step edge in (2)

$$\tilde{c} = \mu_c + \beta l + \varepsilon_c \tag{9}$$

where $\mu_c$, $\beta$, and $\varepsilon_c$ are the mean, step disparity, and residual of (2). The 2-D slope $l$ is defined by the direct linear linkage of the two ends of the step edge. Since $l$ is also an anti-symmetric signal, its DCT $L$ only has nonzero coefficients at $(0,v)$, $v = 1, 3, 5, 7$. By linearity, the DCT of (9) is

$$\tilde{C} = \mathrm{DCT}(\mu_c) + \beta L + \mathrm{DCT}(\varepsilon_c) \tag{10}$$

In order to compute $\beta$ from $C$, it is assumed that the four ac coefficients $C(0,v)$, $v = 1, 3, 5, 7$ are merely caused by the step edge $\beta s$. Although this assumption may hold for monotone areas, as the variance of the residual image $\varepsilon_c$ increases, its influence on the four ac coefficients becomes non-negligible. Inaccurate estimation of $\beta$ will also cause the above-mentioned inconsistency with neighboring blocks. Another drawback of this method is that the 2-D linear edge is not differentiable at the two ends of the slope, and thus a new local discontinuity arises in the middle of image blocks a and b (see Fig. 3 (c1, c2) and Fig. 4). Liu and Bovic [23] also noticed this new artificial discontinuity and proposed to use Chen’s postfiltering method [24] to reprocess the result; this greatly burdens the algorithm because Chen’s method is quite computationally intensive.

Luo and Ward [22] proposed to weight the dc and four ac coefficients $C(0,v)$, $v = 0, 1, 3, 5, 7$ with those of the image blocks $A$ and $B$, when a and b are smooth and of similar direction

(11)

It is noticed that if the image blocks a and b are smooth enough, the magnitudes of their ac coefficients $A(0,v)$, $B(0,v)$, $v = 1, 3, 5, 7$ are almost negligible, and (11) can be rewritten for the ac coefficients as

$$\hat{C}(0,v) = w_v\, C(0,v), \qquad [w_1, w_3, w_5, w_7] = [0.6,\ 0.5,\ 0.5,\ 0.5] \tag{12}$$

Clearly, (12) shrinks the four ac coefficients $C(0,v)$, $v = 1, 3, 5, 7$ with scales of [0.6, 0.5, 0.5, 0.5]. It can be found from Fig. 3 (d1, d2) and Fig. 4 that this process actually adds two small steps into the original 2-D step, and thus alleviates the abrupt transition. This is the possible rationale behind Luo and Ward’s [22] deblocking algorithm. Apparently the smoothing result of Luo and Ward’s method is not as good as Zeng’s or Liu and Bovic’s. Furthermore, this algorithm will also cause a local discontinuity in the middle of the processed image blocks a and b. However, since the ac coefficients are heavily decreased in magnitude, there is no impulse at the borders of the shifted block as in Zeng’s method, so the visual result is better than that of Zeng’s.

C. The Proposed Ac Coefficient Regularization (ACR) Method

With the formulation of the problem and the analysis of the drawbacks of the existing methods, we now propose a simpler yet more effective deblocking scheme based on the idea of filtering the half-block-size shifted block. In order to make the algorithm work purely in the DCT domain, the blocks c, a, and b can be expressed as

$$c = a\,q_1 + b\,q_2 \tag{13}$$

$$a = a\,q_3 + c\,q_2 \tag{14}$$

$$b = b\,q_4 + c\,q_1 \tag{15}$$

with

$$q_1 = \begin{bmatrix} O & O \\ I & O \end{bmatrix}, \quad q_3 = \begin{bmatrix} I & O \\ O & O \end{bmatrix}, \quad q_4 = \begin{bmatrix} O & O \\ O & I \end{bmatrix}$$

and $q_2 = q_1^{T}$ is the transpose of $q_1$, $O$ is the 4 × 4 zero matrix, and $I$ is the 4 × 4 identity matrix. Applying the DCT to (13), we have

$$C = A\,Q_1 + B\,Q_2 \tag{16}$$

where $Q_1$ and $Q_2$ are the DCTs of $q_1$ and $q_2$, respectively. With the model defined in Section II-A, the problem is now how to estimate the parameters of (2) from the DCT coefficients. The mean of the image block c can be easily computed from $C$ as

$$\mu_c = \frac{C(0,0)}{8} \tag{17}$$

Rather than assuming that the four ac coefficients $C(0,v)$, $v = 1, 3, 5, 7$ are merely caused by the step function and estimating $\beta$ from $C$ as in [23], the disparity of the step function is computed directly from the dc components of the image blocks a and b

$$\beta = \frac{B(0,0) - A(0,0)}{8} \tag{18}$$

Similar to Zeng’s method [21], we use a 2-D sinusoid with a period of 16, which as we analyzed has only one nonzero ac DCT coefficient, at $(0,1)$, to replace the step function. For the pixel on the upper-right corner of the block, with the equation of the IDCT

(19)

the value of this coefficient can be computed as

(20)
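The estimation of the block mean and the step disparity from dc coefficients alone can be sketched as follows. This is an illustration under the assumption of an orthonormal 8 × 8 DCT, for which the dc coefficient equals eight times the block mean; the function names and the pixel values are hypothetical.

```python
N = 8

def dc_coefficient(block):
    """DC term of the orthonormal 8x8 2-D DCT: pixel sum divided by 8."""
    return sum(sum(row) for row in block) / float(N)

def mean_from_dc(dc):
    """Block mean recovered from the dc coefficient."""
    return dc / float(N)

def step_disparity(dc_a, dc_b):
    """Disparity between the two sides of the step, from the dc terms of a and b."""
    return (dc_b - dc_a) / float(N)

# Two flat neighbors (illustrative values) and their half-block-size shifted block.
a = [[100.0] * N for _ in range(N)]
b = [[108.0] * N for _ in range(N)]
c = [ra[4:] + rb[:4] for ra, rb in zip(a, b)]

mu = mean_from_dc(dc_coefficient(c))
beta = step_disparity(dc_coefficient(a), dc_coefficient(b))
```

Because only dc terms are involved, the estimate of the disparity is not affected by residual ac energy inside the shifted block, which is the motivation given above for not estimating it from the four ac coefficients.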

Now the ac coefficient regularization (ACR) can be performed as

(21)

The processing results from (21) are illustrated in Fig. 3 (e1, e2) and Fig. 4. Since the derivative of the sinusoid is continuous at the borders of the shifted block, the result is smoother than those of the other algorithms, and thus effectively prevents the new discontinuity in the middle of the image blocks a and b. The filtered shifted image block is used to update the neighboring image blocks with the DCT of (14) and (15)

(22)

(23)

III. EDGE-RELATED ARTIFACTS REDUCTION THROUGH BLOCK-WISE SHAPE ADAPTIVE FILTERING

A. Formulation of the Problem

The idea of DCT-domain postfiltering in shifted image blocks is shown in Fig. 5. The centered unshifted image block is denoted as $c_{0,0}$ and $C_{0,0}$ in the spatial and DCT domain, respectively, and the block shifted by $(k,l)$ pixels as $c_{k,l}$ and $C_{k,l}$. The process of postfiltering can be expressed as

$$\hat{C}_{0,0} = \frac{1}{W}\sum_{k=-h}^{h}\sum_{l=-h}^{h} w_{k,l}\, C_{k,l}, \qquad W = \sum_{k=-h}^{h}\sum_{l=-h}^{h} w_{k,l} \tag{24}$$

Fig. 5. Example of the shifted image block.

The weights $w_{k,l}$ and the half range $h$ of the mask are defined separately for smooth blocks and for edge blocks in [24], and only the latter setting was used in [23]. The deblocked image block is computed as

$$\hat{c}_{0,0} = \mathrm{IDCT}(\hat{C}_{0,0}) \tag{25}$$

and the shifted DCT block is computed as

$$C_{k,l} = \mathrm{DCT}(c_{k,l}) \tag{26}$$

By substituting (25) and (26) into (24) and using the linear property of the DCT, we have

$$\hat{c}_{0,0} = \frac{1}{W}\sum_{k=-h}^{h}\sum_{l=-h}^{h} w_{k,l}\, c_{k,l} \tag{27}$$
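The equivalence between postfiltering in the DCT domain and weighted averaging of shifted blocks in the spatial domain follows from the linearity of the DCT, and can be checked with a small numerical sketch. The test image, the 3 × 3 shift range, and the mask weights below are hypothetical choices made only for the illustration, as are all helper names.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix T, so that C = T c T^T."""
    rows = []
    for k in range(n):
        alpha = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([alpha * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transpose(x):
    return [list(col) for col in zip(*x)]

T = dct_matrix()
Tt = transpose(T)

def dct2(block):
    return matmul(matmul(T, block), Tt)

def idct2(coef):
    return matmul(matmul(Tt, coef), T)

def get_block(img, y, x):
    """8x8 block whose top-left pixel is (y, x)."""
    return [row[x:x + N] for row in img[y:y + N]]

def weighted_average(blocks, weights):
    """Element-wise weighted average of equally sized 8x8 arrays."""
    total = float(sum(weights))
    return [[sum(w * blk[i][j] for blk, w in zip(blocks, weights)) / total
             for j in range(N)] for i in range(N)]

# Hypothetical 24x24 test image with a deterministic texture.
img = [[float((7 * x + 13 * y + (x * y) % 11) % 256) for x in range(24)]
       for y in range(24)]

y0, x0, h = 8, 8, 1
shifts = [(k, l) for k in range(-h, h + 1) for l in range(-h, h + 1)]
weights = [3 if (k, l) == (0, 0) else 1 for k, l in shifts]  # illustrative mask
blocks = [get_block(img, y0 + k, x0 + l) for k, l in shifts]

# Route 1: DCT each shifted block, average the coefficients, then IDCT.
via_dct = idct2(weighted_average([dct2(blk) for blk in blocks], weights))
# Route 2: average the shifted blocks directly in the spatial domain.
via_spatial = weighted_average(blocks, weights)

max_diff = max(abs(via_dct[i][j] - via_spatial[i][j])
               for i in range(N) for j in range(N))
```

The two routes agree up to floating-point rounding, while the second one needs no DCT/IDCT pair per shifted block.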

Since the DCT/IDCT process is cancelled out, the postfiltering in shifted image blocks can also be performed in the spatial domain.

B. The Block-Wise Shape Adaptive Filtering (BSAF) Method

Though we have shown that the postfiltering in shifted image blocks can be computed in the spatial domain, and thus much DCT/IDCT computation can be saved, there is still one problem with the filtering scheme in (27): the shape of the shifting range is not flexible. Chen et al. [24] argued that the 5 × 5 mask is used for smooth image blocks to smooth out the blockiness, and for non-smooth blocks, the span of the mask is reduced to 3 × 3 to prevent blurring of edges. However, the masks all take an isotropic square shape, and this makes them inappropriate for image blocks containing edge structures, because smoothing in the direction perpendicular to the edge will inevitably blur the image. We propose to define a similarity measure between the central and shifted blocks, and only the shifted blocks that sufficiently resemble the central one are used in the filtering. The search range is expanded to half the block size. This filtering method is called block-wise shape adaptive filtering (BSAF). The BSAF automatically chooses similar neighboring blocks during the filtering. For the edge blocks, it is most likely that the similar blocks are chosen along the direction parallel rather than perpendicular to the local edge, so the edge structure can be better preserved. On the other hand, this filtering parallel to the local edge structure is actually an extension of the spatially variant filtering method first proposed by Ramamurthi and Gersho [13] to suppress the staircase noise, and adopted later by many deblocking methods to reduce the edge-related artifacts [7]–[9], [13]. However, the BSAF method avoids the edge estimation/detection stage employed by [7]–[9], [13], and this can further save much computation. The implementation of the BSAF is also simpler than the complicated adaptive filters [7]–[9].

The problem now becomes how to define a justifiable similarity measure between the image blocks with a balance between effectiveness and complexity. The mean-squared error (MSE) is widely adopted as the quantified similarity measure in the image/video processing field, but it involves many multiplication operations. For high efficiency, we employ the mean absolute error (MAE) as the similarity measure. Let us begin the analysis with the simplest condition, in which only the dc coefficient is left after the quantization for each image block. Consider a central dc image block and its three neighboring dc blocks, as shown in Fig. 5. The max-shifted version of the central block overlaps each of these four blocks over an area of 4 × 4 pixels. Suppose that the four dc blocks are of similar grayscale and happen to be quantized into three consecutive quantization intervals as shown in Fig. 5. Since the four neighboring blocks are of adjacent intensities, the discontinuities between them should be smoothed out accordingly. Given the quantization interval for the dc coefficient and the property of the 2-D DCT, the minimum intensity disparity between the neighboring blocks is determined by this interval, and the MAE between the central block and its max-shifted version reaches the maximum among all the shifted blocks. This MAE can be computed as

(28)

The set of eligible shifts $\Omega$ for the central block is defined as

(29)

that is, the shifts $(k,l)$ within the search range whose MAE with respect to the central block does not exceed the bound in (28) scaled by an adjusting coefficient. The adjusting coefficient compensates for the mismatch caused by the possible ac coefficients, and a fixed value is found to be a reasonable choice for most images. The BSAF process can now be written as

$$\hat{c}_{0,0} = \frac{1}{|\Omega|}\sum_{(k,l)\in\Omega} c_{k,l} \tag{30}$$

where $|\Omega|$ denotes the number of elements in the set.

Fig. 6. Diagram of the proposed deblocking algorithm.
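The BSAF selection-and-averaging step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the MAE threshold is passed in as a plain number (`threshold` is a hypothetical parameter standing in for the scaled bound), the test image is synthetic, and no image-border handling is included.

```python
N = 8

def get_block(img, y, x):
    """8x8 block whose top-left pixel is (y, x)."""
    return [row[x:x + N] for row in img[y:y + N]]

def mae(p, q):
    """Mean absolute error between two 8x8 blocks."""
    return sum(abs(p[i][j] - q[i][j])
               for i in range(N) for j in range(N)) / float(N * N)

def bsaf(img, y0, x0, threshold, half_range=N // 2):
    """Average only those shifted blocks that resemble the central block.

    The caller must keep the block at least half_range pixels away from
    the image border; this sketch does no bounds checking.
    """
    center = get_block(img, y0, x0)
    eligible = []
    for k in range(-half_range, half_range + 1):
        for l in range(-half_range, half_range + 1):
            if mae(center, get_block(img, y0 + k, x0 + l)) <= threshold:
                eligible.append((k, l))
    count = float(len(eligible))  # at least 1: the zero shift always qualifies
    filtered = [[sum(img[y0 + k + i][x0 + l + j] for k, l in eligible) / count
                 for j in range(N)] for i in range(N)]
    return filtered, eligible

# Synthetic image with a vertical edge running through the middle of the block.
img = [[50.0 if x < 16 else 200.0 for x in range(32)] for y in range(32)]
filtered, eligible = bsaf(img, 12, 12, threshold=1.0)
```

In this example only the shifts parallel to the edge pass the similarity test, so the averaging runs along the edge and leaves it sharp, which is the shape-adaptive behavior described above.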

C. The Deblocking Algorithm

As discussed in the above two sections, the ACR stage is designed for removing the most annoying grid noise in monotone areas, and the BSAF and QC processes are applied on the edge and textural regions to further improve visual quality. The proposed deblocking algorithm is illustrated by a diagram in Fig. 6, and can be summarized in four steps.

Step 1: Analyze the DCT matrix of each image block to identify the dc blocks (blocks with only a dc component in their DCT matrix) and non-dc blocks (blocks with ac components in their DCT matrix).

Step 2: Apply the ACR method described in Section II on the half-block-size shifted dc blocks to smooth out the grid noise, followed by the IDCT operation. If the visual enhancement is acceptable, or if the computational resource is highly constrained, the program ends; otherwise the algorithm goes to Step 3.

Step 3: Compute the IDCT for non-dc blocks, and then apply the BSAF method to suppress the edge-related artifacts.

Step 4: Impose the quantization constraint set on the DCT of the outcome of BSAF to prevent possible oversmoothing, compute the IDCT of the processing result, and then integrate the ACR result and the BSAF/QC result to generate the enhanced image.

The quantization constraint set [14], [19], [23], [28]–[30], [34] is applied in the DCT domain. Taking an image block as an example, with $\hat{C}$ the DCT of the BSAF output and $C^{\min}$, $C^{\max}$ the bounds of the admissible interval,

$$\hat{C}'(u,v) = \min\left\{\max\left\{\hat{C}(u,v),\ C^{\min}(u,v)\right\},\ C^{\max}(u,v)\right\} \tag{31}$$

Fig. 7. Processing result using example of “Lena”. From left to right, from top to bottom: (a) Original “Lena”, 512 × 512. (b) JPEG coded at 0.24 bits per pixel, PSNR = 30.40 dB. (c) The dc image blocks are emphasized by white frames. (d) Processing result of ACR, PSNR = 30.42 dB. (e) Number of shifted blocks indicated by intensity of blocks. (f) Processing result of ACR + BSAF + QC, PSNR = 31.26 dB.

Fig. 8. Processing result using example of “Barbara”. From left to right, from top to bottom: (a) Original “Barbara”. (b) JPEG compressed at 0.34 bpp, PSNR = 25.69 dB. (c) Processing result of ACR, PSNR = 25.70 dB. (d) Processing result of ACR + BSAF + QC, PSNR = 26.20 dB.

with

$$C^{\min}(u,v) = C^{q}(u,v) - \lambda\, Q(u,v), \qquad C^{\max}(u,v) = C^{q}(u,v) + \lambda\, Q(u,v) \tag{32}$$

where $C^{q}$ is the dequantized DCT coefficient available in the decoder and $Q$ is the quantization table defined in the encoder and available in the decoder. We set $\lambda < 0.5$ to form a narrow quantization constraint set for improved performance [35].

We now give some more discussion on the deblocking algorithm. If the DCT/IDCT operations indicated by the dashed boxes are omitted, the BSAF/QC stage resembles the method in [24], and the whole deblocking algorithm becomes similar to that of [23] and works purely in the DCT domain. Though the dash-boxed DCT/IDCT operations on ac blocks look like computational overhead, the overall complexity of the algorithm is actually largely reduced because the BSAF avoids the repeated IDCT/DCT on shifted image blocks required by [23] and [24]. It is also observed from Fig. 6 that the proposed algorithm can be integrated as part of the JPEG decoder to replace the conventional IDCT operation after the entropy/run-length decoding.

Fig. 9. Processing result using example of “Baboon”. From left to right, from top to bottom: (a) Original “Baboon”. (b) JPEG compressed at 0.46 bpp, PSNR = 23.43 dB. (c) Processing result of ACR, PSNR = 23.41 dB. (d) Processing result of ACR + BSAF + QC, PSNR = 23.66 dB.

IV. SIMULATION RESULTS AND COMPARATIVE STUDY

As mentioned in Sections II and III, our deblocking algorithm takes a hierarchical structure: the ACR stage first suppresses


Fig. 10. Processing result using example of “Peppers”. From left to right, from top to bottom: (a) Original “Peppers”. (b) JPEG compressed at 0.25 bpp, PSNR = 30.18 dB. (c) Processing result of ACR, PSNR = 30.20 dB. (d) Processing result of ACR + BSAF + QC, PSNR = 30.91 dB.

the most annoying grid noise and effectively enhances the perceptual quality of the picture. The BSAF stage further reduces the edge-related artifacts, and the quantization constraint (QC) stage enhances the result of BSAF and prevents blurring. Using the example of “Lena”, the processing results of ACR and ACR + BSAF + QC are illustrated in Fig. 7; both the perceptual quality and the PSNR keep improving as the stages proceed. The detected dc image blocks and the number of eligible shifted image blocks used in (30) are also illustrated in Fig. 7. The number of shifted blocks is larger for smooth blocks and smaller for edge blocks, which manifests the shape-adaptive property of the proposed BSAF.

Besides “Lena,” the other three images involved in the simulation are “Barbara,” “Baboon,” and “Peppers,” and all these standard images have often been used in tests of blockiness reduction. “Lena” and “Barbara” contain both large flat and textured areas. “Baboon” has many highly transitional hairy regions, while “Peppers” consists of large smooth regions. The processing results of ACR and ACR + BSAF + QC on the test images are shown in Figs. 8–10. The ACR method successfully suppresses most of the disturbing grid noise in monotone areas, such as the shoulder and background of “Lena,” the background of “Barbara,” and the surfaces of “Peppers,” and the visual quality of the pictures is largely improved. The PSNR difference ranges from 0.01 to 0.17 dB. It is also noticed that the relatively higher PSNR gain of ACR is obtained for images with large smooth regions, such as “Lena” and “Peppers,” which further confirms the performance of the proposed ACR method in removing grid noise in monotone areas. The BSAF + QC stage enhances the visual quality of the edge features in images, such as the shoulder and hat rims of “Lena,” the table leg and arm sides of “Barbara,” and the contours of “Peppers.” The BSAF/QC stage effectively alleviates the staircase noise, ringing, and corner outliers, and renders a sharp and clear edge appearance. Thanks to the narrow quantization constraint set, the PSNR gain rises to 0.23 to 0.89 dB, which implies a strong fidelity enhancement ability of the proposed BSAF/QC method. It is also noticed that despite the PSNR improvement, the further visual quality enhancement for images with large textural areas, e.g., “Baboon,” is only marginal. This suggests that for images with many highly transitional areas, e.g., textures, the BSAF and QC stages can be skipped for better efficiency.

Fig. 11. Visual comparison of deblocking algorithms.

We provide an extended visual comparison of the proposed ACR (method 1) and ACR + BSAF + QC (method 2) with other state-of-the-art deblocking algorithms, illustrated in Fig. 11 with a zoomed shoulder part of “Lena.” It can be found that the ACR method smooths out most of the grid noise, the BSAF/QC stage suppresses the edge-related artifacts, and the resultant image is of superior perceptual quality. As we analyzed in Section II, Zeng’s [21] algorithm causes some new local discontinuities, and thus the blockiness cannot be fully suppressed. Luo and Ward’s [22] method brings some step edges into the monotone image blocks, which can be readily observed, and the edge-related artifacts are not dealt with in [21] and [22]. Liu and Bovic’s [23] scheme provides fair processing results on both the edge and smooth areas, at the expense of complicated DCT domain block-shifted

filtering. Chen et al.’s [24] method can suppress most of the severe grid noise and some of the edge-related artifacts, but due to the constraint imposed by the length of the mask, the grid noise cannot be comprehensively smoothed out, and the edge features tend to be blurred because the filter is not shape adaptive. The deblocking algorithms in [7]–[9], [17] aim at removing both the grid noise and some of the edge-related artifacts; however, limited by the spans of the low-pass filters, they cannot thoroughly suppress the grid noise. The wavelet-based methods [19], [25], [27], [28] generally show good performance in grid noise reduction, but not enough effort has been made toward the edge-related noise; in addition, the wavelet transform adds computational overhead. The classical spatially invariant filtering [20] tends to blur edges near block boundaries, and the spatially variant filtering [13] also cannot fully remove the grid noise. The iterative POCS method [29], [34] and MAP method [30] provide medium-quality results at the cost of huge computation.

TABLE I PSNR COMPARISON OF DEBLOCKING ALGORITHMS. (B.R.: BIT RATE (IN bpp), PROPOSED 1: ACR, PROPOSED 2: ACR + BSAF + QC)

We give the PSNR performance comparison in Table I for the test images. The bit rate of the tested images covers a range from 0.14 to 0.76 bits per pixel, with the corresponding compression ratio ranging from 10 to 60. The highest three PSNR results for each column are emphasized in bold, with the order indicated by the shades. It is observed that the proposed method and the Huber–Markov random field (HMRF)-based MAP algorithm [30] attain the best overall PSNR performance. However, despite the high complexity of the iterative HMRF-MAP method, neither the grid noise nor the edge-related artifacts are sufficiently suppressed by it [30]. The DWT method [25], OWE method [19], DCT-domain filtering method [24], and the spatial-domain filtering method [7] yield the second best PSNR performance. The wavelet transform in [19] and [25], the repeated block DCT/IDCT in [24], and the complicated spatially adaptive filtering in [7] are all computationally expensive. It is also noticed that the proposed ACR method in general brings little PSNR gain to the processed images (because it has been designed to suppress grid noise for enhancement of the perceptual image quality), while the ACR + BSAF + QC method can effectively enhance the PSNR of the processed image.

An illustration of the processing time of the deblocking algorithms is given in Table II. All the algorithms are implemented with Matlab R2006a, running under Microsoft Windows XP on an IBM T-60 laptop computer with an Intel T2400 CPU at 1.83 GHz and 1 GB of RAM. The processing time is measured for a typical 512 × 512 “Lena” image JPEG compressed at 0.24 bits per pixel. Although Matlab is a high-level, uncompiled environment, the results provided here still give a reasonable comparison. As we analyzed, Chen et al.’s [24] algorithm needs huge computation due to the multiple DCT/IDCT pairs involved for each block. Likewise, Liu and Bovic’s [23] method also needs a long processing time because they adopted Chen et al.’s [24]


TABLE II PROCESSING TIME COMPARISON OF THE DEBLOCKING ALGORITHMS (IN SECONDS, FOR THE 512 × 512 “LENA” IMAGE JPEG COMPRESSED AT 0.24 BITS PER PIXEL)

algorithm for edge region processing. As expected, the iterative processes in [21], [29], [30], and [34] (all with 10 iterations in the current implementation) need heavy processing. The spatial-domain combined deblocking algorithm in [7] also requires a long execution time because of its multiple passes for different kinds of artifacts. The spatial-variant algorithm [13] spends a lot of time on directional filtering of edges. The wavelet-based algorithms [19], [25], [27], [28] vary considerably in processing time. Luo and Ward’s simple DCT-domain averaging algorithm [22] and several other spatial-domain filtering schemes [8], [9], [17], [20] have the shortest running times; however, as shown in Table I, this comes at the expense of compromised quality. The proposed ACR method requires even less computation time than Luo and Ward’s method [22]. The full ACR + BSAF + QC method takes approximately three times as long as ACR alone, but still much less than the algorithms in [7], [13], [19], [21], [23], [24], [29], [30], and [34].

V. CONCLUSION

A comprehensive blockiness reduction algorithm has been proposed with three processing stages: ACR, BSAF, and QC. The image blocks are first classified into dc and ac blocks. The ACR method works on the half-block-size shifted dc image blocks, modifying the ac coefficients to give the processed image a smooth appearance. The BSAF method works on shifted ac image blocks in the spatial domain to further improve the visual quality of edge features. Finally, the narrow quantization constraint set is imposed on the output of BSAF to prevent possible oversmoothing. For applications with limited computational power, the BSAF/QC stage can be bypassed for higher efficiency without compromising the algorithm’s suppression of the most disturbing grid noise. The comparative study shows that the proposed scheme visually outperforms many state-of-the-art and classic deblocking algorithms, with relatively low computational complexity.
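As a minimal sketch of the QC stage summarized above, the following Python function clips each processed DCT coefficient to a narrowed interval centered on its dequantized value, which is the usual form of a projection onto a narrow quantization constraint set [35]. The function name, the argument layout, and the `scale` parameter with its default value are illustrative assumptions and are not the settings used in the reported experiments.

```python
def project_onto_nqcs(coeffs, quantized_levels, q_steps, scale=0.3):
    """Clip processed DCT coefficients to a narrowed quantization interval.

    coeffs           -- processed (deblocked) DCT coefficients of one block
    quantized_levels -- integer quantization indices decoded from the bitstream
    q_steps          -- quantization step size for each coefficient
    scale            -- half-width of the allowed interval in units of the step
                        size; 0.5 gives the ordinary constraint set, smaller
                        values give a narrow one (0.3 is an assumed value)
    """
    if not 0.0 < scale <= 0.5:
        raise ValueError("scale must lie in (0, 0.5]")
    projected = []
    for c, k, q in zip(coeffs, quantized_levels, q_steps):
        lower = (k - scale) * q  # lower bound of the narrowed interval
        upper = (k + scale) * q  # upper bound of the narrowed interval
        projected.append(min(max(c, lower), upper))
    return projected


if __name__ == "__main__":
    # Three coefficients: the first and last fall outside their intervals
    # and are pulled back to the nearest bound; the middle one is unchanged.
    print(project_onto_nqcs([10.0, 47.0, -3.0], [1, 3, 0], [16, 16, 10], scale=0.25))
```

Coefficients that smoothing has pushed outside their interval are pulled back to the nearest bound, while coefficients already inside it are left untouched. This is why the constraint limits oversmoothing without undoing the filtering.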
REFERENCES

[1] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard. New York: Van Nostrand, 1993.
[2] J. L. Mitchell and W. B. Pennebaker, MPEG Video Compression Standard. New York: Chapman & Hall, 1997.
[3] “Video codec for audiovisual services at p × 64 kbit/s,” ITU-T Recommendation H.261, Version 2, 1993.
[4] G. Cote, B. Erol, M. Gallant, and F. Kossentini, “H.263+: Video coding at low bit rates,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 7, pp. 849–866, Nov. 1998.
[5] H. Kalva, “The H.264 video coding standard,” IEEE Multimedia, vol. 13, no. 4, pp. 86–90, 2006.
[6] “Special issue on scalable video coding-standardization and beyond,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 8, p. 1034, Aug. 2006.


[7] A. S. Al-Fohoum and A. M. Reza, “Combined edge crispiness and statistical differencing for deblocking JPEG compressed images,” IEEE Trans. Image Process., vol. 10, no. 9, pp. 1288–1298, Sep. 2001.
[8] C. J. Kuo and R. J. Hsieh, “Adaptive postprocessor for block encoded images,” IEEE Trans. Circuits Syst. Video Technol., vol. 5, no. 4, pp. 298–304, Aug. 1995.
[9] Y. L. Lee, H. C. Kim, and H. W. Park, “Blocking effect reduction of JPEG images by signal adaptive filtering,” IEEE Trans. Image Process., vol. 7, no. 2, pp. 229–234, Feb. 1998.
[10] M. Yuen, “Coding artifacts and visual distortions,” in Digital Video Image Quality and Perceptual Coding, H. R. Wu and K. R. Rao, Eds. Boca Raton, FL: CRC, 2006, pp. 87–122.
[11] E. D. Montag and M. D. Fairchild, “Fundamentals of human vision and vision modeling,” in Digital Video Image Quality and Perceptual Coding, H. R. Wu and K. R. Rao, Eds. Boca Raton, FL: CRC, 2006.
[12] B. Wandell, Foundations of Vision. Sunderland, MA: Sinauer Associates, 1995.
[13] B. Ramamurthi and A. Gersho, “Nonlinear space-variant postprocessing of block coded images,” IEEE Trans. Acoust. Speech Signal Process., vol. ASSP-34, no. 5, pp. 1258–1268, Oct. 1986.
[14] Y. Y. Yang and N. P. Galatsanos, “Removal of compression artifacts using projections onto convex sets and line process modeling,” IEEE Trans. Image Process., vol. 6, no. 10, pp. 1345–1357, Oct. 1997.
[15] T. C. Hsung, D. P.-K. Lun, and W. C. Siu, “Deblocking technique for block-transform compressed image using wavelet transform modulus maxima,” IEEE Trans. Image Process., vol. 7, no. 10, pp. 1488–1496, Oct. 1998.
[16] T. C. Hsung and D. P.-K. Lun, “Application of singularity detection for the deblocking of JPEG decoded images,” IEEE Trans. Circuits Syst. II, vol. 45, no. 5, pp. 640–644, May 1998.
[17] H. W. Park and Y. L. Lee, “Postprocessing method for reducing quantization effects in low bit-rate moving picture coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 1, pp. 161–171, Feb. 1999.
[18] J. G. Apostolopoulos and N. S. Jayant, “Postprocessing for very low bit-rate video compression,” IEEE Trans. Image Process., vol. 8, no. 8, pp. 1125–1129, Aug. 1999.
[19] A. W. C. Liew and Y. Hong, “Blocking artifacts suppression in block-coded images using overcomplete wavelet representation,” IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 4, pp. 450–461, Apr. 2004.
[20] H. C. Reeve III and J. S. Lim, “Reduction of blocking effects in image coding,” Opt. Eng., vol. 23, no. 1, pp. 34–37, 1984.
[21] B. Zeng, “Reduction of blocking effect in DCT-coded images using zero-masking techniques,” Signal Process., vol. 79, no. 2, pp. 205–211, 1999.
[22] Y. Luo and R. K. Ward, “Removing the blocking artifacts of block-based DCT compressed images,” IEEE Trans. Image Process., vol. 12, no. 7, pp. 838–842, Jul. 2003.
[23] S. Liu and A. C. Bovik, “Efficient DCT-domain blind measurement and reduction of blocking artifacts,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 12, pp. 1139–1149, Dec. 2002.
[24] T. Chen, H. R. Wu, and B. Qiu, “Adaptive postfiltering of transform coefficients for the reduction of blocking artifacts,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 5, pp. 594–602, May 2001.
[25] S. Wu, H. Yan, and Z. Tan, “An efficient wavelet-based deblocking algorithm for highly compressed images,” IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 11, pp. 1193–1198, Nov. 2001.
[26] H. Choi and T. Kim, “Blocking-artifact reduction in block-coded images using wavelet-based subband decomposition,” IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 5, pp. 801–805, Aug. 2000.
[27] N. C. Kim, I. H. Jang, D. H. Kim, and W. H. Hong, “Reduction of blocking artifact in block-coded images using wavelet transform,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 3, pp. 253–257, Jun. 1998.


[28] Z. Xiong, M. T. Orchard, and Y. Q. Zhang, “Deblocking algorithm for JPEG compressed images using overcomplete wavelet representations,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 2, pp. 433–437, Apr. 1997.
[29] A. Zakhor, “Iterative procedures for reduction of blocking effects in transform image coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 2, no. 1, pp. 91–95, Feb. 1992.
[30] T. P. O’Rourke and R. L. Stevenson, “Improved image decompression for reduced transform coding artifacts,” IEEE Trans. Circuits Syst. Video Technol., vol. 5, no. 6, pp. 490–499, Dec. 1995.
[31] K. N. Ngan, K. S. Leong, and H. Singh, “Adaptive cosine transform coding of images in perceptual domain,” IEEE Trans. Acoust. Speech Signal Process., vol. 37, no. 11, pp. 1743–1750, Nov. 1989.
[32] S. F. Chang and D. G. Messerschmitt, “Manipulation and compositing of MC-DCT compressed video,” IEEE J. Select. Areas Commun., vol. 13, no. 1, pp. 1–11, Jan. 1995.
[33] N. Merhav and V. Bhaskaran, “Fast algorithms for DCT-domain image down-sampling and for inverse motion compensation,” IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 3, pp. 468–476, Jun. 1997.
[34] Y. Y. Yang, N. P. Galatsanos, and A. K. Katsaggelos, “Regularized reconstruction to reduce blocking artifacts of block discrete cosine transform compressed images,” IEEE Trans. Circuits Syst. Video Technol., vol. 3, no. 6, pp. 421–432, Jun. 1993.
[35] S. H. Park and D. S. Kim, “Theory of projection onto the narrow quantization constraint set and its application,” IEEE Trans. Image Process., vol. 8, no. 10, pp. 1361–1373, Oct. 1999.

Guangtao Zhai received the B.E. and M.E. degrees in electronics engineering from Shandong University, Jinan, China, in 2001 and 2004, respectively. He is currently pursuing the Ph.D. degree at the Institute of Image Communication and Information Processing, Shanghai Jiao Tong University, Shanghai, China.


Wenjun Zhang received the B.S., M.S., and Ph.D. degrees in electronic engineering from Shanghai Jiao Tong University, Shanghai, China, in 1984, 1987, and 1989, respectively. As a group leader, he was successfully in charge of developing the first Chinese HDTV prototype system in 1998. He is a Changjiang Scholarship Professor in the field of communications and electronic systems with Shanghai Jiao Tong University. His research interests include digital media processing and transmission, video coding, and wireless wideband communication systems.

Xiaokang Yang (M’00–SM’04) received the B.Sci. degree from Xiamen University, Xiamen, China, in 1994, the M.Eng. degree from the Chinese Academy of Sciences, Beijing, China, in 1997, and the Ph.D. degree from Shanghai Jiao Tong University, Shanghai, China, in 2000. He is currently a Professor with the Institute of Image Communication and Information Processing, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China. From April 2002 to October 2004, he was a Research Scientist in the Institute for Infocomm Research, Singapore. His current research interests include scalable video coding, video transmission over networks, video quality assessment, digital television, and pattern recognition.

Weisi Lin (M’92–SM’98) received the B.Sc. and M.Sc. degrees from Zhongshan University, Guangzhou, China, in 1982 and 1985, respectively, and the Ph.D. degree from King’s College, London University, London, U.K., in 1992. He has taught and/or conducted research at Zhongshan University, Shantou University (China), Bath University (U.K.), and the National University of Singapore and Institute of Microelectronics (Singapore). He has been the Project Leader of a number of successfully delivered projects in the development of digital multimedia related technologies since 1997. He is currently an Associate Professor at the School of Computer Engineering, Nanyang Technological University, Singapore. His research interests include perceptual visual distortion metrics, perceptual video coding, and multimedia signal processing.

Yi Xu received the B.Sc. and M.Sc. degrees from Nanjing University of Science and Technology, Nanjing, China, in 1996 and 1998, respectively, and the Ph.D. degree from Shanghai Jiao Tong University, Shanghai, China, in 2004. She is currently an Assistant Professor in the Institute of Image Communication and Information Processing, Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China. Her research interests include wavelet theory and applications.
