
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 45, NO. 5, MAY 1998

Application of Singularity Detection for the Deblocking of JPEG Decoded Images Tai-Chiu Hsung and Daniel Pak-Kong Lun

Abstract—The blocking effect is considered the most disturbing artifact of JPEG decoded images, and many researchers have suggested methods to tackle it. Recently, the wavelet transform modulus maxima (WTMM) approach was proposed and gives a significant improvement over previous methods in terms of signal-to-noise ratio (SNR) and visual quality. However, the WTMM deblocking algorithm is iterative and requires a long computation time to reconstruct the processed WTMM into the deblocked image. In this brief, a new wavelet-based algorithm for JPEG image deblocking is proposed. The new algorithm is based on the idea that, besides using the WTMM, the singularity of an image can also be detected by computing the sums of the wavelet coefficients inside the so-called "directional cone of influence" at different scales of the image. Like the WTMM approach, the new algorithm can effectively identify the edge and smooth regions of an image irrespective of the discontinuities introduced by the blocking effect. It improves over the WTMM approach in that only a simple inverse wavelet transform is required to reconstruct the processed wavelet coefficients into the deblocked image. As with the WTMM approach, the new algorithm gives a consistent and significant improvement over previous methods for JPEG image deblocking.

Index Terms—Image enhancement, wavelet transformation.

I. INTRODUCTION

For most current block-based image-coding techniques, a blocking effect results from the quantization errors introduced when the different blocks of the transformed image are encoded. Since these coding techniques underlie the prevalent JPEG and MPEG standards, there has been general interest in methods to alleviate the blocking-effect problem. Traditional approaches, such as the low-pass filtering method [1], constrained optimization methods [2], [3], the projection onto convex sets technique [4], and the neural-network method [5], use different optimization cost functions and constraints (e.g., limiting the variation across block boundaries) to smooth out the discontinuities incurred. These methods reduce the blocking effect in the sense of improving the signal-to-noise ratio (SNR); however, the visual quality of the processed image sometimes does not appear much improved. One reason is that visual importance is not included in the cost functions of these optimization methods.

Recently, we proposed the wavelet transform modulus maxima (WTMM) approach [6] for JPEG image deblocking. Using the WTMM representation, the blocking effect is characterized as three subproblems: 1) small modulus maxima at block boundaries over smooth regions; 2) noise or irregular structures near strong edges; and 3) corrupted edges across block boundaries. Simple, local operations are then applied to the WTMM of the blocky image to solve these problems one by one. Finally, the deblocked image is reconstructed from the processed WTMM using an iterative projection onto convex sets (POCS) technique [7]. Although the WTMM approach has many advantages over the traditional approaches, its time-consuming reconstruction process restricts it to non-real-time applications.

The basic idea of the WTMM approach is to detect the discontinuities of the blocky image, since the blocking effect generally appears as discontinuities. However, we recently found [8] that the discontinuities of an image can also be detected by a much simpler approach: by simply computing the sums of the wavelet coefficients inside the so-called "directional cone of influence" at different scales of the image [8]. This approach has an obvious advantage over the WTMM approach in that only a simple inverse wavelet transform is required for reconstruction, as compared to the complicated POCS technique used by the WTMM approach. Based on this idea, we propose in this brief a new deblocking algorithm. The blocky image is first segmented into smooth regions and "nonsmooth" (texture-like) regions based on the local variation of the Lipschitz exponents [7], which are in turn approximated from the corresponding wavelet coefficients of the blocky image. Although the blocking effect is most observable in smooth regions, it can be easily identified there by the small wavelet coefficients it introduces, which we remove by hard thresholding. Besides the discontinuities, the blocking effect also introduces a ringing artifact near the strong edges of the blocky image. The singularity detection (SD) denoising technique is then applied to remove the ringing artifact while preserving the edges of the image. Finally, an inverse wavelet transform is performed on the processed wavelet coefficients to reconstruct the deblocked image.

In the following sections, we first briefly introduce the SD denoising technique, and then we describe how it is applied to image deblocking.

Manuscript received October 20, 1997; revised February 25, 1998. This work was supported under PolyU Research Grant 357/021. This paper was recommended by the Guest Editors F. Maloberti and W. C. Siu. The authors are with the Department of Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong. Publisher Item Identifier S 1057-7130(98)03953-6.

II. SD DENOISING ALGORITHM

The local regularity of certain types of nonisolated singularities in an image can be characterized by their Lipschitz exponents [7]. By evaluating these Lipschitz exponents, we can distinguish noise from the original contents of an image. It has been shown [8] that the Lipschitz exponents of an image can be evaluated by computing the sums of the wavelet coefficients inside the so-called "directional cone of influence" at different scales of the image.

Define the discrete dyadic wavelet transform [7] of a discrete image f, where f ∈ L²(R²), to be

    {S_{2^J} f, (W^1_{2^j} f)_{1≤j≤J}, (W^2_{2^j} f)_{1≤j≤J}}.

The components are obtained by convolving f(x, y) with the scaling function and the dilated wavelets:

    S_{2^J} f = f * φ_{2^J}(x, y),
    W^1_{2^j} f(x, y) = f * ψ^1_{2^j}(x, y),
    W^2_{2^j} f(x, y) = f * ψ^2_{2^j}(x, y).

The wavelets are designed to be the partial derivatives of a smoothing function Φ along the x- and y-directions, respectively; that is, ψ^1(x, y) = ∂Φ(x, y)/∂x and ψ^2(x, y) = ∂Φ(x, y)/∂y. Denote also the modulus of the wavelet transform

    M_{2^j} f(x, y) = sqrt(|W^1_{2^j} f(x, y)|² + |W^2_{2^j} f(x, y)|²)

and the phase

    A_{2^j} f(x, y) = arctan(W^2_{2^j} f(x, y) / W^1_{2^j} f(x, y)).

They indicate the magnitude and orientation of the gradient vector of the wavelet coefficients at a particular point (x, y).

1057–7130/98$10.00 © 1998 IEEE

Since the orientation of the gradient vector of the wavelet coefficients indicates the direction in which the maximum local variation of the signal is found, we only need to measure the Lipschitz exponent in that direction to identify the singularity of f(x, y). The Lipschitz exponent α at a point (x0, y0) of a two-dimensional (2-D) function f in a particular direction is related to the modulus of the wavelet coefficients of the function at scale s by

    |M_s f(x, y)| ≤ B s^α

where B is a constant. Instead of directly fitting the Lipschitz exponent from a set of neighboring coefficients in scale space, or tracing the maxima curves across scales as indicated in [7], we compute the integral of the wavelet transform modulus inside the so-called directional cone of influence, defined as the support of the wavelet function at different scales in the direction where the maximum local variation of the signal is found. We define an operator N, dubbed the "directional sum" of the wavelet transform modulus inside the directional cone of influence of a function, such that

    N_s f(x0, y0) = ∫_{(x,y)∈D_s} M_s f(x, y) dx dy                    (1)

where

    D_s = {(x, y): (x − x0)² + (y − y0)² ≤ K s², (y − y0)/(x − x0) = tan(A_s f(x0, y0))}

is the directional cone of influence. We implement the line integral of (1) with linear interpolation, since not all wavelet coefficients lie exactly on the direction indicated by A_s f. From (1), it can be proven that [8]

    N_s f(x0, y0) ≤ B′ s^(α+1)                    (2)
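To make the construction above concrete, the transform components and the directional quantities can be sketched in Python. This is a minimal sketch, not the paper's implementation: a Gaussian stands in for the smoothing function Φ (the paper builds on the wavelets of Mallat and Hwang [7]), and all function names and defaults are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dyadic_wavelet_transform(f, J=3):
    # W^1, W^2 are partial derivatives of a smoothed image along x and y.
    # A Gaussian plays the role of the smoothing function Phi here (an
    # assumption; the paper uses the wavelets of Mallat-Hwang [7]).
    W1, W2 = [], []
    for j in range(1, J + 1):
        smoothed = gaussian_filter(f, sigma=2.0 ** j)
        gy, gx = np.gradient(smoothed)   # axis 0 is y (rows), axis 1 is x (cols)
        W1.append(gx)                    # W^1_{2^j} f ~ d(Phi_{2^j} * f)/dx
        W2.append(gy)                    # W^2_{2^j} f ~ d(Phi_{2^j} * f)/dy
    S = gaussian_filter(f, sigma=2.0 ** J)   # coarse approximation S_{2^J} f
    return S, W1, W2

def modulus_phase(W1j, W2j):
    # Modulus M_{2^j} f and phase A_{2^j} f at one scale.
    M = np.sqrt(W1j ** 2 + W2j ** 2)
    A = np.arctan2(W2j, W1j)             # quadrant-aware arctan(W^2 / W^1)
    return M, A
```

For a vertical step edge, for example, the modulus peaks along the edge and the phase is approximately zero (gradient along x), matching the gradient-vector interpretation given above.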

where B′ is a constant. Therefore, the Lipschitz exponent can be estimated from the upper bound of the slope of log(N_s f). It is known [7] that the Lipschitz exponent of Gaussian noise usually possesses a negative value. In the case that α ≥ −1,

    N_{2^(j+1)} f(x0, y0) ≥ N_{2^j} f(x0, y0),   for 1 ≤ j ≤ 2.                    (3)
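As a sketch of how (1) and (2) translate into code, the directional sum can be approximated by sampling the modulus along the gradient direction with linear interpolation, and α read off from the growth of N_s f across scales. The cone constant K and the sample count are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def directional_sum(M, A, x0, y0, s, K=1.0, n_samples=9):
    # N_s f(x0, y0) of eq. (1): integrate the modulus M along the gradient
    # direction A(x0, y0) inside the cone of radius K*s, using linear
    # interpolation (order=1) for off-grid sample points.
    theta = A[y0, x0]
    r = np.linspace(-K * s, K * s, n_samples)
    ys = y0 + r * np.sin(theta)          # map_coordinates wants (row, col)
    xs = x0 + r * np.cos(theta)
    samples = map_coordinates(M, [ys, xs], order=1, mode='nearest')
    step = 2.0 * K * s / (n_samples - 1)
    return samples.sum() * step          # Riemann sum for the line integral

def estimate_alpha(N_values, scales):
    # From eq. (2), N_s f <= B' s^(alpha+1): the slope of log N_s against
    # log s bounds alpha + 1, so a log-log fit gives an estimate of alpha.
    slope = np.polyfit(np.log2(scales), np.log2(N_values), 1)[0]
    return slope - 1.0
```

A flat modulus field grows linearly with s (α ≈ 0) and satisfies (3), while noise, whose directional sums shrink across scales, yields α < −1 and fails it.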

By selecting the wavelet coefficients that fulfill this "interscale ratio" condition, as stated in (3), we can effectively remove noise while preserving the edges and the regular part of the signal. The interscale ratio method thus provides a simple means of selecting the wavelet coefficients that correspond to the regular part of the signal. Nevertheless, owing to the error generated during the estimation of N_s f, as well as the aliasing that may be introduced from the cones of influence of nearby singularities, it is observed that some small irregular signals have wavelet coefficients that fulfill the criterion in (3). This effect is most obvious for irregular signals with −1 < α < 0. Therefore, we also need an interscale difference condition to reject these small irregular signals:

    N_{2^(j+1)} f(x0, y0) − N_{2^j} f(x0, y0) > ε                    (4)

where ε is a threshold. When the value of ε is not comparable to the amplitude of the signal at any location, i.e., ε ≈ 0, the condition in (4) reduces to the original interscale ratio condition in (3), since all data that satisfy (3) will satisfy (4). In practice, a certain tolerance is allowed in using (3) and (4) to select the required wavelet coefficients; this is particularly important for some classes of signals with known regularity range.

To illustrate the effect of SD denoising, the algorithm is applied to a unit disk [see Fig. 1(a)] contaminated with zero-mean Gaussian noise of variance σ² = 0.2 [see Fig. 1(b)]; some shot noise is also introduced. We compare the result of SD denoising (ε = 0) with that of wavelet shrinkage denoising (threshold = 1.86σ). As expected, the SD denoising technique removes most of the shot noise and Gaussian noise in the image, and the denoised image has a higher SNR than that of the wavelet shrinkage approach.

Fig. 1. Denoising result for a unit disk. (a) Original unit disk. (b) Noisy image (6.82 dB). (c) SD denoised (16.29 dB). (d) Soft threshold denoised (15.53 dB).

III. SD FOR DEBLOCKING

The blocking effect, which is incurred by the quantization errors of the image blocks, affects the decoded image in several ways. The most significant are: 1) it may introduce small discontinuities at block boundaries and 2) if there are edges inside or across several blocks, it will also introduce "ringing" artifacts, as shown in Fig. 2. In the sequel, we propose a new deblocking algorithm based on the SD technique to solve these two problems.

Although the discontinuities introduced by the blocking effect occur only at the block boundaries, the wavelet coefficients they incur spread over the whole image. They mix with the wavelet coefficients of the original image, so that removing them explicitly may accidentally remove wanted wavelet coefficients and lead to an "over-smoothed" result. Fortunately, the blocking effect is most observable in the smooth regions of the image, where only a few wavelet coefficients should have significant values (even when a wavelet function with only one vanishing moment is used). Consequently, most of the wavelet coefficients found in the smooth regions are contributed by the blocking effect. To remove these wavelet coefficients, we first perform a segmentation to identify the smooth regions or, more exactly, the nonsmooth regions of the image. Then we apply hard thresholding to the wavelet coefficients in the smooth regions to reduce the blocking effect.

To identify the nonsmooth regions of the image, we again make use of the local Lipschitz exponents. By nonsmooth regions, we refer to regions that have low (but nonzero) local variation of regularity among the neighboring coefficients. Two examples are the hair

Fig. 3. Wavelet coefficients selection map (see text). (a) Level 2^1. (b) Level 2^2.

Fig. 2. Blow-up of the results in reducing the ringing artifacts. (a) Original Lena. (b) JPEG coded Lena. (c) CLS deblocked Lena. (d) SD deblocked Lena.

of "Lena" and "Baboon." Since the Lipschitz exponents show the local regularity of the image, they are a good indicator for identifying these regions. Direct evaluation of the Lipschitz exponents may incur a long computation time and, in fact, all we need is a classifier that tells whether a particular pixel belongs to the smooth region or the nonsmooth region. From (3), we know that if the interscale ratio of N_{2^(j+1)} f(x0, y0) to N_{2^j} f(x0, y0) at a point (x0, y0) is greater than two, the Lipschitz exponent at that point is most probably greater than zero, i.e., the point belongs either to an edge or to a smooth region. This shows that we can use the interscale ratio to approximate the Lipschitz exponents. More specifically, we use the local variance of the interscale ratio as follows:

    V_{2^j}(x, y) = (1/3) Σ_{u=−1}^{1} (l_{2^j}(x_u, y_u) − l̄_{2^j}(x, y))²                    (5)

where l_{2^j}(x, y) denotes the interscale ratio at pixel (x, y) at level 2^j, and l_{2^j}(x_u, y_u) the interscale ratio evaluated at a pixel (x_u, y_u) adjacent to (x, y) in the direction parallel to that of l_{2^j}(x, y). The local mean of l_{2^j}(x, y) is denoted l̄_{2^j}(x, y). By evaluating the variance of the interscale ratio, we can determine the variation of regularity among adjacent pixels of the image. Pixels with low variation are classified as belonging to the nonsmooth region. In practice, we use the following segmentation algorithm to generate the so-called "nonsmooth region" map, where the parameters V^{low}_{2^j} and V^{up}_{2^j} can be obtained empirically and work for all images:

1) compute V_{2^j}(x, y) of the blocky image for scales 2^1 to 2^J;
2) classify an irregular coefficient as inside the "nonsmooth" region if V^{low}_{2^j} < V_{2^j}(x, y) < V^{up}_{2^j}.

For the areas that are not nonsmooth regions, we perform hard thresholding, i.e., we remove the wavelet coefficients whose values are smaller than a threshold. Fig. 3 shows the segmentation result for "Baboon." The gray pixels indicate the locations where the wavelet coefficients are selected after the hard thresholding. The black pixels indicate the small wavelet coefficients that would otherwise have been removed by hard thresholding, but are retained because they belong to the "nonsmooth region" of the image. We can see that most of the wavelet coefficients introduced due to the blocking effect in the smooth regions, such as the nose and face of "Baboon," are removed.

Besides the discontinuities generated at block boundaries, the ringing artifacts that occur near strong edges are another annoying distortion introduced by the blocking effect. Human beings are sensitive to edges, so distortion near edges is highly observable. Traditional techniques that aim at smoothing the irregular ringing artifacts may also smooth out the edges of the image. To better remove these artifacts, we apply the SD denoising algorithm. Under this algorithm, the ringing artifacts are treated as irregular patterns, and the corresponding wavelet coefficients are removed because they do not fulfill the interscale ratio and interscale difference conditions. Since the SD algorithm can readily distinguish edges from irregular patterns, the edges of the image are preserved. Fig. 2 shows a blow-up of the ringing artifact near the hat of "Lena." It is seen that, using the SD algorithm, the ringing artifact is largely removed without affecting the edge much.

The proposed SD deblocking algorithm can be summarized as follows.

1) Compute W^1_{2^j} f, W^2_{2^j} f, and N_{2^j} f for 1 ≤ j ≤ J.
2) For a particular scale 2^j, compute N_{2^(j+1)} f / N_{2^j} f (the interscale ratio) and its variance at all points (x, y).
3) At that scale, generate the nonsmooth region map and perform hard thresholding in the smooth regions, i.e., select W^1_{2^j} f and W^2_{2^j} f if they are greater than a threshold value λ in these regions.
4) Reject W^1_{2^j} f and W^2_{2^j} f at (x0, y0) if they do not fulfill the interscale ratio and interscale difference conditions, i.e., if N_{2^(j+1)} f / N_{2^j} f ≤ 2 and N_{2^(j+1)} f − N_{2^j} f ≤ ε, where ε is a threshold value.
5) Repeat from step 2) for the next scale.
6) Reconstruct the image from the selected wavelet coefficients using the inverse wavelet transform.

We show the results of applying the proposed SD deblocking algorithm to "Lena" and "Baboon." Fig. 4(a) and (d) shows the JPEG decoded "Lena" and "Baboon," respectively; Fig. 4(b) and (e) shows the constrained least squares (CLS) deblocked images; and Fig. 4(c) and (f) shows the SD deblocked images. We can see that the proposed deblocking method improves the JPEG decoded image in visual appearance as well as in SNR.

Fig. 4. Deblocking results for Lena and baboon. (a) JPEG coded Lena. (b) CLS deblocked Lena. (c) SD deblocked Lena. (d) JPEG coded baboon. (e) CLS deblocked baboon. (f) SD deblocked baboon.

To determine the thresholds V^{low}_{2^j} and V^{up}_{2^j}, we used a set of images, including "Lena" and "Baboon." We observed the texture-like regions in these images, collected statistics of their interscale ratios, and chose an average value of these interscale ratios to set the thresholds. It is interesting to note that these thresholds are image independent. This is because they are in fact ratios of the wavelet transform modulus sums at different scales, which are invariant to changes in the amplitude of the image pixels. The only thing that changes these ratios is the local activity of the image pixels, which is exactly what we would like to detect. For the thresholds λ and ε, we use a similar approach for their determination. Again, these thresholds are image independent, as they are affected only by the quantization errors introduced into the smooth regions of the image. In practice, the quantization tables are known a priori; hence, once the threshold values are chosen, they can be used independently of the image transmitted. We choose the parameters as V^{low}_{2^1} = 0.1, V^{up}_{2^1} = 16, V^{low}_{2^2} = 0.005, V^{up}_{2^2} = 0.01, λ = 8, and ε = 40 and 30 for scales 2^1 and 2^2, respectively. In our simulations, all JPEG images are coded using the quantization table given in Table I, with quality factor 1. Table II compares the performance of SD deblocking with other conventional methods: the nonlinear interpolative decoding (Nido) [9], constrained least squares (Cls) [2], and the projection onto convex sets method (Pocs_ygk) from [2]. We denote by SD the proposed SD deblocking method, by bpp the number of bits per pixel, and by ΔSNR the SNR difference of the deblocked image from the original JPEG decoded image. It is seen in Table II that the SD deblocking algorithm gives a consistent improvement in SNR for all test images, and that its improvement is the largest among the approaches compared.
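The per-scale work of steps 2)–4) of the SD deblocking algorithm — the nonsmooth-region map built from (5) and the coefficient selection — can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the axis-wise variance simplifies (5), which takes the neighbors along the gradient direction, and the function names and toy inputs are assumptions.

```python
import numpy as np

def nonsmooth_map(l_ratio, v_low, v_up):
    # Eq. (5), simplified: variance of the interscale ratio l_{2^j} over a
    # 3-pixel neighborhood along each axis (the paper uses the neighbors
    # along the gradient direction); the larger axis variance is kept.
    v_best = np.zeros_like(l_ratio)
    for axis in (0, 1):
        prev = np.roll(l_ratio, 1, axis=axis)
        nxt = np.roll(l_ratio, -1, axis=axis)
        mean = (prev + l_ratio + nxt) / 3.0
        var = ((prev - mean) ** 2 + (l_ratio - mean) ** 2
               + (nxt - mean) ** 2) / 3.0
        v_best = np.maximum(v_best, var)
    # texture-like pixels: low but nonzero variation of regularity
    return (v_best > v_low) & (v_best < v_up)

def sd_deblock_select(W1, W2, N, nonsmooth, lam, eps):
    # Steps 3) and 4) for scales j = 1..J-1: hard-threshold coefficients in
    # the smooth regions (threshold lam) and reject coefficients failing the
    # interscale ratio/difference tests against the next coarser scale.
    W1_out = [w.copy() for w in W1]
    W2_out = [w.copy() for w in W2]
    for j in range(len(N) - 1):
        mag = np.sqrt(W1[j] ** 2 + W2[j] ** 2)
        keep = nonsmooth[j] | (mag > lam)            # step 3
        ratio_fail = N[j + 1] <= 2.0 * N[j]          # N_{2^(j+1)}/N_{2^j} <= 2
        diff_fail = (N[j + 1] - N[j]) <= eps         # difference <= eps
        keep &= ~(ratio_fail & diff_fail)            # step 4
        W1_out[j][~keep] = 0.0
        W2_out[j][~keep] = 0.0
    return W1_out, W2_out
```

An inverse dyadic wavelet transform of the surviving coefficients (step 6) then yields the deblocked image.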

TABLE I
QUANTIZATION TABLE USED IN THE EXPERIMENTS

TABLE II
COMPARISON OF THE PERFORMANCE OF SD DEBLOCKING WITH OTHER CONVENTIONAL METHODS FOR JPEG ENCODED IMAGES WHICH ARE COMPRESSED WITH THE QUANTIZATION TABLE SHOWN IN TABLE I

IV. CONCLUSION

In this brief, we proposed a new algorithm for deblocking JPEG decoded images using the SD denoising technique. As with the WTMM deblocking approach, the new algorithm detects the singularities of an image to identify the unwanted wavelet coefficients introduced by the blocking effect. However, the new approach achieves this much more simply, by computing the sums of the wavelet coefficients inside the "directional cone of influence" at different scales of the image. The new algorithm shares the advantage of the WTMM approach in that it can effectively identify the edge and smooth regions of an image irrespective of the discontinuities introduced by the blocking effect. It improves over the WTMM approach in that only a simple inverse wavelet transform is required to reconstruct the processed wavelet coefficients into the deblocked image. The new deblocking algorithm follows the approach of WTMM deblocking, which enhances the image in terms of visual quality as well as SNR, and it performs consistently and substantially better than the traditional approaches. Compared with the WTMM deblocking algorithm, it requires much less computational effort: the computation time can be reduced by up to a few orders of magnitude.

REFERENCES

[1] H. C. Reeve and J. S. Lim, "Reduction of blocking effect in image coding," Opt. Eng., vol. 23, no. 1, pp. 34–37, Jan. 1984.
[2] Y. Yang, N. P. Galatsanos, and A. K. Katsaggelos, "Regularized reconstruction to reduce blocking artifacts of block discrete cosine transform compressed images," IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 421–432, Dec. 1993.
[3] S. Minami and A. Zakhor, "An optimization approach for removing blocking effects in transform coding," IEEE Trans. Circuits Syst. Video Technol., vol. 5, pp. 74–82, Apr. 1995.
[4] Y. Yang, N. P. Galatsanos, and A. K. Katsaggelos, "Projection-based spatially adaptive reconstruction of block-transform compressed images," IEEE Trans. Image Processing, vol. 4, pp. 896–908, July 1995.
[5] S.-W. Hong, Y.-H. Chan, and W.-C. Siu, "The neural network modeled POCS method for removing blocking effect," in Proc. IEEE Int. Conf. Neural Networks, vol. III, Perth, Australia, Nov. 1995, pp. 1422–1425.
[6] T.-C. Hsung, D. P.-K. Lun, and W.-C. Siu, "A deblocking technique for block-transform compressed image using wavelet transform modulus maxima," IEEE Trans. Image Processing, to be published.
[7] S. Mallat and W. L. Hwang, "Singularity detection and processing with wavelets," IEEE Trans. Inform. Theory, vol. 38, pp. 617–643, Mar. 1992.
[8] T.-C. Hsung and D. P.-K. Lun, "Denoising by singularity rejection," in Proc. IEEE Int. Symp. Circuits Syst., vol. 1, Hong Kong, June 9–12, 1997, pp. 205–208.
[9] S. W. Wu and A. Gersho, "Nonlinear interpolative decoding of standard transform coded images and video," in Proc. SPIE Conf. Image Processing Algorithms and Techniques III, vol. 1657, San Jose, CA, Feb. 10–13, 1992, pp. 88–99.
