IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 8, NO. 8, AUGUST 1999

Postprocessing for Very Low Bit-Rate Video Compression

John G. Apostolopoulos and Nikil S. Jayant

Abstract—This paper presents a novel postprocessing algorithm developed specifically for very low bit-rate MC-DCT video coders operating at low spatial resolution. Postprocessing is intricate in this situation because the low sampling rate (as compared to the image feature size) makes it very easy to overfilter, producing excessive blurring. The proposed algorithm uses pixel-by-pixel processing to identify and reduce both blocking artifacts and mosquito noise while attempting to preserve the sharpness and naturalness of the reconstructed video signal and minimize the system complexity. Experimental results show that the algorithm successfully reduces artifacts in a 16 kb/s scene-adaptive coder for video signals sampled at 80 × 112 pixels per frame and 5–10 frames/s. Furthermore, the portability of the proposed algorithm to other block-DCT based compression systems is shown by applying it, without modification, to successfully post-process a JPEG-compressed image.

Index Terms—Blocking artifacts, JPEG, mosquito noise, MPEG, postprocessing.

Manuscript received August 30, 1995; revised June 22, 1998. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Dan Schonfeld. J. G. Apostolopoulos is with Hewlett Packard Laboratories, Palo Alto, CA 94304 USA (e-mail: [email protected]). N. S. Jayant is with the Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250 USA. Publisher Item Identifier S 1057-7149(99)06005-4.

I. INTRODUCTION

Postprocessing is used to reduce the visual artifacts and improve the quality of compressed video. It is an important step for enhancing the performance of existing image/video compression standards and also forms an integral part of the design of future compression algorithms. The research reported in this paper, while applicable as an enhancement to existing standards, was primarily motivated by the Video over Plain Old Telephone System (VPOTS) project at AT&T Bell Laboratories [1]. The goal of the VPOTS project was to deliver natural-looking video over analog telephone lines and personal wireless links. The extremely low bit rates available over these links (approximately 8–40 kb/s) severely challenge even today’s most sophisticated compression algorithms. Therefore, postprocessing was investigated for improving the performance of the VPOTS coder. This work begins by briefly describing the prominent visual artifacts that afflict conventional image/video coders and the postprocessing algorithms previously proposed in the literature. The VPOTS coder and its difficult coding scenario are then described. The proposed postprocessing algorithm is then presented, and experimental results are given and discussed [2].

II. VISUAL ARTIFACTS AND GENERAL POSTPROCESSING APPROACHES

A. Visual Artifacts

The primary visual artifacts afflicting block discrete cosine transform (block-DCT) based image/video compression systems are blocking effects, mosquito noise, and loss of resolution. Blocking effects are due to discontinuities in the amplitude or slope of the reconstructed signal across the block boundaries. The blocking effect is an artificial structured artifact (square grid) that attracts the observer's attention and is quite distracting. Mosquito noise is
typically seen when a sharp edge separating two uniform regions occurs within a block. The oscillatory basis functions of the block DCT are not very effective at representing the sharp edge, and when only a small number of DCT coefficients are coded, the reconstructed block is littered with random noise or oscillatory distortion. The “noise” is accentuated because of its sharply defined square boundary. Loss of resolution is produced because only a small fraction of the coefficients can be coded. Typically, the perceptually more important low frequencies are coded while the high frequencies are discarded, yielding a reconstructed image or video that is more lowpass than the original.

B. General Postprocessing Approaches

Many conventional image and video compression standards (e.g., JPEG, H.261/3, MPEG-1/2) are based on the block DCT. Most of the postprocessing research has therefore focused on reducing block DCT artifacts, specifically blocking effects. Since blocking artifacts are primarily caused by discontinuities along block boundaries, initial efforts to reduce these artifacts involved smoothing (lowpass filtering) the pixels along these boundaries [3], [4]. This problem has recently been formulated as an image recovery problem, enabling many of the ideas and approaches from image restoration to be applied to reducing the compression artifacts [5], [6]. These approaches exploit more information (e.g., the specific quantization strategy employed) and therefore can achieve higher performance. However, they are typically more complex and much more coder-specific. Postprocessing has also recently been applied to subband filtering schemes [7].

III. THE VPOTS CODING SCENARIO

The goal of the VPOTS project is to deliver natural-looking video with only 8–40 kb/s. The input video is typically a simple head-and-shoulders sequence acquired at 240 × 336 pixels/frame (CIF¹) and downsampled to 80 × 112 pixels/frame (ninth-CIF or NCIF) to ease the coding. At the decoder, the reconstructed video is upsampled back to CIF resolution for display. The VPOTS coder is an MC-DCT-based coder employing an adaptive-codebook VQ to encode the DCT coefficients. Similar to most DCT-based coders, the VPOTS video is afflicted with blocking effects, mosquito noise, and loss of resolution. However, the low sampling rate of the VPOTS video, as compared to the feature size within the scene, makes postprocessing much more delicate because it is extremely easy to overfilter and blur the coarsely sampled image. In most previous research, the spatial resolution was significantly higher (e.g., about 30 times the number of pixels for a 512 × 512 image) and therefore heavy filtering produced minimal added distortion to the image. In contrast, excessive filtering in the VPOTS scenario could have drastic harmful effects on the resulting image quality. A further consideration involves the display of the reconstructed video; since the video is interpolated for final viewing, the effects of the interpolation process on the artifact visibility should also be examined.

¹ This resolution is slightly different from CIF, but we refer to it as CIF for convenience.

IV. PROPOSED APPROACH FOR POSTPROCESSING

The goals for the proposed postprocessing algorithm are to eliminate the distracting blocking artifacts and mosquito noise while preserving the image sharpness and naturalness and minimizing

1057–7149/99$10.00 © 1999 IEEE

Fig. 2. The four five-point windows applied to the 1-bit edge map to determine if the pixel in question is a true edge or a false edge induced by distortion.

Fig. 1. The middle block is identified as exhibiting blocking effects while the surrounding blocks are undistorted. Filtering is performed to reduce the discontinuities that exist at the block boundaries without smoothing the surrounding blocks. A five-tap filter is applied horizontally and vertically to smooth the two pixels adjacent to each boundary.

the system complexity. The very low spatial resolution mandates pixel-by-pixel identification and processing in order to reduce the artifacts while preserving important image characteristics. This section presents the proposed postprocessing algorithm and specifically focuses on how to identify pixels afflicted by the different artifacts and how to reduce the artifacts while preserving the image quality. The primary goal was to maximize the performance of the VPOTS coder; however, we attempted to make the algorithm as generally applicable to block-based coders as possible.

A. Reduction of Blocking Effects

1) Detecting Blocking Effects: In order to reduce the blocking effects, we first identify which blocks may exhibit these artifacts and then apply a lowpass filtering or smoothing operation. Since blocking artifacts result from signal discontinuities across the block boundaries, most conventional detection techniques are pixel-domain methods that search for discontinuities along the block boundaries. However, we found that a DCT-domain approach is more efficient and possibly more robust than the conventional approaches. Our simple DCT-domain detection method is motivated as follows. Blocking results when an inadequate number of DCT coefficients are used to represent the block. Empirically, this was found to occur when only about one to three coefficients are coded. Therefore, we propose to identify the blocks that potentially exhibit blocking artifacts by calculating the number of nonzero DCT coefficients in a coded block and comparing it to a threshold. This extremely simple routine has a number of advantages. First, it appears to be rather robust. For example, consider a block containing very few nonzero coefficients and exhibiting blocking effects. With the proposed method, the block is identified and filtering is applied to reduce the blocking effects. On the other hand, consider a false detection: the block has very few nonzero coefficients but does not exhibit blocking effects. In this case the block is likely to be rather smooth, in which case further smoothing is not very harmful. Another advantage of this approach is its low computational complexity. This is most easily seen in a still-frame compression algorithm, such as JPEG. If the postprocessing is coupled with the decompression, all the nonzero DCT coefficients are already available and all that is required is counting the number per block (the number before each end-of-block marker). Based on similar ideas, the complexity of a joint video decoder/postprocessor can be minimized.

2) Smoothing the Blocking Effects: The problem lies in successfully reducing the blocking effects without excessively filtering the image. Our approach is based on the assumption that the pixels within a potential problem block may be more distorted than the pixels in the surrounding blocks. Therefore, one should use the accurate or undistorted surrounding pixels to improve the estimate of the distorted interior pixels, without altering the surrounding accurate pixels. This is equivalent to applying a filter along the boundaries but only updating the pixels within the distorted block. For example, horizontal lowpass filtering may be applied to reduce the discontinuity along the distorted block’s left and right boundaries, but only the pixels within the block are actually updated. The pixels in the surrounding blocks are left untouched. Similarly, a vertical lowpass filter is applied to reduce the discontinuity along the top and bottom edges. This technique is illustrated in Fig. 1, where a five-tap filter is applied to smooth the two pixels along the left, right, top, and bottom boundaries. The optimum choice of smoothing filter is an open question, along with the optimum region of smoothing. A linear lowpass filter with zero phase, minimum region of support (ROS), and maximum smoothing would appear to provide the desired results. A five-tap mean filter has been applied with good results.
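The detection rule and the one-sided boundary smoothing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes 8 × 8 blocks, frame dimensions that are multiples of 8, a coefficient array laid out block by block, and a threshold of three nonzero coefficients (taken from the "one to three" observation). The function names, the edge-replicating padding at the frame border, the treatment of all-zero blocks, and the order of the horizontal and vertical passes are hypothetical choices that the text does not specify.

```python
import numpy as np

BLOCK = 8  # assumed block size; the text does not fix it


def detect_sparse_blocks(coeffs, threshold=3):
    """Flag blocks whose number of nonzero DCT coefficients is <= threshold.

    coeffs is an (H, W) array holding one BLOCK x BLOCK coefficient block per
    tile. Blocks with no coded coefficients are flagged too (an assumption).
    """
    h, w = coeffs.shape
    blocks = coeffs.reshape(h // BLOCK, BLOCK, w // BLOCK, BLOCK)
    nonzero = np.count_nonzero(blocks, axis=(1, 3))
    return nonzero <= threshold


def mean5(img, axis):
    """Five-tap mean along one axis, replicating pixels at the frame border."""
    pad = [(0, 0), (0, 0)]
    pad[axis] = (2, 2)
    padded = np.pad(img, pad, mode="edge")
    n = img.shape[axis]
    taps = (np.take(padded, np.arange(k, k + n), axis=axis) for k in range(5))
    return sum(taps) / 5.0


def smooth_flagged_blocks(frame, flagged):
    """Filter across block boundaries but update only pixels inside flagged blocks."""
    out = frame.astype(np.float64)
    strip = (0, 1, BLOCK - 2, BLOCK - 1)  # two pixels next to each boundary
    for axis in (1, 0):  # horizontal pass first, then vertical (assumed order)
        filtered = mean5(out, axis)
        for by, bx in zip(*np.nonzero(flagged)):
            y0, x0 = by * BLOCK, bx * BLOCK
            for d in strip:
                if axis == 1:
                    out[y0:y0 + BLOCK, x0 + d] = filtered[y0:y0 + BLOCK, x0 + d]
                else:
                    out[y0 + d, x0:x0 + BLOCK] = filtered[y0 + d, x0:x0 + BLOCK]
    return out


# Toy example: one sparse block (top left) next to three well-coded blocks.
coeffs = np.zeros((16, 16))
coeffs[0, 0] = 800.0       # block (0, 0): a single nonzero coefficient
coeffs[0, 8:16] = 1.0      # block (0, 1): eight nonzero coefficients
coeffs[8, :] = 1.0         # blocks (1, 0) and (1, 1): eight each
frame = np.zeros((16, 16))
frame[:8, :8] = 100.0
flagged = detect_sparse_blocks(coeffs)
smoothed = smooth_flagged_blocks(frame, flagged)
```

In the toy example only the top-left block is flagged, so the step at its right and bottom boundaries is softened from inside the block while the neighboring blocks keep their original values.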

B. Reduction of Mosquito Noise

1) Identifying Pixels with Mosquito Noise: The task of identifying pixels afflicted by mosquito noise is quite difficult. However, the extremely low bit rate in VPOTS produces distortion throughout each frame, and this leads to the idea of smoothing all the pixels that do not need to be preserved, i.e., all nonedge pixels. Since texture-type areas are not represented accurately to begin with, smoothing them should not cause harmful side effects, and actually may be beneficial. The proposed algorithm is therefore based on the simple hypothesis that nonedge pixels may contain mosquito noise and should be smoothed. Edge pixels also contain mosquito noise; however, any filtering of the edge pixels produces an unacceptable amount of blurring and loss of image sharpness. Furthermore, the edge distortion is typically masked by the edge itself. Therefore, the edge pixels must be carefully identified and preserved, and the remaining nonedge pixels can be safely smoothed to reduce the mosquito noise. A pixel-based edge detection algorithm was used to create a binary edge map which classifies each pixel as an edge or nonedge pixel [8]. This algorithm was chosen because it was designed for noisy images and it was successfully applied in a previous postprocessing project [7].

Fig. 3. The complete postprocessing algorithm. Interpolation is assumed to follow the postprocessing.

One problem with the edge detection process is that large-amplitude distortions, such as mosquito noise,² may be falsely detected as edges, thereby evading the smoothing process. To counteract this problem, it is essential to distinguish between true edges and false, distortion-induced edges. The approach that we developed is based on the idea that a true edge has a spatial extent: a true edge will have a string of connected edge pixels, while distortion-induced false edges are typically isolated edge points. The connectivity of each potential edge pixel is examined by applying four five-point windows to the edge map, centered at the pixel in question and aligned along the horizontal, vertical, and diagonal directions as illustrated in Fig. 2. If the number of edge pixels along any of the directions is greater than or equal to three, an edge is determined to exist along that direction and the pixel in question is assumed to correspond to a true edge. Otherwise it is assumed to be a false edge. This “smoothed” edge map is a more accurate indicator of the true edges in the image.³

2) Smoothing the Mosquito Noise: The false edges correspond to pixels with very high distortion (since they were falsely detected as edges) and therefore are heavily smoothed to ensure that they do not degrade the postprocessed image. The detected true edge pixels are passed through the system unprocessed in order to retain the image sharpness. Experiments have shown that it is also beneficial to pass unprocessed the pixels on the top, bottom, left, and right of each edge pixel, i.e., each edge pixel should be unprocessed as well as its four nearest neighbor pixels. All the remaining pixels are smoothed to reduce the distortion. There are many possible techniques for smoothing, and we developed two general filtering “rules of thumb” that have been extremely important for processing at very low spatial resolutions.
First of all, the filter ROS should exclude any neighboring edge pixels and pixels on the other side of the edge. Any edge pixels used in the filtering will result in significant blurring of the image.⁴ Second, it is generally better to have a smaller ROS than a larger ROS. A number of linear and nonlinear filters were considered for the smoothing operation; a 3 × 3 double median filter was chosen. This filter works by first taking the 3 × 3 median for each pixel in the 3 × 3 window, and then taking the median of all these median values; hence the name double median filter. The double median filter produces a greater degree of smoothing than the conventional median filter; however, its advantage over other possible filters is unclear. An important point is that because the ROS of the smoothing filter is variable (since all the edge pixels are excluded), it is more convenient to use mean or median type filters because their evaluation does not change significantly when their ROS changes.

² It is interesting to note that while blocking artifacts are highly visible and distracting, the discontinuities are so small as to virtually never be detected by the edge detection algorithm!

³ Since the edge detector output contains many false edges, and the key to the working system appears to be a smoothing of the edge map, similar performance may be achieved with a much simpler edge detection algorithm.

⁴ For example, if the edge pixels were included in the filter ROS and a small 3 × 3 mean filter was applied to the nonedge pixels, a totally unacceptable amount of blurring results. The severity of this blurring is evident when we realize that even this small 3 × 3 filter ROS (the smallest possible symmetric filter aside from 1 × 1) still covers 2.7% horizontally and 3.7% vertically of the image’s ROS. Filtering is a very sensitive issue: replacing the mean filter by a more conventional lowpass filter such as [1/4 1/2 1/4]^T [1/4 1/2 1/4] results in virtually no smoothing of the distortion at all.

C. Postprocessing and Interpolation

The VPOTS video is coded at NCIF (80 × 112 pixels/frame) but is displayed at CIF (240 × 336 pixels/frame) resolution. This raises the issue of whether postprocessing should be performed before, after, or during interpolation. Artifact identification and reduction may be performed in the same or different domains, e.g., identification at NCIF and smoothing at CIF. A brief study examined some of these options; however, various tradeoffs exist and no fixed set of options consistently outperforms the others. For example, processing at NCIF appears to reduce slightly more artifacts and also yields a slightly sharper image, while processing at CIF yields a slightly smoother and more natural-looking image. The complexity depends on where the processing is performed, and is proportional to the pixel rate: processing at CIF requires almost an order of magnitude more computations than processing at NCIF. Therefore, we primarily examined postprocessing at NCIF followed by interpolation to CIF.

D. Complete Postprocessing Algorithm

The complete postprocessing algorithm is illustrated in Fig. 3. A DCT-domain block detector generates a map of the blocks potentially exhibiting blocking artifacts. This map guides the filtering of appropriate block boundaries. An edge detection algorithm generates a map of the important edges in the image. This map is smoothed or cleaned to distinguish between the true edges in the image and the false distortion-induced edges. The true edges and their nearest neighbors are left unprocessed in order to preserve the image sharpness. The false edges are heavily smoothed, and the remaining pixels are moderately smoothed. The identification of the edges, blocking effects, and mosquito noise should all be performed before any smoothing is applied. However, it is not evident whether it is better to first filter the block boundaries or to first smooth the nonedge pixels.
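The edge-map cleaning and the edge-excluding double median described in Section IV-B can be sketched as follows. This is a hedged illustration, not the authors' code: the binary edge map is taken as given (the edge detector of [8] is not reproduced), the window count includes the center pixel, and the function names are hypothetical. As simplifications, only true edge pixels are excluded from the filter ROS (pixels on the far side of an edge are not tracked), and false edges receive the same smoothing as other nonedge pixels because the text does not say how the heavier smoothing is realized.

```python
import warnings

import numpy as np

# (dy, dx) steps for the horizontal, vertical, and two diagonal windows
DIRECTIONS = ((0, 1), (1, 0), (1, 1), (1, -1))


def split_true_false_edges(edge_map, min_count=3):
    """Classify edge pixels with four five-point windows (center included)."""
    edges = edge_map.astype(bool)
    h, w = edges.shape
    padded = np.pad(edges, 2, mode="constant", constant_values=False)
    best = np.zeros((h, w), dtype=int)
    for dy, dx in DIRECTIONS:
        count = np.zeros((h, w), dtype=int)
        for k in range(-2, 3):
            y0, x0 = 2 + k * dy, 2 + k * dx
            count += padded[y0:y0 + h, x0:x0 + w]
        best = np.maximum(best, count)
    true_edges = edges & (best >= min_count)
    return true_edges, edges & ~true_edges


def protected_mask(true_edges):
    """True edge pixels plus their four nearest neighbors."""
    p = np.pad(true_edges, 1, mode="constant", constant_values=False)
    return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
            | p[1:-1, :-2] | p[1:-1, 2:])


def nanmedian3x3(img):
    """3 x 3 median that ignores NaN entries (excluded pixels and the border)."""
    h, w = img.shape
    p = np.pad(img, 1, mode="constant", constant_values=np.nan)
    stack = np.stack([p[dy:dy + h, dx:dx + w]
                      for dy in range(3) for dx in range(3)])
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", RuntimeWarning)  # all-NaN windows
        return np.nanmedian(stack, axis=0)


def smooth_nonedge(frame, true_edges):
    """Double median over nonedge pixels; protected pixels pass through."""
    img = frame.astype(np.float64)
    masked = np.where(true_edges, np.nan, img)  # drop true edges from the ROS
    first = nanmedian3x3(masked)
    first[true_edges] = np.nan                  # medians centered on edges are unused
    second = nanmedian3x3(first)
    return np.where(protected_mask(true_edges), img, second)


# Toy example: a vertical edge in column 4 and one isolated false detection.
frame = np.full((9, 9), 50.0)
frame[:, 4] = 200.0
frame[2, 1] = 90.0
edge_map = np.zeros((9, 9), dtype=bool)
edge_map[:, 4] = True
edge_map[2, 1] = True
true_edges, false_edges = split_true_false_edges(edge_map)
result = smooth_nonedge(frame, true_edges)
```

In the toy example the connected column is kept as a true edge and left untouched, while the isolated detection is classified as a false edge and its outlying value is removed by the median.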
Interpolation is performed by the separable application of an 11-tap sharpened Gaussian filter.

V. EXPERIMENTAL RESULTS

A. Application of Proposed Algorithm to VPOTS

The proposed postprocessing algorithm was applied to a number of VPOTS-coded video sequences. We examine two frames from the Jelena sequence which are representative of typical postprocessing

Fig. 4. Left: VPOTS output frame that exhibits relatively minor distortion (mild mosquito noise). Right: postprocessed VPOTS output.

Fig. 5. Left: VPOTS output frame that exhibits considerable distortion (significant blocking effects and mosquito noise). Right: postprocessed VPOTS output.

results. Frames 1002 and 1248 of the Jelena sequence, coded with VPOTS at 16 kb/s, are shown in Figs. 4 and 5, respectively. The first frame exhibits relatively minor distortion (mild mosquito noise), while in the second frame the distortion is quite large (significant blocking effects and mosquito noise). Figs. 4 and 5 compare the VPOTS output without and with postprocessing. Notice that the algorithm significantly reduced the mosquito noise in frame 1002 as well as both the mosquito noise and blocking effects in frame 1248. Of course, any blocks that were extremely distorted (such as in the facial area of frame 1248) remained so after postprocessing. These blocks were smoothed, but the extreme distortion limited the improvement. Overall, however, the authors believe that a considerable improvement in visual quality is achieved for distorted frames. In addition, compressed frames that exhibit little distortion (blocking effects and mosquito noise) are passed through the postprocessor virtually untouched. Fig. 6 illustrates the different information that was identified in frame 1248, including the estimated true edges, the false distortion-induced edges, and the blocks potentially exhibiting blocking effects. The authors find this composite map of all the identification information very useful for analyzing the algorithm.

B. Further Example: JPEG

An algorithm would be particularly useful if it were applicable and successful over a wide range of image/video coders without requiring fine-tuning of parameters based on the specific coder. To examine the portability of the proposed postprocessing algorithm, it was applied without modification to a JPEG-compressed image, i.e., the proposed

Fig. 6. Composite map of all the identification information produced by the postprocessing algorithm. The estimated true edges are shown in white, false distortion-induced edges are light gray, and blocking effects are dark gray. These attributes are easier to see when mapped to red, green, and blue color components.

algorithm was applied to the JPEG-coded image with an identical setup to that used for VPOTS. The Lena image was coded at a quality level of ten, thereby producing a significant amount of distortion. The results are shown in Fig. 7. The blocking artifacts are significantly reduced and the overall visual appearance is greatly improved. Some minor fine tuning may further improve the results. For instance, the

Fig. 7. Left: JPEG-compressed Lena image. Right: postprocessed result.

Lena image is of a much higher spatial resolution than the VPOTS video; therefore, heavier lowpass filtering may be applied to smooth the blocking without blurring the image.

VI. SUMMARY AND FUTURE WORK

This work presented a postprocessing algorithm for improving the visual quality of block-DCT-based image and video coders. The algorithm was developed to reduce unwanted blocking effects and mosquito noise artifacts, while preserving the image sharpness and naturalness. The algorithm was relatively simple, yet it performed quite well under different circumstances. The work presented here was in its early stages; a number of important issues remain that should be addressed. First, the temporal aspects of a video signal have not been exploited. Temporal processing may lead to a more natural smoothing and reduction of artifacts. Postprocessing after interpolation to CIF sometimes provides more natural video; achieving this quality while processing at NCIF would be highly beneficial. Finally, a joint postprocessing/interpolation scheme may provide improved visual performance while reducing the computational requirements.

REFERENCES

[1] J. Hartung, “Dynamic bit allocation in the scene adaptive video coder,” AT&T Intern. Memo, Mar. 28, 1994.
[2] J. Apostolopoulos and N. Jayant, “Post-processing for very-low-bit-rate video compression,” AT&T Tech. Memo., 1994.
[3] H. C. Reeve and J. S. Lim, “Reduction of blocking effects in image coding,” Opt. Eng., vol. 23, pp. 34–37, Jan./Feb. 1984.
[4] B. Ramamurthi and A. Gersho, “Nonlinear space-variant postprocessing of block coded images,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 1258–1268, Oct. 1986.
[5] A. Zakhor, “Iterative procedures for reduction of blocking effects in transform image coding,” IEEE Trans. Circuits Syst. Video Technol., vol. 2, pp. 91–95, Mar. 1992.
[6] Y. Yang, N. P. Galatsanos, and A. K. Katsaggelos, “Regularized reconstruction to reduce blocking artifacts of block discrete cosine transform compressed images,” IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 421–432, Dec. 1993.
[7] T. S. Liu and N. S. Jayant, “Adaptive postprocessing algorithms for low bit rate video signals,” IEEE Trans. Image Processing, vol. 4, pp. 1032–1035, July 1995.
[8] A. Kundu and S. K. Mitra, “A new algorithm for image edge extraction using a statistical classifier approach,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, no. 4, pp. 569–577, July 1987.
