ROBUST AND EDGE-PRESERVING VIDEO ERROR CONCEALMENT BY COARSE-TO-FINE BLOCK REPLENISHMENT

S. Belfiore, L. Crisà, M. Grangetto, E. Magli, G. Olmo
Dipartimento di Elettronica - Politecnico di Torino
Corso Duca degli Abruzzi 24 - 10129 Torino - Italy
Ph.: +39-011-5644195 - Fax: +39-011-5644099
belfiore(crisa)@mail.tlc.polito.it grangetto(magli,olmo)@polito.it

ABSTRACT

In this paper we propose a novel error concealment algorithm for video transmission over wireless networks potentially subject to packet erasures. In particular, we develop a technique for the replenishment of missing macroblocks, which aims at minimizing the impact of the lost data on the resulting video as perceived by the human visual system. The proposed algorithm operates in three reconstruction stages at different scales, first recovering smooth large-scale patterns, then large-scale structures, and finally local edges in the lost macroblock. Experimental results show that the proposed algorithm achieves improved visual quality of the reconstructed frames with respect to other state-of-the-art techniques, as well as better PSNR results.

1. INTRODUCTION AND STATE-OF-THE-ART

The transmission of video signals in error-prone environments, such as wireless networks, is a very challenging task. Fading and network congestion may lead to packet erasures, which usually result in a number of missing macroblocks (MBs) at the decoder. When a video decoder receives an incomplete stream, it may try to conceal the effect of data losses by estimating the missing data from the received data. The problem is important enough that many video error concealment techniques have recently been developed; see [1] for an excellent survey. The most common concealment techniques use spatial or temporal interpolation, which estimate a lost MB from the neighboring MBs in the same frame or in the adjacent frames, respectively.
Temporal interpolation is usually simpler, but it depends critically on the availability of the motion vectors; spatial interpolation, on the other hand, is an invaluable solution with quite general applicability. Error concealment by spatial interpolation initially borrowed ideas from the image restoration field, trying to recover the unknown frame pixels by imposing spatial [2], and possibly temporal, smoothness constraints on the resulting video. As an example, in [3] the interpolation problem is formulated in the DCT domain; reconstruction using projection onto convex sets has also been proposed for spatiotemporal interpolation [4]. More recently, it has been recognized that smooth reconstructions may not be the best solution in terms of how the recovered MB is perceived by the human visual system (HVS). In particular, smooth reconstructions do not restore edges and contours, to which the HVS is highly sensitive. Consequently, a large research effort has been devoted to devising edge-preserving replenishment algorithms. Examples are maximum a posteriori (MAP) estimation using Markov random fields (MRF) weighted by the Huber penalty function [5], or directional interpolation [6, 2, 7]. In particular, in [5] the block replenishment problem is recast as a MAP estimation task, which can be conveniently approximated by the combined use of an averaging and a median filter. In [7] an adaptive MRF model is proposed, which exploits local gradient information in order to improve edge reconstruction. A common drawback of these algorithms is that they do not overcome the trade-off between preserving edges as much as possible and preventing the reconstruction of false structures, which would be propagated to the neighboring frames by temporal prediction. On the one hand, we have found that the algorithm in [5] provides fairly smooth reconstructions which, however, tend not to propagate annoying visible artifacts to the adjacent frames. On the other hand, although the algorithm in [7] is generally able to reconnect straight edges interrupted by a lost MB, it sometimes introduces artifacts by propagating edges into smooth image regions, which become highly visible in the current and subsequent frames. This kind of problem is due to the fact that all these algorithms work at a single scale, i.e., they attempt to recover the entire macroblock at once.

The authors are with the Signal Analysis and Simulation group at Politecnico di Torino. URL: www.helinet.polito.it/sasgroup

In this paper we propose a new spatial error concealment algorithm, which has been designed so as to optimize the estimation of the missing MB in terms of its perception by the HVS. The algorithm performs coarse-to-fine block replenishment (CFBR) by reconstructing the scene at different scales.
Firstly, smooth large-scale patterns, such as surfaces and illumination, are restored; secondly, large-scale structures are recovered; finally, local edge reconstruction is performed. The main contributions of this paper are 1) the coarse-to-fine HVS-compliant framework, 2) the combination of previous results with original contributions (specifically the third, and most critical, stage of the algorithm), and 3) a robust error concealment algorithm whose parameters are computed automatically. The resulting technique is shown to outperform other state-of-the-art techniques in terms of both visual and objective quality. This paper is organized as follows. In Sect. 2 the CFBR algorithm is described in detail. In Sect. 3 experimental results are provided on some standard MPEG-2 coded video sequences, while in Sect. 4 conclusions are drawn.

2. PROPOSED ALGORITHM

The proposed CFBR method estimates missing data by exploiting the spatial redundancy in the neighboring available pixels. The algorithm models the original image as an MRF, and computes its MAP estimate given the received image and a suitable prior model. Its operation consists of three stages. Firstly, smooth, large-scale structures are estimated, with the aim of providing a homogeneous reconstruction to be used as a basis for further refinement. Secondly, large-scale structures are estimated, and a preliminary edge reconstruction is performed over the whole missing MB; this provides a structured, albeit not very detailed, description of the missing MB. Thirdly, local edge reconstruction is performed in order to refine edges; this step-by-step procedure yields sharp reconstructions that are well perceived by the HVS, while avoiding artifacts. Details on the three stages are given hereafter.

The first-stage smooth reconstruction is obtained by means of N1 iterations of the algorithm in [5], which jointly uses an averaging and a median filter. A 16-pel clique is used here, as depicted in Fig. 1-a, which maximizes the amount of information used by the estimator.
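As a rough illustration of the first stage, the Python sketch below iterates an averaging/median update over the missing pixels. It is not the estimator of [5]: the clique geometry (the 8 neighbors at distance 1 plus the 8 pixels at distance 2 along the same directions), the combination rule (midpoint of the clique mean and clique median), the flat initialization and the name `smooth_replenish` are all assumptions made for illustration.

```python
from statistics import mean, median

# Hypothetical 16-pel clique (Fig. 1-a is not reproduced): the 8 neighbors at
# distance 1 plus the 8 pixels at distance 2 along the same directions.
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
CLIQUE_16 = [(dy * r, dx * r) for r in (1, 2) for dy, dx in DIRS]


def smooth_replenish(img, missing, n_iter=60):
    """Stage-1 sketch: repeated averaging/median updates of the missing pixels.

    img     -- list of rows of floats; modified in place and returned
    missing -- set of (row, col) positions to estimate
    n_iter  -- number of sweeps (the role of N1)
    """
    h, w = len(img), len(img[0])
    known = [img[y][x] for y in range(h) for x in range(w) if (y, x) not in missing]
    start = mean(known)
    for y, x in missing:
        img[y][x] = start  # flat initialization, refined by the sweeps below
    order = sorted(missing)  # raster order; updated values are reused at once
    for _ in range(n_iter):
        for y, x in order:
            vals = [img[y + dy][x + dx] for dy, dx in CLIQUE_16
                    if 0 <= y + dy < h and 0 <= x + dx < w]
            # Assumed combination rule: midpoint of clique mean and clique median.
            img[y][x] = 0.5 * (mean(vals) + median(vals))
    return img
```

Known pixels are never modified, and because both the mean and the median stay within the range of their inputs, the estimates remain within the range of the received pixel values.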

[Fig. 1(a): the pixel (i,j) in the missing macroblock and its 16-pel clique]

The second stage of the proposed CFBR method consists of N2 iterations of the edge-preserving algorithm in [7], which reconstruct large structures and edges that cross the entire missing area. This method is also based on an MRF model of the image and on MAP estimation. The estimated value of a pixel is given by

$$\hat{x}_{i,j} = \frac{\sum_{(k,l)\in c\cup c'} w^{i,j}_{k,l}\, x_{k,l}}{\sum_{(k,l)\in c\cup c'} w^{i,j}_{k,l}}$$

where $c$ is the clique and $c'$ is its complement, as shown in Fig. 1-b. The weights $w^{i,j}_{k,l}$ are chosen so as to give more weight to those pixels, belonging to $c$ or $c'$, that lie along the directions of the most prominent edges. To determine the likelihood of an edge along each of the eight equally spaced directions shown in Fig. 1-b, a search window, eight pixels wide, surrounding the missing macroblock is considered. For each pixel in this window, the edge direction is determined by a gradient measure, in order to decide whether the edge is aligned with the direction under consideration; to this end, the gradient at each pixel is evaluated in terms of its magnitude $G$ and its phase. For each pixel of the missing MB, a counter $c_m$, $m = 1, \dots, 8$, is maintained for each direction and continuously updated. For each pixel in the search window, upon detection of an edge passing through the missing area, the counter associated with that direction is incremented by $G$ for all pixels lying along it. Since the edge detector is sensitive to image noise, the counters $c_m$ are hard-thresholded, i.e., those smaller than a given threshold $\tau$ are set to zero. Unlike in [7], in this paper the threshold is selected automatically, according to the empirical formula $\tau = \sigma_I^2/\dots$, where $\sigma_I^2$ is the global picture variance.

Fig. 1. (a) The pixel to be estimated and its 16-pel clique for the first stage; (b) a pixel, its clique c, and the eight directions. The dark area is the complement c' of the clique.

The third and last stage, which is the novel and most characteristic part of the proposed CFBR method, works at a finer scale, providing the local reconstruction of details and small objects separately on each 8x8 subblock of the missing macroblock. To this end, N3 iterations of a modified edge-preserving interpolator are used, along with a boundary smoothing technique that avoids discontinuities among subblocks. In order to obtain local edge information, the missing macroblock and the related search window are split into four blocks, as shown in Fig. 2. To allow a controlled propagation among blocks, each search window is extended by two pixels toward the other three areas. To make the best use of the available information in the edge recovery task, the estimation order differs from that of [7]: pixels in each block are scanned in an order that depends on the pixel position within the MB, as shown in Fig. 2. Consider, e.g., the lower-right block: pixels lying near the MB boundary are estimated first, so that the processing order runs from the bottom-right to the upper-left pixel of the block. In short, the processing order goes from the borders to the center of the MB, so as to optimize the use of available information.
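The second-stage direction counting and the weighted-average estimator can be sketched as follows. The sketch rests on several assumptions not stated in the text: the eight directions are approximated by the lattice steps of a 5x5 neighborhood (roughly 22.5 degrees apart), the gradient is taken by central differences, the edge is assumed normal to the gradient, and the clique c ∪ c' is replaced by the full 5x5 neighborhood with hypothetical weights 1 + c_m. The names `STEPS`, `nearest_direction`, `direction_counters` and `weighted_estimate` are hypothetical.

```python
import math

# Assumed direction set (Fig. 1-b is not reproduced): lattice steps of a 5x5
# neighborhood, roughly 22.5 degrees apart over a half-turn; (dy, dx), y down.
STEPS = [(0, 1), (1, 2), (1, 1), (2, 1), (1, 0), (2, -1), (1, -1), (1, -2)]
ANGLES = [math.atan2(dy, dx) % math.pi for dy, dx in STEPS]


def nearest_direction(angle):
    """Index m of the direction closest to an undirected line angle (radians)."""
    def gap(a, b):
        d = abs(a - b) % math.pi
        return min(d, math.pi - d)
    return min(range(8), key=lambda m: gap(angle, ANGLES[m]))


def direction_counters(img, missing, window, tau):
    """Accumulate the counters c_m of every missing pixel from window edges.

    img     -- list of rows (missing pixels already hold stage-1 estimates)
    missing -- set of (row, col) positions of the lost MB
    window  -- iterable of (row, col) positions of the search window
    tau     -- hard threshold; counters below it are set to zero
    """
    h, w = len(img), len(img[0])
    counters = {p: [0.0] * 8 for p in missing}
    for y, x in window:
        if not (0 < y < h - 1 and 0 < x < w - 1):
            continue  # central differences need both neighbors
        gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
        gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
        g = math.hypot(gx, gy)  # gradient magnitude G
        if g == 0.0:
            continue
        edge = (math.atan2(gy, gx) + math.pi / 2) % math.pi  # normal to gradient
        m = nearest_direction(edge)
        dy, dx = STEPS[m]
        # Walk both ways along the edge line; credit G to every missing pixel hit.
        for sign in (1, -1):
            yy, xx = y + sign * dy, x + sign * dx
            while 0 <= yy < h and 0 <= xx < w:
                if (yy, xx) in counters:
                    counters[(yy, xx)][m] += g
                yy, xx = yy + sign * dy, xx + sign * dx
    for c in counters.values():
        for m in range(8):
            if c[m] < tau:
                c[m] = 0.0
    return counters


def weighted_estimate(img, y, x, c):
    """Normalized weighted average over the 5x5 neighborhood of (y, x).

    Hypothetical weights: 1 for every neighbor, plus c[m] for neighbors lying
    on direction m through (y, x).
    """
    h, w = len(img), len(img[0])
    num = den = 0.0
    for dy in range(-2, 3):
        for dx in range(-2, 3):
            yy, xx = y + dy, x + dx
            if (dy, dx) == (0, 0) or not (0 <= yy < h and 0 <= xx < w):
                continue
            wgt = 1.0
            for m, (sy, sx) in enumerate(STEPS):
                if (dy, dx) in [(k * sy, k * sx) for k in (-2, -1, 1, 2)]:
                    wgt += c[m]
            num += wgt * img[yy][xx]
            den += wgt
    return num / den
```

On a vertical step edge, only the vertical counter of the pixels on the edge line becomes non-zero, so the estimate at those pixels is pulled toward the values above and below rather than toward the average of both sides.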

[Fig. 2: (a) search windows for blocks I, II, III and IV of the MB; (b) pixel estimation order]

Fig. 2. (a) Search window for a quarter of the MB; (b) pixel estimation order.

As in other block-based techniques, MB splitting may introduce discontinuities between the four blocks, especially if the search windows have very different contents. The resulting blocking effect would be highly visible to the HVS. In the CFBR algorithm this problem is overcome by replacing each pair of pixels lying on either side of the discontinuity with a local average, as shown in Fig. 3. For example, letting A, B, C, D be columns (resp. rows) of the recovered MB across a block boundary, as in Fig. 3, the pixels of C and D are re-estimated as

$$\hat{C} = \frac{A + D}{2}, \qquad \hat{D} = \frac{B + C}{2}.$$
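The two stage-3 mechanisms described above, the border-to-center scan order of the four 8x8 blocks and the local averaging across internal block boundaries, can be sketched as follows. The sketch assumes that the columns (rows) lie in the order A, C | D, B across each boundary and that both averages use the values before the update; the names `quadrant_scan_order` and `smooth_block_boundaries` are hypothetical.

```python
def quadrant_scan_order(mb=16):
    """Border-to-center visiting order for the four (mb/2 x mb/2) blocks of an MB.

    Keys are (qy, qx), with 0 = upper/left half and 1 = lower/right half.
    """
    half = mb // 2
    orders = {}
    for qy in (0, 1):
        for qx in (0, 1):
            rows = list(range(qy * half, (qy + 1) * half))
            cols = list(range(qx * half, (qx + 1) * half))
            if qy:
                rows.reverse()  # lower blocks start from the bottom MB edge
            if qx:
                cols.reverse()  # right blocks start from the right MB edge
            orders[(qy, qx)] = [(r, c) for r in rows for c in cols]
    return orders


def smooth_block_boundaries(mb):
    """Replace the two lines straddling each internal boundary by local averages.

    Assumed layout across a boundary: A, C | D, B, giving C <- (A + D) / 2 and
    D <- (B + C) / 2, both computed from the values before the update.
    """
    n = len(mb)
    h = n // 2
    out = [row[:] for row in mb]
    for r in range(n):  # vertical boundary between columns h-1 and h
        a, c, d, b = out[r][h - 2], out[r][h - 1], out[r][h], out[r][h + 1]
        out[r][h - 1], out[r][h] = (a + d) / 2, (b + c) / 2
    for col in range(n):  # horizontal boundary between rows h-1 and h
        a, c, d, b = out[h - 2][col], out[h - 1][col], out[h][col], out[h + 1][col]
        out[h - 1][col], out[h][col] = (a + d) / 2, (b + c) / 2
    return out
```

For the lower-right block the scan starts at pixel (15, 15) and ends at (8, 8), matching the bottom-right to upper-left order in the text; a sharp 0/8 step between the left and right halves becomes a 0, 4, 4, 8 ramp after smoothing.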

[Fig. 3: vertical and horizontal averages over the columns/rows A, B, C, D across a block boundary]

Fig. 3. Local average across block boundaries

3. EXPERIMENTAL RESULTS

The proposed CFBR method is general and can be applied to any video compression method; in this paper, MPEG-2 has been used as the coding framework. In the following we present comparative results of three error concealment methods on two standard QCIF video sequences (namely mother and daughter, and foreman), at a temporal resolution of 30 fps and encoded at 300 kbps. We assume that 15% of the image MBs (of size 16×16) are lost as the result of network congestion. The CFBR algorithm has been run using N1 = 60, N2 = 10 and N3 = 60. The results provided by the CFBR method are compared with those of two other intra-frame error concealment algorithms, namely those in [5] (using averaging and median filters) and in [7] (using an edge-preserving prior). As can be seen in Figs. 4-b and 5-b, the lost macroblocks have mostly been placed at the boundaries of piecewise smooth regions separated by sharp discontinuities, where estimation is more challenging. As far as computational load is concerned, the overall complexity of the CFBR algorithm is slightly lower than that of [7], which can be run in real time on a desktop PC on QCIF sequences at 10 fps; the first N1 iterations can be neglected in the complexity evaluation.

Let us consider the foreman sequence, which exhibits many sharp edges and small homogeneous areas. As can be seen in Fig. 4, the median filtering of [5] provides poor edge recovery. The method in [7] yields sharp edge reconstruction; however, the transition between the reconstructed macroblocks and the surrounding areas is unpleasantly abrupt. Conversely, the proposed CFBR method provides sharp and nearly artifact-free reconstructions, thus significantly limiting error propagation due to temporal prediction. Moreover, Tab. 1 reports the PSNR values achieved by the three methods on the complete frame, showing that the reconstructions yielded by the proposed method are closer to the original in the mean-squared-error sense.

Analogous considerations can be made for the mother and daughter sequence (Fig. 5); this sequence has fewer sharp edges, and most of its content consists of wide homogeneous areas and some textured areas. Since the HVS is highly sensitive to the contours between different smooth or textured regions, the method proposed in [7] yields a reconstruction of higher visual quality than that attained by the algorithm in [5]. The CFBR method offers the best of both worlds, providing smooth estimates of regular regions and finely structured reconstructions of details, especially textured areas and small objects, such as the mother's hair and her right eye. For the sake of completeness, the overall PSNR values for this frame are compared in Tab. 1, clearly showing that, besides higher visual quality, the proposed CFBR method achieves a significant performance improvement in terms of mean-squared error.
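For scale, assuming the 15% loss rate applies to each QCIF frame (176×144 pixels), the number of lost 16×16 macroblocks per frame works out roughly as follows:

```latex
N_{\mathrm{MB}} = \frac{176}{16}\cdot\frac{144}{16} = 11 \cdot 9 = 99,
\qquad 0.15 \cdot 99 = 14.85 \approx 15 \text{ lost MBs per frame}.
```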

Table 1. PSNR comparison for the mother and daughter and foreman video sequences (15% block loss)

Algorithm           Foreman     Mother and daughter
Algorithm in [5]    27.60 dB    33.73 dB
Algorithm in [7]    26.56 dB    29.25 dB
CFBR                28.82 dB    34.84 dB
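The paper does not spell out how PSNR is computed; a common definition for 8-bit video, assumed here (peak value 255, mean-squared error over all pixels of the frame), is sketched below. The function name `psnr` is illustrative.

```python
import math


def psnr(ref, test, peak=255.0):
    """PSNR in dB between two equally sized frames given as lists of rows."""
    sq = [(r - t) ** 2 for rr, tr in zip(ref, test) for r, t in zip(rr, tr)]
    mse = sum(sq) / len(sq)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)
```

Under this definition a PSNR gain of Δ dB corresponds to an MSE ratio of 10^(Δ/10); the gains of CFBR over [5] in Table 1 (1.22 dB for foreman, 1.11 dB for mother and daughter) thus correspond to roughly 24% and 23% lower MSE, respectively.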

4. CONCLUSIONS

In this paper we have proposed a novel intra-frame video error concealment algorithm based on coarse-to-fine block replenishment. The goal of the CFBR algorithm is to minimize the negative effect of lost macroblocks on the perception of the displayed video by the HVS. To this end, the algorithm estimates the lost region at three different levels of abstraction, namely smooth surfaces, structures, and local edges. Experimental results have shown that the resulting algorithm outperforms other state-of-the-art error concealment techniques in terms of both visual and objective quality.

5. REFERENCES

[1] Y. Wang, S. Wenger, J. Wen, A. Katsaggelos, "Error Resilient Video Coding Techniques", IEEE Signal Processing Magazine, pp. 61-82, July 2000.
[2] M.C. Hong, L. Kondi, H. Schwab, A.K. Katsaggelos, "Video error concealment techniques", Signal Processing: Image Communication, vol. 14, nos. 6-8, pp. 473-492, 1999.
[3] Z. Alkachouh, M.G. Bellanger, "Fast DCT-based spatial domain interpolation of blocks in images", IEEE Transactions on Image Processing, vol. 9, no. 4, pp. 729-732, Apr. 2000.
[4] G.-S. Yu, M.M.-K. Liu, M.W. Marcellin, "POCS-based error concealment for packet video using multiframe overlap information", IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 4, pp. 422-434, Aug. 1998.
[5] P. Salama, N.B. Shroff, E.J. Delp, "Error concealment in MPEG video streams over ATM networks", IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 1129-1144, June 2000.
[6] D.L. Robie, R.M. Mersereau, "The use of Hough transforms in spatial error concealment", ICASSP 2000 - IEEE International Conference on Acoustics, Speech and Signal Processing, Istanbul, Turkey, May 2000.
[7] S. Shirani, F. Kossentini, R. Ward, "A concealment method for video communications in an error-prone environment", IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 1122-1128, June 2000.


Fig. 4. foreman sequence: (a) original, (b) corrupted, reconstructed using (c) algorithm in [5], (d) algorithm in [7], (e) CFBR method


Fig. 5. mother and daughter sequence: (a) original, (b) corrupted, reconstructed using (c) algorithm in [5], (d) algorithm in [7], (e) CFBR method
