An edge and texture preserving algorithm for error concealment in video sequences

S. Belfiore, L. Crisà, M. Grangetto, E. Magli, G. Olmo
CERCOM - Center for Multimedia Radio Communications
Dipartimento di Elettronica - Politecnico di Torino
Corso Duca degli Abruzzi 24 - 10129 Torino - Italy
Ph.: +39-011-5644195 - Fax: +39-011-5644099
belfiore(crisa)@mail.tlc.polito.it grangetto(magli,olmo)@polito.it

ABSTRACT

In this paper we propose a novel error concealment algorithm for block-based video transmission over error-prone networks. In particular, we develop a spatial error concealment technique which combines edge-preserving interpolation with texture analysis and synthesis, providing a reconstruction of lost macroblocks optimized for visual perception. The algorithm recovers image edges by MAP estimation with a Markov random field prior, and replenishes lost textured areas with a texture synthesized from neighboring macroblocks. Experimental results show that texture synthesis achieves improved visual quality of the reconstructed areas with respect to stand-alone state-of-the-art spatial concealment techniques.

1 INTRODUCTION

In many modern real-time multimedia applications, compressed video bitstreams are transmitted over packet networks; packet losses may occur due to network congestion or excessive packet delay at the receiver. As a consequence, most recent image and video encoders provide error resilience tools in order to mitigate the effect of such losses. At the decoder, error concealment is also performed to attenuate the visible impact of residual missing data. Error concealment consists in exploiting the received parts of the bitstream to compute an estimate of the missing data (see [1] for a survey). In the video communication case, when a macroblock (MB) is lost, error concealment may be performed by resorting to past/future frames or to neighboring MBs in the same frame; the two approaches are referred to as temporal and spatial concealment respectively. Joint approaches are also often used in practice. While the temporal approach is usually less time-consuming, it cannot always be applied; therefore spatial error concealment is an invaluable resource. Several spatial error concealment techniques have been proposed. The problem has been formulated in the DCT domain [2], where smoothness of the recovered MB can be easily imposed. Projection onto convex sets has also been suggested [3]. More recent approaches consider edge-preserving interpolators. Amongst others,

in [4] it is proposed to estimate edge directions by means of the Hough transform, while in [5] an approach based on sketch theory is adopted, consisting in the recovery of contours followed by bilinear interpolation and patch repetition. Recent spatial error concealment techniques [6, 7, 8] attempt to improve visual quality by performing edge-preserving reconstruction of missing MBs with a Markov random field prior model. While the obtained results are fairly good, these algorithms, similarly to those cited above, provide very smooth reconstructions in image regions that do not contain edges. On the other hand, such edge-free regions often contain texture. As a consequence, the obtained reconstructions may exhibit a smooth recovered MB surrounded by textured MBs; this discontinuity negatively impacts picture perception by the human visual system. In this paper we propose an innovative approach to spatial error concealment, aimed at overcoming this problem and consisting in the use of a texture analysis/synthesis technique coupled with an edge-preserving interpolator. In particular, the texture analysis stage detects those pixels of a missing MB that originally contained texture, and synthesizes from the neighboring MBs a new texture that is used as a candidate for MB recovery. A merging stage blends the edge-preserving reconstruction and the texture in order to achieve the visually best reconstruction. It is worth noticing that, to the best of the authors' knowledge, the use of texture analysis and synthesis algorithms for error concealment has not yet been proposed in the scientific literature. This paper is organized as follows. In Sect. 2 we describe the proposed error concealment algorithm, which consists of edge-preserving interpolation (Sect. 2.1) and texture analysis/synthesis (Sect. 2.2). In Sect. 3 we provide experimental results using the MPEG-2 coding scheme, while in Sect. 4 we draw some conclusions.

2 PROPOSED ALGORITHM

The proposed algorithm consists of two consecutive stages: the first performs edge-preserving interpolation, while the second provides the recovery of missing textured areas through a texture analysis and synthesis algorithm. Finally, a merging algorithm blends the two reconstructions into the final MB estimate. Details about the two stages and the merging operation are given hereafter; a block scheme of the proposed algorithm is sketched in Fig. 1.

2.1 Edge-preserving interpolation

In the first stage the image is modelled as a Markov random field (MRF), and its maximum a posteriori (MAP) estimate given the received image is computed. In particular, in this work edge-preserving reconstruction of the missing MB is performed using the algorithm in [6]. This algorithm aims at providing a reconstruction of large structures and edges that entirely cross the missing area. Its operation is briefly reviewed in the following, as it serves as a basis for the second stage and the merging rule. Each pixel of the missing MB is estimated as a weighted mean of the pixels belonging to the clique c and its complement c′, as shown in Fig. 2. The weights are selected so as to give more weight to those pixels, belonging to c or c′, lying along the directions of the most prominent edges. To determine the likelihood of edges in each of the eight equally spaced directions shown in Fig. 2, a search window of eight pixels surrounding the missing MB is chosen. For each pixel (i, j) belonging to this window the edge direction is determined by a gradient measure, in order to decide whether the edge is aligned with the direction under consideration. To this end, the gradient at each pixel is evaluated in magnitude and phase (G_{i,j} and θ_{i,j} respectively) as follows:

G_{i,j} = \sqrt{G_x^2 + G_y^2}

and

θ_{i,j} = \arctan(G_y / G_x),

where

G_x = (f_{i-1,j+1} + 2 f_{i,j+1} + f_{i+1,j+1}) − (f_{i-1,j-1} + 2 f_{i,j-1} + f_{i+1,j-1})

and

G_y = (f_{i+1,j-1} + 2 f_{i+1,j} + f_{i+1,j+1}) − (f_{i-1,j-1} + 2 f_{i-1,j} + f_{i-1,j+1}),

f_{i,j} denoting the luminance value at pixel (i, j); these are the masks of the classical Sobel operator.

For each of the eight directions associated with each pixel of the missing MB, a counter is updated, incrementing its value by the gradient magnitude G_{i,j} if an edge is detected in that particular direction; since the edge detection algorithm is noise-sensitive, the counter values are compared to a threshold ξ. In this work the threshold is automatically selected according to an empirical formula based on the global picture variance σ². If an edge is detected, this process is iterated until convergence. Otherwise, the considered pixel is assumed to belong to a homogeneous area, and is estimated as the mean of its neighbors. Each pixel of the final MB estimate is referred to in the following as \hat{f}^{(1)}_{i,j}. It is worth pointing out that the computation of the edge map is done iteratively as the MAP estimation proceeds. Upon termination of the algorithm, the counters are hard-thresholded so as to generate a binary edge map, which is stored for subsequent use by the merging step (see Sect. 2.2.3).


Figure 2: A pixel, its clique c, and the eight directions. The dark area is the complement c′ of the clique.
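The gradient-based direction analysis described above can be sketched in Python/NumPy. This is a minimal illustration, not the authors' implementation: the function name, the uniform 8-bin quantization of the phase, and the way the noise threshold ξ is applied are our assumptions.

```python
import numpy as np

def edge_direction_counters(frame, window_pixels, threshold):
    """Accumulate gradient evidence for each of 8 edge directions.

    For every pixel (i, j) in the search window surrounding a lost MB,
    a Sobel gradient gives a magnitude and phase; the phase is quantized
    to one of eight equally spaced directions (modulo pi) and the
    corresponding counter is incremented by the magnitude. Counters
    below the threshold xi are zeroed as noise.
    """
    counters = np.zeros(8)
    for (i, j) in window_pixels:
        p = frame[i - 1:i + 2, j - 1:j + 2].astype(float)
        # Sobel masks, consistent with the G_x / G_y expressions above
        gx = (p[0, 2] + 2 * p[1, 2] + p[2, 2]) - (p[0, 0] + 2 * p[1, 0] + p[2, 0])
        gy = (p[2, 0] + 2 * p[2, 1] + p[2, 2]) - (p[0, 0] + 2 * p[0, 1] + p[0, 2])
        magnitude = np.hypot(gx, gy)
        phase = np.arctan2(gy, gx)
        # quantize the phase to one of 8 directions (edge direction is mod pi)
        direction = int(np.round(((phase % np.pi) / np.pi) * 8)) % 8
        counters[direction] += magnitude
    counters[counters < threshold] = 0.0  # suppress noise-level evidence
    return counters
```

Hard-thresholding the returned counters (keeping only directions with nonzero evidence) yields the binary edge map used later by the merging step.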

2.2 Texture analysis and synthesis

The frame reconstructed by the first stage is the basis for the texture-based stage of the proposed error concealment algorithm. Three parts can be identified in this stage: firstly, a texture detection algorithm is applied in order to recognize whether a missing MB is likely to contain textured areas; secondly, texture analysis and synthesis is performed; finally, the synthesized texture is merged with the edge-preserving reconstruction from the first stage, in order to provide the best MB estimate from the perceptual viewpoint.

2.2.1 Texture detection

The first step aims at inferring whether some of the neighboring MBs of the missing one have a texture content compatible with the reconstruction provided by the edge-preserving interpolation. To this end, each of the eight neighboring MBs is progressively scanned. For each MB, the sum of absolute differences between that MB and the first-stage reconstruction is computed. If the minimum of all sums is below a given value, the content of the corresponding MB is deemed similar to that of the first-stage reconstruction. Next, that MB is checked for texture content, that is, its variance must lie between a lower and an upper bound, so as to prevent using smooth or edge regions in the texture synthesis task. If this condition is met, the following texture synthesis and merging steps are triggered, i.e. the MB yielding the minimum sum is used to synthesize a new texture, as detailed in the next section; this condition is referred to in the following as the texture-detected condition. Otherwise, the first-stage reconstruction is retained as the final MB estimate.
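The detection step above can be sketched as follows. The function name and the threshold parameters (`sad_max`, `var_lo`, `var_hi`) are placeholders for values the paper leaves unspecified; this is an illustrative sketch, not the authors' code.

```python
import numpy as np

def detect_texture_source(recon_mb, neighbor_mbs, sad_max, var_lo, var_hi):
    """Pick a neighboring MB to drive texture synthesis, if any qualifies.

    Each available neighboring MB is compared with the first-stage
    reconstruction via the sum of absolute differences (SAD); the
    best-matching MB is accepted only if its SAD is below `sad_max` and
    its variance lies in [var_lo, var_hi], so that smooth or
    edge-dominated regions are not used as texture samples.
    Returns the selected MB, or None (first-stage result is kept).
    """
    best_mb, best_sad = None, float("inf")
    for mb in neighbor_mbs:
        sad = np.abs(mb.astype(float) - recon_mb.astype(float)).sum()
        if sad < best_sad:
            best_sad, best_mb = sad, mb
    if best_mb is None or best_sad > sad_max:
        return None  # no neighbor is similar enough
    if not (var_lo <= best_mb.astype(float).var() <= var_hi):
        return None  # too smooth (or too edge-like) to be a texture sample
    return best_mb
```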

2.2.2 Texture synthesis

When the texture detection algorithm finds a neighboring MB of the missing one whose texture content is compatible with the reconstruction provided by the first stage, it performs texture synthesis in order to generate a new texture MB, which is the output of this stage. Each pixel of the second-stage estimate is referred to in the following as \hat{f}^{(2)}_{i,j}. As to the selection of a suitable texture analysis and synthesis algorithm, the following remarks can be made. In the application to QCIF and CIF image sequences, which are the most common formats for low bit-rate video coding, it turns out that only coarse textures are

Figure 1: Block scheme of the proposed algorithm.

usually present due to the low picture resolution. Moreover, the spatial extent of each texture is very limited, so that texture modelling must be performed on the basis of few image pixels, e.g. 256 or fewer; this obviously prevents the use of statistical models for texture description. Instead, deterministic algorithms can be used, based on clever resampling of an original texture; in particular, in this work we have employed the algorithm described in [9], which is briefly reviewed in the following. The texture synthesis algorithm proposed by Efros and Freeman in [9] is a patch-based procedure: once an input texture sample block is selected, the synthesized image is created by putting patches together so as to form a mosaic. Let us define the unit of synthesis B as a square block extracted from the set S_B of all possible blocks in the input texture sample. The synthesized image is obtained by tiling blocks B with a certain amount of overlap. The first block tiled onto the synthesized texture image is chosen at random. The others are selected from S_B in such a way that they are compatible with their neighbors along the region of overlap (see Fig. 3). This process is repeated until the synthesized image has been completely filled. If two blocks B_1 and B_2 overlap along their vertical border, and B_1^{ov} and B_2^{ov} are the respective overlap regions, the overlap error metric is defined as e = ||B_1^{ov} − B_2^{ov}||_2^2; at each step, the candidate block minimizing the overlap error is tiled onto the texture image being synthesized. The block size is a critical parameter: it must be big enough to capture the texture structure, but small enough that the interaction among basic structures is left to the algorithm.
Moreover, in applications such as low bit-rate video coding, which involve QCIF and CIF image sequences, textured areas are coarse and small, so that the block size can vary only in a very limited range, bounded above by the MB size (16 × 16). In our experiments the block size was set to 8 and the overlap region width was 1/6 of the block size. In our application, in order to obtain a likely texture, a large image was synthesized and then a 16 × 16 block was cut out and sent to the merging rule.

Figure 3: Pictorial description of the texture synthesis algorithm: neighbouring blocks B1 and B2 from the input texture are constrained along their overlap region. The texture in the example has been extracted from the woman's hair in the mother and daughter sequence.
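The patch-selection rule of the quilting procedure can be sketched as follows. The block size 8 and the overlap width (1/6 of the block size, rounded up to 2 here) follow the values reported above; the function itself, and restricting the error to the vertical overlap only, are simplifications for illustration.

```python
import numpy as np

def pick_next_block(texture, placed_left, block=8, overlap=2):
    """Choose the block whose left overlap best matches the placed block.

    Every block of the input texture sample is a candidate; the one
    minimizing the squared-L2 error e = ||B1_ov - B2_ov||^2 over the
    vertical overlap region with the already-placed block (to its left)
    is tiled next.
    """
    h, w = texture.shape
    best, best_err = None, float("inf")
    for i in range(h - block + 1):
        for j in range(w - block + 1):
            cand = texture[i:i + block, j:j + block].astype(float)
            # placed block's rightmost strip vs candidate's leftmost strip
            err = ((placed_left[:, -overlap:] - cand[:, :overlap]) ** 2).sum()
            if err < best_err:
                best_err, best = err, cand
    return best, best_err
```

Repeating this selection while raster-scanning the output canvas (and blending the overlaps) fills the synthesized image, from which a 16 × 16 block is finally cut out.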

2.2.3 Merging rule

Since MBs may contain both texture and edges, the first- and second-stage reconstructions must be blended in order to provide the final MB recovery \hat{f}_{i,j}. The merging rule proposed here is a conservative one: by default, it selects the first-stage edge-preserving reconstruction, unless texture has been detected with a high degree of confidence. The merging algorithm is detailed in the following, and is run for each pixel of the considered MB. Recall that, at this point, the first- and second-stage reconstructions are available, as well as the binary edge map provided by the first stage.

1. If the texture-detected condition is not verified, then set \hat{f}_{i,j} = \hat{f}^{(1)}_{i,j} and exit; otherwise go to the next step.

2. If the edge map marks pixel (i, j), the pixel is very likely to belong to an edge. As a consequence, set \hat{f}_{i,j} = \hat{f}^{(1)}_{i,j} and exit.

3. Otherwise:

(a) if d^{(1)}_{i,j} < τ_1 and d^{(2)}_{i,j} < τ_2, then \hat{f}_{i,j} = \hat{f}^{(2)}_{i,j};

(b) else \hat{f}_{i,j} = \hat{f}^{(1)}_{i,j};

where d^{(1)}_{i,j} and d^{(2)}_{i,j} are pixelwise distance measures comparing the synthesized texture with the edge-preserving reconstruction. The definitions of d^{(1)} and d^{(2)} are empirical, and aim at checking whether the synthesized texture is compatible with the first-stage reconstruction; we have found that these definitions give satisfactory results. In our experimental trials the thresholds τ_1 and τ_2 have also been selected empirically.
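The per-pixel rule above can be sketched in vectorized form. Since the exact definitions of d^{(1)} and d^{(2)} are empirical and not reproduced here, the sketch takes them as precomputed arrays; the function name and argument layout are our assumptions.

```python
import numpy as np

def merge_reconstructions(f1, f2, edge_map, d1, d2, tau1, tau2, texture_found):
    """Blend edge-preserving (f1) and texture-synthesis (f2) reconstructions.

    Conservative per-pixel merge: without a texture detection the
    first-stage estimate is kept everywhere; pixels flagged by the binary
    edge map keep the edge-preserving value; elsewhere the synthesized
    texture is accepted only where both compatibility measures d1, d2
    fall below their thresholds tau1, tau2.
    """
    if not texture_found:
        return f1.copy()  # step 1: no texture detected
    out = f1.copy()       # default / step 2: edges keep f1
    use_texture = (~edge_map) & (d1 < tau1) & (d2 < tau2)  # step 3(a)
    out[use_texture] = f2[use_texture]
    return out
```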

3 EXPERIMENTAL RESULTS

The proposed algorithm exploits spatial redundancy in order to recover corrupted frames in sequences coded by a block-based compression algorithm. The MPEG-2 standard has been selected as the coding framework; in the following we present the results achieved by the proposed algorithm on two QCIF video sequences, namely mother and daughter and salesman, encoded at 300 kbps with a temporal resolution of 30 fps. We assume a loss of about 15% of the image MBs due to network congestion; we employ sparse MB loss patterns, as also done in [7], which can occur when slices of a single MB are formed, or MBs are interleaved before transmission. The results yielded by the proposed method are compared with those attained by the algorithm in [6], which is referred to as the "directional MAP estimator" and is used as the first stage of our proposed algorithm. Let us first consider the mother and daughter sequence in Fig. 4-a, which exhibits some sharp edges, wide homogeneous areas and some textured areas, such as the woman's hair. The erased MBs are shown in Fig. 4-b. As can be seen in Fig. 4-c, the algorithm in [6] yields a satisfactory reconstruction of homogeneous areas and sharp edges, while lost textured areas are replenished by a smooth reconstruction. This has a poor visual impact, since the discontinuity between the smooth recovered area and the surrounding texture, to which the human visual system is highly sensitive, is apparent. Conversely, the proposed method yields a reconstruction of higher visual quality than that attained by [6], by synthesizing in some of the missing MBs a textured area visually similar to that present in their neighborhood; moreover, the texture synthesis does not impair the correct reconstruction of edges and homogeneous areas, as can be seen in Fig. 4-d, thanks to the pixelwise merging algorithm that blends the edge-preserving reconstruction and the synthesized texture. Analogous considerations can be made for the salesman sequence (Fig. 5): this sequence (see Fig. 5-a) has a large amount of shaded and blurred edges, and wide textured areas. The erased MBs are reported in Fig. 5-b. As can be seen in Fig. 5-c, even though edges and homogeneous areas are precisely recovered, the smooth reconstructions of textured areas produced by the algorithm in [6] yield poor visual quality; this drawback is overcome by the proposed algorithm by means of the insertion of the synthesized texture, as shown in Fig. 5-d.

As can be seen, the proposed algorithm is able to recognize areas where texture is not present, thus avoiding impairment of the results of the first-stage edge-preserving interpolator. Conversely, where texture does exist, the achieved results are considerably better than those obtained by the first stage alone. For completeness we report that, in terms of PSNR, the stand-alone edge-preserving stage achieves 34.05 dB and 26.90 dB on the mother and daughter and salesman sequences respectively, while the complete algorithm achieves 34.07 dB and 26.72 dB respectively. However, since the proposed algorithm optimizes visual perception and not PSNR, these values do not provide a useful basis for comparing the algorithms.

4 CONCLUSIONS

In this paper we have proposed the use of texture analysis and synthesis as a part of spatial error concealment, with the goal of optimizing the visual appearance of recovered MBs. Experimental results have shown that a proper merging of a synthesized texture and an edge-preserving interpolation of each missing MB yields a considerable improvement in visual quality with respect to a stand-alone edge-preserving algorithm.

REFERENCES

[1] Y. Wang, S. Wenger, J. Wen and A. Katsaggelos, "Error Resilient Video Coding Techniques", IEEE Signal Processing Magazine, pp. 61-82, July 2000.
[2] Z. Al Kachouh, M.G. Bellanger, "Exact DCT-based spatial domain interpolation of blocks in images", Proc. of IEEE ICIP '97.
[3] H. Sun, W. Kwok, "Concealment of damaged block transform coded images using projection onto convex sets", IEEE Transactions on Image Processing, vol. 4, no. 4, pp. 470-477, Apr. 1995.
[4] D.L. Robie and R.M. Mersereau, "The use of Hough transforms in spatial error concealment", Proc. of IEEE ICASSP 2000, Istanbul, Turkey, 2000.
[5] L. Atzori, F.G.B. De Natale, "Error concealment in video transmission over packet network by a sketch-based approach", Signal Processing: Image Communication, vol. 15, no. 1-2, pp. 57-76, Sept. 1999.
[6] S. Shirani, F. Kossentini, R. Ward, "A concealment method for video communications in an error-prone environment", IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 1122-1128, June 2000.
[7] P. Salama, N. B. Shroff, E. J. Delp, "Error concealment in MPEG video streams over ATM networks", IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 1129-1144, June 2000.

(a)

(b)

(c)

(d)

Figure 4: mother and daughter sequence: (a) original, (b) corrupted, reconstructed using (c) the directional MAP estimator, (d) the proposed method.

[8] S. Belfiore, L. Crisà, M. Grangetto, E. Magli, G. Olmo, "Robust and edge-preserving video error concealment by coarse-to-fine block replenishment", to appear in Proc. of IEEE ICASSP 2002.
[9] A. A. Efros, W. T. Freeman, "Image quilting for texture synthesis and transfer", Proc. of SIGGRAPH '01.

(a)

(b)

(c)

(d)

Figure 5: salesman sequence: (a) original, (b) corrupted, reconstructed using (c) the directional MAP estimator, (d) the proposed method.
