Optimization of Quadtree Segmentation and Hybrid 2D and 3D Motion Estimation in a Rate-Distortion Framework

Dimitrios Tzovaras, Stavros Vachtsevanos and Michael G. Strintzis, Senior Member, IEEE

Electrical and Computer Engineering Department
Information Processing Laboratory
Aristotle University of Thessaloniki
Thessaloniki 54006, Greece
phone: (+30-31) 996-359, fax: (+30-31) 996-398
e-mail: [email protected]

Manuscript originally submitted on 31/8/96 and revised on 28/2/97.

Abstract

A rate-distortion framework is used to define a very low bit rate coding scheme based on quadtree segmentation and optimized selection of motion estimators. This technique achieves maximum reconstructed image quality under the constraint of a target bitrate for the coding of the vector field and segmentation information. First, a complete scheme is proposed for hybrid 2D and 3D motion estimation and compensation. The quadtree object segmentation is optimized for hybrid motion estimation in the rate-distortion sense. This scheme adapts the depth of the quadtree and the technique used for motion estimation for each leaf of the tree. A more sophisticated technique, adapted to the requirements of a very low bit rate coder, is also proposed, which additionally considers the transmission of the prediction error corresponding to the particular choice of motion estimator. Based on these coding schemes, two versions of a very low bit rate image sequence coder are developed. Experimental results illustrating the performance of the proposed techniques in very low bit rate image sequence coding application areas are presented and evaluated.

Keywords: Very low bit-rate coding; rate-distortion theory; quadtree segmentation; hybrid 2D and 3D motion estimation.

This work was supported by the EU CEC Projects PANORAMA (Package for New Autostereoscopic Multiview Systems and Applications, ACTS project 092) and VIDAS (Video Assisted with Audio Coding and Representation, ACTS project 057). The assistance of COST 211ter is also gratefully acknowledged.


1 INTRODUCTION

The transmission of full motion video through limited capacity channels is critically dependent on the ability of the compression schemes to achieve target bit rates while still maintaining acceptable visual quality [1]. To achieve this, motion estimation and motion compensated prediction are frequently used to reduce temporal redundancies in image sequences [2]. Block-based motion estimation techniques have been extensively studied and applied for very low bitrate coding [3]. However, the performance of these techniques at such low bitrates is restricted by well-known limitations such as blocking and mosquito artifacts. Object-based techniques for image sequence coding [4, 5, 6, 7, 8] have been proposed to solve these problems. Affine 2D and 3D motion estimation models are used for motion compensation in object-based techniques.

While much attention has been devoted to the coding of the intraframe and prediction error images, the displacement vector fields or the parameters of the motion models are usually coded losslessly using DPCM/Huffman coding, resulting in limited compression. The reason for this is that digital video coding systems for many applications have at their disposal rates ranging from 1 Mbit/sec to 25 Mbits/sec. At such rates, only a minor part of the global rate is devoted to the transmission of the motion information, hence the bitrate overhead produced by lossless encoding of the vector fields or motion model parameters is negligible. In many emerging application areas, however, lossy compression of the vector fields is often highly desirable, and sometimes unavoidable. For example, mobile videophone or multimedia transmission channels are often limited to capacities of 4.8 - 64 kbps. In such cases, it is clearly desirable to reduce as much as possible the bitrate needed to transmit the motion vector fields, provided that this reduction does not produce intolerable distortion in the reconstructed image.

It is also desirable to allocate the bitrate devoted to the coding of motion fields adaptively, depending on the complexity of the sequence and also on the overall bitrate availability when the latter varies with time. Furthermore, it is desirable to select both the image segmentation into objects and the motion estimation method for each object adaptively, so as to best represent the motion of each part of the image. For example, in a typical videophone sequence, rough object subdivision combined with a block-based or affine 2D motion estimation model would suffice for the description of the motion of most parts of the foreground object, while much finer object subdivision, perhaps down to the size of single blocks, and more sophisticated 3D motion models would be best suited for the description of mouth and eye motion.

An elegant framework for the definition of such a strategy is provided by the classical rate-distortion constrained minimization procedure. This has been recently used in many coding applications including bit allocation for vector quantization [9], wavelet packet image coding [10], quadtree still image coding [11] and generic video compression [12]. In [13] the rate distortion function was evaluated for image sequence coding under the assumption of Gaussian intensity distribution. Recently, rate-distortion optimization was also used for the development of efficient motion and disparity estimation strategies [14, 15]. In these schemes a rate-distortion framework is used to define a displacement vector-field estimation technique for use in video coding.

The present paper investigates the use of this methodology for quadtree segmentation and hybrid motion field estimation under the constraint of a target bitrate for the coding of the vector information. Quadtree segmentation is performed using rate-distortion criteria and is fused with motion estimation by selecting, for each node of the quadtree, the optimum motion estimator from a predetermined set of candidate motion estimators. As an extension, the rate-distortion optimization scheme is also used to optimize the allocation of the prediction error corresponding to the motion estimation procedures in the transmitted information.

Two possible codecs are also proposed and evaluated experimentally. In the first (rate-distortion optimized hybrid codec, RDHC), the image sequence is divided into groups of frames (GOF) of ten frames each and rate-distortion optimization is applied to each GOF separately. The first frame of each GOF is transmitted as a still image (intra-coded) and the succeeding frames are coded using motion compensation from the reconstructed version of the previous frame (see Figure 1). The prediction error is not transmitted.

In the second (rate-distortion optimized hybrid codec with error transmission, RDHCE), the optimization algorithm is applied to a much longer sequence of up to 100 frames (see Figure 2). In this case the first frame is coded as a still image and all other frames are coded using rate-distortion optimization of the quadtree segmentation, the motion estimation and, additionally, the prediction error transmission.

The paper is organized as follows. The hybrid technique used for motion estimation is described in Section 2, where a brief review is given of each candidate technique. Section 3 describes the rate-distortion optimization that identifies the optimal quadtree and the optimal motion estimator for each leaf of the quadtree. In Section 4 the proposed technique is extended to include the transmission of the prediction error corresponding to the motion compensated estimates. Finally, experimental results demonstrating the performance of the proposed algorithm for the coding of typical videophone and videoconferencing sequences are given in Section 5 and conclusions are drawn in Section 6.

2 HYBRID MOTION ESTIMATION

Several schemes have been proposed in the literature for the coding of videophone or videoconference image sequences [1, 2, 3]. Motion estimation and compensation is the basic approach used in all these schemes. Modeling of the motion information by translation, zoom and pan, or by a 3D rotation and translation, has been used in block-based, affine and 3D motion estimators. Experimental results have shown that affine 2D motion or 3D motion models may represent efficiently the displacement occurring in typical scenes; however, most parts of the image (e.g. the background) may be coded very satisfactorily using only translational motion. Moreover, the complexity of the affine and 3D motion estimation algorithms is higher than the complexity of the block-based scheme.

Based on the above observations we propose the use of all these models for the motion compensated coding of the objects of a scene, within a rate-distortion framework optimizing both the segmentation and the motion estimation (see Figure 3). The alternatives are:

1. The motion of the object is insignificant. No motion vector is transmitted and the previous estimate for this frame is considered sufficient.
2. Translational motion is used to compensate the motion of an object. A two-component motion vector is transmitted.
3. An affine 2D motion model is used to represent the motion of an object. The six model parameters are transmitted.
4. A 3D motion model best represents the motion of an object in the scene. The eight motion model parameters are transmitted.
5. The 3D motion corresponding to the same block in the temporally preceding frame is used.

In other words, the optimum image segmentation together with the optimum of the above motion estimator candidates are selected so as to minimize a distortion index subject to a ceiling on the available rate. Classical DFD (Displaced Frame Difference) minimization defines the block-based motion estimator.

To define the remaining motion estimator candidates, a brief review is given below of the affine and the 3D motion estimation methods [16, 17].


2.1 Affine 2D Motion Estimation

The general representation of an affine transformation is

\[
[x \;\; y \;\; 1] = [u \;\; v \;\; 1]
\begin{bmatrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & 1 \end{bmatrix},
\tag{1}
\]

or, equivalently,

\[
x = a_{11} u + a_{21} v + a_{31}, \qquad y = a_{12} u + a_{22} v + a_{32}.
\]

If $w = (a_{11}, a_{21}, a_{31}, a_{12}, a_{22}, a_{32})^T$ is the vector of the motion parameters, the following system of equations must be solved for each object in the scene [16]:

\[
A w = b,
\]

where

\[
A = \begin{bmatrix} \bar{A} & 0 \\ 0 & \bar{A} \end{bmatrix},
\qquad
\bar{A} = \begin{bmatrix} X_1 & Y_1 & 1 \\ \vdots & \vdots & \vdots \\ X_N & Y_N & 1 \end{bmatrix},
\]

and

\[
b = [b_x \;\; b_y]^T = [X_1 + d_{x_1}, \ldots, X_N + d_{x_N}, \; Y_1 + d_{y_1}, \ldots, Y_N + d_{y_N}]^T,
\]

where $N$ is the number of points with coordinates $(X_i, Y_i)$ in the working object. The solution to the above overdetermined set of equations may be obtained by a least-squares method, or alternatively by the robust least median of squares technique described in detail in [18].
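The least-squares route mentioned above can be illustrated with a minimal sketch. It assumes that $d_{x_i}, d_{y_i}$ are displacements measured at the object points, and it exploits the block-diagonal form of $A$: the x and y halves of $w$ decouple into two three-parameter problems that share the same normal matrix. The function names `solve_linear` and `fit_affine` are hypothetical, and the robust least median of squares variant of [18] is not shown.

```python
def solve_linear(m, rhs):
    """Solve the square system m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(m, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("need at least three non-collinear points")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_affine(points, displacements):
    """Least-squares affine fit for one object.

    points:        list of (X_i, Y_i)
    displacements: list of (dx_i, dy_i), assumed to be measured displacements
    Returns (wx, wy): wx maps [X, Y, 1] to X + dx, wy maps [X, Y, 1] to Y + dy.
    """
    rows = [(x, y, 1.0) for x, y in points]
    # Normal equations (A^T A) w = A^T b; the 3x3 matrix is shared by both halves.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    bx = [x + dx for (x, _), (dx, _) in zip(points, displacements)]
    by = [y + dy for (_, y), (_, dy) in zip(points, displacements)]
    atbx = [sum(r[i] * b for r, b in zip(rows, bx)) for i in range(3)]
    atby = [sum(r[i] * b for r, b in zip(rows, by)) for i in range(3)]
    return solve_linear(ata, atbx), solve_linear(ata, atby)
```

Solving the 3 x 3 normal equations directly is adequate for a sketch; a production implementation would more likely use a QR or SVD based solver for numerical robustness.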

2.2 3D Motion Estimation

In order to identify the objects in the scene, the original image is segmented into areas having uniform motion characteristics. The 3-D motion of each object in the scene is modeled using a six-parameter model. More specifically, we assume that if $(x(t), y(t), z(t))$ are the coordinates of a point at time instant $t$, its coordinates $(x(t-1), y(t-1), z(t-1))$ at time instant $t-1$ are given by

\[
\begin{bmatrix} x(t-1) \\ y(t-1) \\ z(t-1) \end{bmatrix}
=
\begin{bmatrix} 1 & -w_z & w_y \\ w_z & 1 & -w_x \\ -w_y & w_x & 1 \end{bmatrix}
\begin{bmatrix} x(t) \\ y(t) \\ z(t) \end{bmatrix}
+
\begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix},
\tag{2}
\]

where three translational parameters $(t_x, t_y, t_z)$ and three rotational parameters $(w_x, w_y, w_z)$ are used to describe the motion of the underlying object. The goal of the 3-D motion estimation procedure is to compute the motion parameter vector $(w_x, w_y, w_z, t_x, t_y, t_z)$ for each object in the scene. If $(X, Y)$ are the coordinates of the perspective projection of the 3-D point $(x(t), y(t), z(t))$ on the image plane at time $t$, then

\[
X = f \, \frac{x(t)}{z(t)} \quad \text{and} \quad Y = f \, \frac{y(t)}{z(t)}.
\tag{3}
\]

From (2) and (3), the 2-D motion vectors $v_m(X, Y)$ that correspond to the pixels $(X, Y)$ of each object are defined by projection of the 3-D motion on the 2-D image plane, as follows:

\[
\begin{aligned}
v_{mx}(X, Y) &= X(t-1) - X(t) = f \, \frac{X - w_z Y + w_y f + f t_x / z(t)}{-w_y X + w_x Y + f + f t_z / z(t)} - X, \\
v_{my}(X, Y) &= Y(t-1) - Y(t) = f \, \frac{w_z X + Y - f w_x + f t_y / z(t)}{-w_y X + w_x Y + f + f t_z / z(t)} - Y.
\end{aligned}
\tag{4}
\]
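As an illustration of (4), the following minimal sketch evaluates the projected 2-D motion vector of one pixel from a given 3-D parameter set. The function name `projected_motion` and its argument layout are hypothetical; the depth `z` of the point and the focal length `f` are assumed to be known.

```python
def projected_motion(X, Y, z, rot, trans, f):
    """Projected 2-D motion vector of pixel (X, Y), following Eq. (4).

    rot   = (wx, wy, wz): rotational parameters
    trans = (tx, ty, tz): translational parameters
    z     : depth z(t) of the 3-D point (assumed known)
    f     : focal length (assumed known)
    """
    wx, wy, wz = rot
    tx, ty, tz = trans
    denom = -wy * X + wx * Y + f + f * tz / z
    x_prev = f * (X - wz * Y + wy * f + f * tx / z) / denom
    y_prev = f * (wz * X + Y - wx * f + f * ty / z) / denom
    return x_prev - X, y_prev - Y
```

With all six parameters set to zero the function returns a zero vector, which is a quick sanity check on the signs of the reconstructed formula.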



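If $t_z$ [...]

[...] $\tilde{J}^{i+1}_{2j}(\lambda) + \tilde{J}^{i+1}_{2j-1}(\lambda) + \tilde{J}^{i+1}_{2j-2}(\lambda) + \tilde{J}^{i+1}_{2j-3}(\lambda)$

then set $\mathrm{split}(n_j^i) = 1$ and

\[
\begin{aligned}
\tilde{D}_j^i &= D^{i+1}_{2j} + D^{i+1}_{2j-1} + D^{i+1}_{2j-2} + D^{i+1}_{2j-3}, \\
\tilde{R}_j^i &= R^{i+1}_{2j} + R^{i+1}_{2j-1} + R^{i+1}_{2j-2} + R^{i+1}_{2j-3}, \\
\tilde{J}_j^i &= J^{i+1}_{2j} + J^{i+1}_{2j-1} + J^{i+1}_{2j-2} + J^{i+1}_{2j-3}.
\end{aligned}
\tag{12}
\]

Step 4. Go to Step 2.

Step 5. Starting from the root node $n^0$ and following, in linked-list fashion, the node data-structure element $\mathrm{split}(n_j^i)$ selected optimally for all the nodes of $T$, construct the optimal quadtree $S^\star(\lambda)$ and its associated optimal motion estimator set choice $M^\star(\lambda)$.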
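The split decision and the final tree traversal can be sketched as a generic bottom-up Lagrangian pruning. This is a minimal illustration under stated assumptions, not the authors' implementation: each node is assumed to store the distortion and rate of coding it unsplit with its best motion estimator (with any split-flag cost already folded into the rate), and the names `Node`, `prune` and `leaves` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    distortion: float                    # D of coding this node as a single object
    rate: float                          # R of coding this node as a single object
    children: List["Node"] = field(default_factory=list)  # empty, or four children
    split: int = 0                       # decision filled in by prune()


def prune(node, lam):
    """Return (D, R, J) of the best subtree rooted at node for slope lam."""
    j_leaf = node.distortion + lam * node.rate
    if not node.children:
        node.split = 0
        return node.distortion, node.rate, j_leaf
    d_sum = r_sum = 0.0
    for child in node.children:
        d, r, _ = prune(child, lam)
        d_sum += d
        r_sum += r
    j_children = d_sum + lam * r_sum
    if j_leaf > j_children:              # splitting is cheaper: keep the children
        node.split = 1
        return d_sum, r_sum, j_children
    node.split = 0
    return node.distortion, node.rate, j_leaf


def leaves(node):
    """Follow the split flags from the root to collect the objects of the final tree."""
    if node.split == 0:
        return [node]
    out = []
    for child in node.children:
        out.extend(leaves(child))
    return out
```

For a root with cost (D, R) = (100, 2) and four children at (10, 3) each, a small slope such as 1 favours splitting (J = 52 against 102), while a large slope such as 10 keeps the root unsplit (J = 120 against 160).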

3.3 Determining the optimal slope $\lambda$

First, two values $\lambda_l < \lambda_u$ of $\lambda$ are found so that

\[
\sum_{i=1}^{N_s} R(v_i(\lambda_u), s_i(\lambda_u)) \;\le\; R_{budget} \;\le\; \sum_{i=1}^{N_s} R(v_i(\lambda_l), s_i(\lambda_l)).
\]

Note that the initial segmentation $S(\lambda_u)$ of the first frame of the image sequence selects the whole image to be a single object, while for the subsequent frames $S(\lambda_u)$ is the optimal segmentation corresponding to the previous frame. Similarly, $S(\lambda_l)$ corresponds to the segmentation resulting from full splitting of the quadtree until the minimum allowed object (block) size is reached. For the coding of a sequence of frames, the values of $\lambda_l$, $\lambda_u$ are chosen to be $\lambda_l = 0$, $\lambda_u = \infty$ for the initial frame and $\lambda_l = 0.8\lambda^\star$, $\lambda_u = 1.2\lambda^\star$ for subsequent frames, where $\lambda^\star$ is the solution of (10) for the previous frame. The bracketing interval is then successively decreased in size by the following procedure:

Step 1. For each object $i$, $i = 1, \ldots, N_s$, compute $D(v_i(\lambda_l), s_i(\lambda_l))$ and $D(v_i(\lambda_u), s_i(\lambda_u))$ and the corresponding $R(v_i(\lambda_l), s_i(\lambda_l))$ and $R(v_i(\lambda_u), s_i(\lambda_u))$.

Step 2. Set

\[
\lambda_{new} = \left| \frac{\sum_{i=1}^{N_s} \left[ D(v_i(\lambda_l), s_i(\lambda_l)) - D(v_i(\lambda_u), s_i(\lambda_u)) \right]}{\sum_{i=1}^{N_s} \left[ R(v_i(\lambda_l), s_i(\lambda_l)) - R(v_i(\lambda_u), s_i(\lambda_u)) \right]} \right| + \epsilon,
\]

where $\epsilon$ is a vanishingly small positive number.

Step 3. Compute the $\{R(v_i(\lambda_{new}), s_i(\lambda_{new}))\}_i$ and $\{D(v_i(\lambda_{new}), s_i(\lambda_{new}))\}_i$ minimizing $J(\lambda)$ for $\lambda = \lambda_{new}$.

Step 4. If $\sum_{i=1}^{N_s} R(v_i(\lambda_{new}), s_i(\lambda_{new})) = \sum_{i=1}^{N_s} R(v_i(\lambda_u), s_i(\lambda_u))$, then stop, with $\lambda^\star = \lambda_u$.
Else, if $\sum_{i=1}^{N_s} R(v_i(\lambda_{new}), s_i(\lambda_{new})) > R_{budget}$, set $\lambda_l \leftarrow \lambda_{new}$ and go to Step 2.
Else, set $\lambda_u \leftarrow \lambda_{new}$ and go to Step 2.

Note that the distortion corresponding to each motion vector in a specific search area is computed only once, at the first iteration of the algorithm. Thus the computational load of the algorithm consists of updating the entropy of the vector field and finding the minimum $J(\lambda)$.
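Steps 1 to 4 can be sketched on a toy problem in which every object offers a small, precomputed list of (rate, distortion) operating points. This is an assumption made purely for illustration: in the scheme described here, the minimization for a given slope also involves the quadtree and the adaptive entropy model. The names `best_point`, `totals` and `find_slope` are hypothetical, and a large finite number stands in for $\lambda_u = \infty$.

```python
def best_point(points, lam):
    """Pick the (rate, distortion) pair minimizing D + lam * R; ties go to the lower rate."""
    return min(points, key=lambda p: (p[1] + lam * p[0], p[0]))


def totals(objects, lam):
    """Total rate and distortion when every object is optimized for slope lam."""
    chosen = [best_point(points, lam) for points in objects]
    return sum(p[0] for p in chosen), sum(p[1] for p in chosen)


def find_slope(objects, r_budget, lam_l=0.0, lam_u=1e12, eps=1e-9, max_iter=50):
    """Shrink the bracket [lam_l, lam_u]; return (slope, total rate, total distortion)."""
    r_l, d_l = totals(objects, lam_l)                     # Step 1
    r_u, d_u = totals(objects, lam_u)
    if not r_u <= r_budget <= r_l:
        raise ValueError("the initial slopes do not bracket the budget")
    for _ in range(max_iter):
        if r_l == r_u:
            break
        lam_new = abs((d_l - d_u) / (r_l - r_u)) + eps    # Step 2
        r_new, d_new = totals(objects, lam_new)           # Step 3
        if r_new == r_u:                                  # Step 4: stop
            break
        if r_new > r_budget:
            lam_l, r_l, d_l = lam_new, r_new, d_new
        else:
            lam_u, r_u, d_u = lam_new, r_new, d_new
    return lam_u, r_u, d_u
```

For the two objects `[(0, 100), (2, 60), (4, 40), (8, 30)]` and `[(0, 80), (2, 50), (6, 20)]` with a budget of 8, the search stops after three iterations at a total rate of 6 and a total distortion of 90, the convex-hull operating point closest to the budget from below.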

3.4 Computation of the Entropy and Distortion Functions

The specific way the vector field affects the quality of the reconstructed image will determine the distortion index $D(v_i(\lambda), s_i(\lambda))$. A number of such distortion measures have been proposed in the literature. In the case of quadtree-based segmentation, the simplest and most commonly used is the temporally displaced frame difference

\[
D(v_i, s_i) = \sum_{k=0}^{b_x} \sum_{l=0}^{b_y} \left| im_t(m+k, n+l) - im_{t-1}(m+k+v_x^{(i)}, n+l+v_y^{(i)}) \right|,
\]

where $(m, n)$ are the upper left-hand corner coordinates of block $i$ corresponding to the node under consideration, $im_t(\cdot)$ and $im_{t-1}(\cdot)$ are the images at time instants $t$ and $t-1$, respectively, $(v_x^{(i)}, v_y^{(i)})$ is the projected 2D motion vector corresponding to the motion estimation method used, and $b_x$, $b_y$ are the dimensions of the working object (block).

The transmission cost $R(v_i(\lambda), s_i(\lambda))$ will depend on the specific method used for the coding of the vector fields. The motion parameter vectors corresponding to either the affine or the 3D motion estimation methods are first quantized uniformly, and the corresponding entropy is computed with respect to the quantized parameters. The distortion is also computed based on the quantized motion parameter vectors. The entropy of the current node, $R(v_i(\lambda), s_i(\lambda))$, is computed by summing the entropy of the already coded motion or motion parameter vectors with the entropy of the $\mathrm{split}(n_j^i)$ bit and the entropy of the parameter indicating which motion estimator is chosen for the already coded quadtree objects.

In the present work, the use of entropy coding (e.g. Huffman or arithmetic coding) with an adaptive probability model is assumed for the computation of the entropy of each component of the motion or motion parameter vectors. Thus, the entropy $R_d(v_i(\lambda), s_i(\lambda))$ for the coding of the component $d$ of $v_i$ computed using a specific motion estimator (i.e. the x- or y-component of the 2D motion vector field in the case of block-matching motion estimation, or a parameter of the quantized motion vector field in the case of affine or 3D motion estimation) is computed as

\[
R_d(v_i(\lambda), s_i(\lambda)) = - \sum_{x=-v_d^{min}}^{v_d^{max}} p_x(v_i(\lambda)) \log_2 p_x(v_i(\lambda)),
\]

where $p_x(v_i(\lambda))$ is the probability that the vector field minimizing the index $J(v_i(\lambda), s_i(\lambda))$ satisfies $v_d^{(i)} = x$, and $v_d^{min}$, $v_d^{max}$ are the minimum and maximum allowed values for the specific component $d$ of the motion or motion parameter vector. The probability $p_x(v_i(\lambda))$ is computed for each operating point $(v_i(\lambda), s_i(\lambda))$ of the algorithm using the information of all previously encoded parameters corresponding to the specific motion estimator as follows:

\[
p_x(v_i(\lambda)) = \frac{1}{i} \sum_{k=1}^{i} \delta_x^d(v_k(\lambda)),
\tag{13}
\]

where

\[
\delta_x^d(v_k(\lambda)) =
\begin{cases}
1 & \text{if } v_d^{(k)}(\lambda) = x, \\
0 & \text{otherwise.}
\end{cases}
\]

Note that (13) is equivalent to the following efficient formula for the incremental computation of $p_x(v_i(\lambda))$:

\[
p_x(v_{i+1}(\lambda)) = \frac{i}{i+1} \, p_x(v_i(\lambda)) + \frac{1}{i+1} \, \delta_x^d(v_{i+1}(\lambda)).
\tag{14}
\]

A more computationally efficient approach, which does not involve incremental computation of the probability density of the vector field or of the first-order vector field differences, is to assume a model for this probability density function. Specifically, a Gauss-Markov random field model of the motion vector differences [21, 22] could be used to accelerate the rate-distortion minimization procedure.
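The adaptive probability model of (13) and (14) and the resulting entropy estimate can be sketched as follows. The class `AdaptiveModel` and the helper `incremental_update` are hypothetical names, and the sketch tracks a single component $d$ of one motion estimator; an actual coder would keep one such model per component and per estimator.

```python
import math
from collections import Counter


class AdaptiveModel:
    """Relative-frequency model over the already coded values of one component."""

    def __init__(self):
        self.counts = Counter()
        self.n = 0

    def update(self, value):
        self.counts[value] += 1
        self.n += 1

    def probability(self, value):
        # Eq. (13): fraction of the coded samples that equal `value`
        return self.counts[value] / self.n if self.n else 0.0

    def entropy(self):
        # Entropy in bits per symbol; zero-probability values contribute nothing
        if not self.n:
            return 0.0
        return -sum((c / self.n) * math.log2(c / self.n) for c in self.counts.values())


def incremental_update(p_prev, i, hit):
    """Eq. (14): probability after sample i + 1, given p_prev after i samples."""
    return (i * p_prev + (1.0 if hit else 0.0)) / (i + 1)
```

Feeding the values 0, 1, 0, 0 into the model gives p(0) = 0.75, and applying `incremental_update` sample by sample reproduces the same value without recounting.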

4 TRANSMISSION OF PREDICTION ERROR INFORMATION

In many applications the transmission of motion and segmentation information alone is insufficient for the reconstruction of an image sequence with acceptable quality. The coder must then be allowed to transmit the prediction error corresponding to the motion estimator, especially for blocks containing artifacts in the reconstructed image. The optimization technique described in detail in the previous sections is easily extended to accommodate the choice of transmitting the prediction error. It will be assumed that the prediction error is coded using the DCT and Huffman entropy coding, as is the case in JPEG.

Consider again a segmentation $S$ of the image plane consisting of $N_s$ objects $s_i$: $S = \{s_i;\ i = 1, \ldots, N_s\}$. For each candidate motion estimator $M_j$, $j = 1, \ldots, M$, let $V^{(j)} = \{v_i^{(j)};\ i = 1, \ldots, N_s\}$ be the corresponding set of object motion vectors and $E^{(j)} = \{e_i^{(j)};\ i = 1, \ldots, N_s\}$ be the corresponding set of prediction errors. The general joint vector field estimation and quadtree segmentation algorithm aims to minimize the distortion $D$ of the reconstructed image sequence, under a constraint $R_{budget}$ on the rate for the transmission of the vector field, the corresponding segmentation and the prediction error information. This corresponds to the following constrained optimization problem:

\[
\min_{\{v_i^{(j)} \in V^{(j)},\ j = 1, \ldots, M;\ s_i \in S;\ e_i \in E\}} \; \sum_{i=1}^{N_s} D(v_i, s_i, e_i),
\tag{15}
\]

subject to

\[
\sum_{i=1}^{N_s} R(v_i, s_i, e_i) \le R_{budget},
\]

where $N_s$ is the total number of objects in the image, $D(v_i, s_i, e_i)$ is the contribution of the decision $(v_i, s_i, e_i)$ to the distortion function and $R(v_i, s_i, e_i)$ is the contribution of the same decision to the total rate, or cost, of the transmission of the motion vectors, the segmentation map and the prediction error information. As discussed in the previous section, the solution $\{v_i^\star(\lambda), s_i^\star(\lambda), e_i^\star(\lambda);\ i = 1, \ldots, N_s\}$ of the problem of unconstrained minimization of

\[
J(\lambda) = \sum_{i=1}^{N_s} J(v_i(\lambda), s_i(\lambda), e_i(\lambda)) = \sum_{i=1}^{N_s} D(v_i(\lambda), s_i(\lambda), e_i(\lambda)) + \lambda \sum_{i=1}^{N_s} R(v_i(\lambda), s_i(\lambda), e_i(\lambda))
\tag{16}
\]

is also a solution of (15) if

\[
R_{budget} = \sum_{i=1}^{N_s} R(v_i^\star(\lambda), s_i^\star(\lambda), e_i^\star(\lambda)).
\tag{17}
\]

The problem therefore reduces to ensuring that (17) has a solution for $\{(v_i(\lambda), s_i(\lambda), e_i(\lambda));\ i = 1, \ldots, N_s\}$ and determining this solution. The rate-distortion optimization algorithm presented in the previous section is used again for the computation of the optimal segmentation and the corresponding motion estimator for each object. In the case of error transmission the distortion function used is

\[
D(v_i, s_i) = \sum_{k=0}^{b_x} \sum_{l=0}^{b_y} \left| im_t(m+k, n+l) - \hat{im}_t(m+k+v_x^{(i)}, n+l+v_y^{(i)}) \right|,
\]

where

\[
\hat{im}_t(m+k, n+l) = im_{t-1}(m+k+v_x^{(i)}, n+l+v_y^{(i)}) + \hat{e}(m+k+v_x^{(i)}, n+l+v_y^{(i)})
\]

and $\hat{e}(m, n)$ is the decoded prediction error corresponding to pixel $(m, n)$.
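Both distortion measures (with and without transmitted prediction error) can be sketched in one routine. This is a minimal illustration under simplifying assumptions: images are nested lists indexed as `im[m][n]`, the block covers `bx` by `by` pixels, the displaced block lies inside the previous frame, the motion vector is an integer, and the decoded error is stored per block as `err[k][l]`. The function name `block_distortion` is hypothetical.

```python
def block_distortion(cur, prev, m, n, bx, by, vx, vy, err=None):
    """Sum of absolute differences between a block of the current frame and
    its motion-compensated prediction from the previous frame.

    (m, n)   : upper-left corner of the block
    (bx, by) : block dimensions in pixels
    (vx, vy) : integer motion vector (assumption: no sub-pixel interpolation)
    err      : optional decoded prediction error for the block, indexed err[k][l]
    """
    total = 0
    for k in range(bx):
        for l in range(by):
            predicted = prev[m + k + vx][n + l + vy]
            if err is not None:
                predicted += err[k][l]   # reconstruction with error transmission
            total += abs(cur[m + k][n + l] - predicted)
    return total
```

Calling the routine with `err=None` gives the displaced frame difference of Section 3.4, while passing the decoded residual gives the distortion of the reconstructed block used when error transmission is an option.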

5 APPLICATION TO VERY LOW BIT RATE IMAGE SEQUENCE CODING

5.1 Computational Complexity of the Proposed Approach

The proposed algorithm consists of an initialization stage and an optimization stage. During the computationally involved initialization stage, all candidate algorithms for motion estimation are tested and their performance is stored in memory. Note that the distortion function is computed only in the first iteration of the algorithm, and thus the computational load of the remainder of the algorithm reduces to updating the entropy of the vector field and finding the minimum $J(\lambda)$. Note also that following the first frame, the search for $\lambda$ is confined to narrower intervals and hence fewer iterations are needed for the completion of the optimization stage. Also, the choice of the segmentation map corresponding to the previous frame as an initial segmentation for the current frame further reduces the computational load of the proposed algorithm.

The execution time of the encoding phase of the algorithm on an R4400 INDIGO II Silicon Graphics workstation is approximately 1 minute for each frame. Most of this time (about 60%) is devoted to the initialization stage, where the distortion functions are computed. The remaining 40% is used to complete the optimization procedure. As elaborated in the sequel, the optimization algorithm was run for many values of $R_{budget}$ and target bitrates and was seen to converge very rapidly, never requiring more than fifteen iterations in any of our experiments. This convergence to the desired bitrate for a videoconference scene ("Claire") and a non-videoconference scene ("Tunnel") is depicted in Figures 4 and 5, respectively.

The computational complexity of the decoding phase of the proposed approach is very low, and even a software decoder may be implemented in real time. This makes the proposed scheme an attractive candidate for use in asymmetrical coding applications such as multimedia communication, teleshopping and fixed-location-to-mobile or broadcast video communication.

5.2 Experimental Results

In order to evaluate the performance of the proposed approach for very low bitrate coding, the algorithm was applied to the typical QCIF sequences "Claire", "Miss America", "Salesman", "Foreman" and a QCIF version of the MPEG-4 test sequence "Tunnel". The frame rate of the sequences was 10 frames/sec. Objects were defined using the segmentation procedure described in Section 3. The construction of the quadtree representation for the first frame of the video sequence may start with the hypothesis that the whole image may be represented by only one node (root) and proceed with tests deciding if further splitting is necessary, as described in Section 3. However, it was found experimentally that in practice it is preferable to start by testing smaller blocks (typically 32 × 32 pixels each) instead of the entire image, so as to expedite the identification of the optimal segmentation. Similar constraints were imposed on the size of the smallest blocks in order to maintain the segmentation overhead information within acceptable limits. The size of the smallest block in our experiments was chosen to be $x_{min} \times y_{min} = 4 \times 4$. As noted in Section 3, after the first frame a good choice for the initial segmentation mask is the segmentation mask corresponding to the immediately preceding frame.

As a first experiment, the optimized motion estimation and quadtree segmentation algorithm was applied for the coding of specific frames only of the "Miss America" and "Claire" sequences. More specifically, the algorithm was applied between the zeroth and fifth frames of "Miss America" and the zeroth and second frames of "Claire". The original zeroth and fifth frames of "Miss America" and zeroth and second frames of "Claire" are shown in Figures 6a and 6b and 7a and 7b, respectively, while the reconstructed ones and the corresponding prediction errors are illustrated in Figures 6c, 6d and 7c, 7d, respectively. The resulting quadtree segmentation is also shown in Figure 8. Figure 9 shows the MSE (mean-square error) versus bitrate for the coding of the fifth frame of "Miss America". This curve was obtained by running the proposed algorithm for various $R_{budget}$ values and computing the MSE after convergence.

For very low bitrate video coding applications the proposed algorithm may also be applied to groups of frames (GOF), with the bit allocation assigned adaptively to each frame of the sequence in order to optimize the transmission bitstream. More specifically, the total bitrate is given for the coding of the whole GOF and the rate is allocated to each frame according to the frame difference between the current frame and the preceding frame. Thus, if $R_{target}$ is the target bitrate for the coding of the whole GOF and

\[
FD_i = \sum_{k_1}^{N_1} \sum_{k_2}^{N_2} \left| im_i(k_1, k_2) - im_{i-1}(k_1, k_2) \right|
\]

is the frame difference between frames $i$ and $i-1$, where $N_1$, $N_2$ are the image dimensions, the bitrate is allocated as follows:

\[
R^{(i)}_{budget} = R_{target} \, \frac{FD_i}{\sum_{k=1}^{10} FD_k}.
\]

In this way, the coding of the motion and segmentation information is optimized for the whole GOF.
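The frame-difference-driven allocation can be sketched in a few lines. The helper names `frame_difference` and `allocate_budget` are hypothetical, frames are assumed to be nested lists of pixel values with the intra-coded frame first, and the equal split used when all frame differences are zero is an assumption of this sketch, not something stated in the text.

```python
def frame_difference(a, b):
    """Sum of absolute pixel differences between two frames of equal size."""
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def allocate_budget(frames, r_target):
    """Split r_target over frames[1:] in proportion to their frame differences.

    frames[0] is taken to be the intra-coded frame of the group and receives
    no share of the motion/segmentation budget.
    """
    fd = [frame_difference(frames[i], frames[i - 1]) for i in range(1, len(frames))]
    total = sum(fd)
    if total == 0:
        # Assumption: a completely static group gets an equal split
        return [r_target / len(fd)] * len(fd)
    return [r_target * d / total for d in fd]
```

For three 2 × 2 frames whose successive differences are 4 and 12, a target of 16 units is split into budgets of 4 and 12.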

For the coding of a group of frames, the first version of the optimized motion estimation algorithm (RDHC) was applied for the coding of the first ten frames of "Miss America" and "Claire" using motion compensation. Figures 10 and 11 compare the proposed algorithm with the simple block matching algorithm used in the existing standards (MPEG, H.261), with block sizes of 8 × 8 and 16 × 16, in terms of PSNR versus frame number, for the coding of the first ten frames of the two sequences, respectively. The simple block matching approach consists of absolute displaced frame difference minimization, by searching exhaustively within a search area of $-15, \ldots, 15$ half-pixels in the previous frame, centered at the position of the examined block. In both coders, it was assumed that each frame was predicted using the reconstructed previous frame, and that the prediction error was not transmitted. The $R_{budget}$ was selected to be 24 Kbits and 12 Kbits for "Miss America" and 24 Kbits and 10 Kbits for "Claire", respectively, for the coding of the nine frames following the initial frame. As seen, the proposed algorithm clearly outperforms the standard block matching technique, since both segmentation and motion estimation are optimized for a specific bitrate.

The second version of the proposed coder (RDHCE), which includes error transmission, was also used for the coding of the above QCIF sequences. The technique described earlier, of adapting bit allocation to frame differences, is also used in this coder version; however, the transmission of the error allows efficient communication of much longer groups of frames. Three target bit rates were tested: 14.4 Kbits/sec, 28.8 Kbits/sec and 64 Kbits/sec. Figures 12, 13, 14, 15 and 16 illustrate the performance of the proposed scheme in terms of PSNR versus bitrate for the coding of the first fifty frames of "Miss America", "Claire", "Salesman" and "Foreman" and the first twenty frames of "Tunnel". Note that the coder performance remains stable, providing a high quality image without the intervention of a new intra-coded image before the end of the whole sequence. Figures 17, 18 and 19, 20, respectively, show original and reconstructed frames of the sequences "Salesman" and "Claire" coded at 64 Kbits/sec.

Regarding the performance of the optimization, Figures 21a and 21c show the resulting quadtree corresponding to the tenth frame of "Salesman" and the eighth frame of "Foreman". The complexity of the quadtree for the representation of both sequences is relatively high, since "Salesman" is a sequence with large rigid and flexible motion, while in the "Foreman" sequence both object motion and camera motion exist. Figures 21b and 21d present the motion estimator index map corresponding to the segmentation maps of Figures 21a and 21c. In these figures each object is colored depending on the motion estimator choice. White corresponds to no motion and predicted motion, dark gray to affine 2D motion, light gray to block-matching and black to 3D motion. Table 1 shows the average percentage of the motion estimator choices for the coding of all the QCIF image sequences.

In addition to videoconferencing scenes, the algorithm was also tested on the more complicated MPEG-4 test sequence "Tunnel". The results are comparable to the results obtained using videophone-related sequences. Figure 22 shows original and reconstructed frames of the sequence "Tunnel" coded at 64 Kbits/sec. The segmentation map and the motion estimator index map corresponding to this sequence are shown in Figures 23a and 23b. The performance in terms of PSNR versus bitrate for the coding of the first twenty frames of "Tunnel" is illustrated in Figure 16.

6 CONCLUSIONS

A rate-distortion framework was used to define a very low bit rate coding scheme based on quadtree segmentation and optimized selection of motion estimators. This technique achieved maximum reconstructed image quality under the constraint of a target bitrate for the coding of the vector and segmentation information. Joint optimization of the quadtree object segmentation and of the motion estimation method for each leaf of the tree, i.e. each object, was achieved subject to this target bitrate restriction. For experimental evaluation, the proposed algorithm was combined with an appropriate rate control strategy to optimize the coding of the motion vectors corresponding to all the frames of a group of frames of an image sequence. Experimental results on the coding of typical videophone sequences have demonstrated the performance of the proposed very low bit-rate video coding scheme.

References

[1] H. Li, A. Lundmark, and R. Forchheimer, "Image Sequence Coding at Very Low Bitrates - A Review," IEEE Trans. on Image Processing, vol. 3, pp. 589-609, Sep. 1994.
[2] H. G. Musmann, P. Pirsch, and H. J. Grallert, "Advances in Picture Coding," Proc. IEEE, vol. 73, pp. 523-548, Apr. 1985.
[3] M. I. Sezan and R. L. Lagendijk, Motion Analysis and Image Sequence Processing. Kluwer Academic Publishers, Boston, 1993.
[4] H. G. Musmann, M. Hötter, and J. Ostermann, "Object-oriented analysis-synthesis coding of moving images," Signal Processing: Image Communication, vol. 1, pp. 117-138, Oct. 1989.
[5] M. Hötter, "Optimization and Efficiency of an Object-Oriented Analysis-Synthesis Coder," Signal Processing: Image Communication, vol. 4, pp. 181-194, Apr. 1994.
[6] N. Grammalidis, S. Malassiotis, D. Tzovaras, and M. G. Strintzis, "Stereo Image Sequence Coding Based on 3-D Motion Estimation and Compensation," Signal Processing: Image Communication, vol. 7, pp. 129-145, Jan. 1995.
[7] D. Tzovaras, N. Grammalidis, and M. G. Strintzis, "3-D Motion/Disparity Segmentation for Object-Based Image Sequence Coding," Optical Engineering, special issue on Visual Communications and Image Processing, vol. 35, pp. 137-145, Jan. 1996.
[8] D. Tzovaras, N. Grammalidis, and M. G. Strintzis, "Object-Based Coding of Stereo Image Sequences using Joint 3-D Motion/Disparity Compensation," IEEE Trans. on Circuits and Systems for Video Technology, vol. 7, no. 2, pp. 312-328, Apr. 1997.
[9] Y. Shoham and A. Gersho, "Efficient Bit Allocation for an Arbitrary Set of Quantizers," IEEE Trans. on Acoust., Speech, Signal Processing, vol. 36, pp. 1445-1453, Sep. 1988.
[10] K. Ramchandran and M. Vetterli, "Best Wavelet Packet Bases in a Rate-Distortion Sense," IEEE Trans. on Image Processing, vol. 2, pp. 160-175, Apr. 1993.
[11] G. J. Sullivan and R. Baker, "Efficient Quadtree Coding of Images and Video," IEEE Trans. on Image Processing, vol. 3, pp. 327-331, May 1994.
[12] E. Reusens, "Joint Optimization of Representation Model and Frame Segmentation for Generic Video Compression," Signal Processing, vol. 46, pp. 105-117, Sep. 1995.
[13] G. Tziritas, "Rate Distortion Theory for Image and Video Coding," in 27th Int'l Conf. on Digital Signal Processing, (Limassol, Cyprus), Jun. 1995.
[14] D. Tzovaras and M. G. Strintzis, "Motion Estimation Using Rate Distortion Theory for Very Low Bit Rate Image Sequence Coding," in Proc. Int'l Conf. Telecommunications '96, (Istanbul, Turkey), Apr. 1996.
[15] D. Tzovaras and M. G. Strintzis, "Motion and Disparity Estimation Using Rate Distortion Theory for Very Low Bit Rate and Multiview Image Sequence Coding," in VCIP '97, (San Jose, California), Feb. 1997.
[16] G. Wolberg, Digital Image Warping. IEEE Computer Society Press, Los Alamitos, California, 1988.
[17] G. Adiv, "Determining Three-Dimensional Motion and Structure from Optical Flow Generated by Several Moving Objects," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 7, pp. 384-401, Jul. 1985.
[18] S. S. Sinha and B. G. Schunck, "A Two-Stage Algorithm for Discontinuity-Preserving Surface Reconstruction," IEEE Trans. on PAMI, vol. 14, Jan. 1992.
[19] H. Everett, "Generalized Lagrange Multiplier Method for Solving Problems of Optimum Allocation of Resources," Operations Research, vol. 11, pp. 399-417, 1963.
[20] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing. Cambridge, U.K.: Cambridge Univ. Press, 1988.
[21] J. Konrad and E. Dubois, "Bayesian estimation of motion vector fields," IEEE Trans. Pattern Anal. and Mach. Intell., vol. 14, pp. 910-927, Sep. 1992.
[22] S. Malassiotis and M. G. Strintzis, "Joint Motion / Disparity MAP Estimation for Stereo Image Sequences," IEE Proceedings: Vision, Image & Signal Processing, vol. 143, pp. 101-108, Apr. 1996.

List of Figures

1 The first version (RDHC) of the proposed coding scheme (no option of error transmission).
2 The second version (RDHCE) of the proposed coding scheme (error transmission is an option).
3 Rate distortion framework for the selection of the optimal motion estimator.
4 Convergence of the algorithm for the coding of frame 20 of "Claire" at $R^{(20)}_{\mathrm{budget}} = 0.064$ bits/pixel.
5 Convergence of the algorithm for the coding of frame 14 of "Tunnel" at $R^{(14)}_{\mathrm{budget}} = 0.24$ bits/pixel.
6 (a) Original frame 0 of "Miss America". (b) Original frame 5 of "Miss America". (c) Reconstructed frame 5 of "Miss America" from frame 0 using the proposed algorithm at 41.31 dB PSNR. (d) The corresponding prediction error image.
7 (a) Original frame 0 of "Claire". (b) Original frame 2 of "Claire". (c) Reconstructed frame 2 of "Claire" from frame 0 using the proposed algorithm at 40.2 dB PSNR. (d) The corresponding prediction error image.
8 Quadtree segmentation corresponding to frame 5 of "Miss America" when coded with frame 0 as reference using the proposed algorithm.
9 MSE versus bitrate (in bits/pixel) for the coding of the fifth frame of "Miss America".
10 Comparison of the proposed rate-distortion optimized hybrid coder (RDHC) with the block-matching method (BM) in terms of PSNR vs frame number for the coding of the image sequence "Miss America" at 12 Kbits/sec and 24 Kbits/sec.
11 Comparison of the proposed rate-distortion optimized hybrid coder (RDHC) with the block-matching method (BM) in terms of PSNR vs frame number for the coding of the image sequence "Claire" at 10 Kbits/sec and 24 Kbits/sec.
12 Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Miss America" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.
13 Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Claire" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.
14 Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Salesman" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.
15 Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Foreman" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.
16 Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 20 frames of the image sequence "Tunnel" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.
17 Original frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e) and 30 (f) of "Salesman".
18 Reconstructed frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e) and 30 (f) of "Salesman" coded at 64 Kbits/sec.
19 Original frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e) and 30 (f) of "Claire".
20 Reconstructed frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e) and 30 (f) of "Claire" coded at 64 Kbits/sec.
21 Segmentation and motion estimator index maps of (a), (b) the tenth frame of "Salesman" and (c), (d) the eighth frame of "Foreman". White corresponds to no motion and predicted motion, dark gray to affine 2D motion, light gray to block matching, and black to 3D motion.
22 Original frames 5 (a), 10 (b) and 15 (c) of "Tunnel". Reconstructed frames 5 (d), 10 (e) and 15 (f) of "Tunnel" coded at 64 Kbits/sec.
23 (a) Segmentation map of the fifth frame of "Tunnel" interleaved with the image. (b) The corresponding motion estimator index map. White corresponds to no motion and predicted motion, dark gray to affine 2D motion, light gray to block matching, and black to 3D motion.

List of Tables

1 Average percentage of selection of the candidate motion estimation methods used for the coding of "Miss America", "Claire", "Salesman", "Foreman" and "Tunnel".

[Diagram labels: GOF 1, GOF 2, ..., GOF N.]

Figure 1: The first version (RDHC) of the proposed coding scheme (no option of error transmission).

[Diagram labels: I-frame, predicted frames.]

Figure 2: The second version (RDHCE) of the proposed coding scheme (error transmission is an option).

[Diagram labels: Block Matching Motion Estimation, Affine 2D Motion Estimation, 3D Motion Estimation, Previous Motion, No Motion; Rate Distortion Optimization Framework; Decision.]

Figure 3: Rate distortion framework for the selection of the optimal motion estimator.
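The selection stage of Figure 3 can be illustrated with a minimal sketch. It assumes the usual Lagrangian formulation, in which each candidate estimator is scored by J = D + λR and the candidate with the smallest cost is chosen. The `Candidate` class, the function `select_estimator` and all distortion and rate values below are hypothetical and serve only as illustration; they are not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str          # estimator label (hypothetical naming)
    distortion: float  # assumed distortion of the compensated block, e.g. MSE
    rate: float        # assumed number of bits for the motion parameters


def select_estimator(candidates, lam):
    """Return the candidate with the smallest Lagrangian cost D + lam * R."""
    return min(candidates, key=lambda c: c.distortion + lam * c.rate)


# Illustrative numbers only: cheaper models cost fewer bits but fit worse.
CANDIDATES = [
    Candidate("no_motion", 40.0, 1.0),
    Candidate("previous_motion", 30.0, 2.0),
    Candidate("block_matching", 12.0, 14.0),
    Candidate("affine_2d", 8.0, 40.0),
    Candidate("motion_3d", 7.0, 48.0),
]

if __name__ == "__main__":
    for lam in (0.1, 1.0, 20.0):
        print(lam, select_estimator(CANDIDATES, lam).name)
```

With these made-up numbers, a small λ favours the most accurate (and most expensive) model, while a large λ favours the models that cost almost no bits, which is the trade-off the framework is meant to arbitrate.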

[Plot "Claire, 28.8 Kbits/sec": bitrate [bpp] (axis ticks 0.02 to 0.1) versus iteration (1 to 12).]

Figure 4: Convergence of the algorithm for the coding of frame 20 of "Claire" at $R^{(20)}_{\mathrm{budget}} = 0.064$ bits/pixel.

[Plot "Tunnel, 64 Kbits/sec": bitrate [bpp] (axis ticks 0 to 1) versus iteration (1 to 15).]

Figure 5: Convergence of the algorithm for the coding of frame 14 of "Tunnel" at $R^{(14)}_{\mathrm{budget}} = 0.24$ bits/pixel.
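The convergence curves of Figures 4 and 5 show the bitrate approaching the budget over successive iterations. One common way to obtain such behaviour is a bisection search on the Lagrange multiplier; the sketch below assumes that rule, which is not necessarily the update used by the authors. The functions `total_rate` and `fit_lambda`, the initial bracket and the (distortion, rate) pairs are all hypothetical.

```python
def total_rate(blocks, lam):
    """Total rate when every block independently minimizes D + lam * R.

    `blocks` is a list of blocks; each block is a list of (distortion, rate) options.
    """
    total = 0.0
    for options in blocks:
        _d, r = min(options, key=lambda dr: dr[0] + lam * dr[1])
        total += r
    return total


def fit_lambda(blocks, budget, lo=0.0, hi=1e3, iters=50):
    """Bisection on lam; assumes the rate obtained at the initial `hi` meets the budget.

    A larger lam penalizes rate more, so the selected rate does not increase with lam.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_rate(blocks, mid) > budget:
            lo = mid  # over budget: penalize rate more strongly
        else:
            hi = mid  # within budget: try a smaller penalty
    return hi


# Illustrative (distortion, rate) options for two blocks; not data from the paper.
BLOCKS = [
    [(40.0, 1.0), (12.0, 14.0), (7.0, 48.0)],
    [(25.0, 1.0), (10.0, 10.0), (9.0, 30.0)],
]

if __name__ == "__main__":
    lam = fit_lambda(BLOCKS, budget=30.0)
    print(lam, total_rate(BLOCKS, lam))
```

Because the set of options is discrete, the achieved rate generally lands below the budget and not exactly on it; the search returns the smallest multiplier it found whose rate fits.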

Figure 6: (a) Original frame 0 of "Miss America". (b) Original frame 5 of "Miss America". (c) Reconstructed frame 5 of "Miss America" from frame 0 using the proposed algorithm at 41.31 dB PSNR. (d) The corresponding prediction error image.

Figure 7: (a) Original frame 0 of "Claire". (b) Original frame 2 of "Claire". (c) Reconstructed frame 2 of "Claire" from frame 0 using the proposed algorithm at 40.2 dB PSNR. (d) The corresponding prediction error image.

Figure 8: Quadtree segmentation corresponding to frame 5 of "Miss America" when coded with frame 0 as reference using the proposed algorithm.
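A segmentation such as the one in Figure 8 can be obtained by pruning a full quadtree in the rate-distortion sense. The following is a minimal sketch of a generic bottom-up pruning rule, assuming a Lagrangian cost per block and one split-flag bit per node; it is offered as an illustration of the idea and not as the authors' exact procedure. `Node`, `prune`, `count_leaves`, `example_tree` and every numeric value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    distortion: float  # assumed distortion if the block is coded as a single leaf
    rate: float        # assumed bits for the motion parameters of that leaf
    children: List["Node"] = field(default_factory=list)  # empty, or four sub-blocks


def prune(node, lam, split_bits=1.0):
    """Return the minimum Lagrangian cost of the subtree rooted at `node`.

    Children are removed wherever coding the block as one leaf costs no more
    than the best coding of its four sub-blocks (assumed: one flag bit per node).
    """
    leaf_cost = node.distortion + lam * (node.rate + split_bits)
    if not node.children:
        return leaf_cost
    split_cost = lam * split_bits + sum(prune(c, lam, split_bits) for c in node.children)
    if leaf_cost <= split_cost:
        node.children = []  # merge the four sub-blocks back into one leaf
        return leaf_cost
    return split_cost


def count_leaves(node):
    """Number of leaves, i.e. blocks in the final segmentation."""
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children)


def example_tree():
    """One block that fits poorly as a whole but well as four sub-blocks."""
    return Node(100.0, 10.0, [Node(5.0, 10.0) for _ in range(4)])


if __name__ == "__main__":
    for lam in (0.1, 10.0):
        tree = example_tree()
        print(lam, prune(tree, lam), count_leaves(tree))
```

In this toy tree a small λ keeps the split because the distortion saving outweighs the extra motion parameters, whereas a large λ merges the block, which is why lower bit budgets tend to give coarser segmentations.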

[Plot "Miss America": MSE (axis ticks 3 to 6.5) versus bitrate (axis ticks 0.002 to 0.014).]

Figure 9: MSE versus bitrate (in bits/pixel) for the coding of the fifth frame of "Miss America".
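Figure 9 reports distortion as MSE, while the other figures report PSNR. The two are related by PSNR = 10 log10(peak^2 / MSE). The helper below is a small sketch of that conversion, assuming 8-bit samples (peak value 255), which the extracted text does not state explicitly; the function name `psnr_from_mse` is hypothetical.

```python
import math


def psnr_from_mse(mse, peak=255.0):
    """PSNR in dB for a given mean squared error (peak=255 assumes 8-bit samples)."""
    if mse <= 0:
        raise ValueError("MSE must be positive for a finite PSNR")
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    # The MSE axis of Figure 9 spans roughly 3 to 6.5.
    for mse in (3.0, 6.5):
        print(mse, round(psnr_from_mse(mse), 2))
```

Under the 8-bit assumption, the MSE range of Figure 9 corresponds to roughly 40 to 43.4 dB.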

[Plot: PSNR [dB] (axis ticks 35 to 43) versus frame number (1 to 9); legend: RDHC-24, RDHC-12, BM-8x8, BM-16x16.]

Figure 10: Comparison of the proposed rate-distortion optimized hybrid coder (RDHC) with the block-matching method (BM) in terms of PSNR vs frame number for the coding of the image sequence "Miss America" at 12 Kbits/sec and 24 Kbits/sec.

[Plot: PSNR [dB] (axis ticks 32 to 46) versus frame number (1 to 9); legend: RDHC-24, RDHC-10, BM-8x8, BM-16x16.]

Figure 11: Comparison of the proposed rate-distortion optimized hybrid coder (RDHC) with the block-matching method (BM) in terms of PSNR vs frame number for the coding of the image sequence "Claire" at 10 Kbits/sec and 24 Kbits/sec.

[Plot: PSNR [dB] (axis ticks 35 to 45) versus frame number (1 to 51); legend: 64 Kbits/sec, 28.8 Kbits/sec, 14.4 Kbits/sec.]

Figure 12: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Miss America" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.

[Plot: PSNR [dB] versus frame number.]

Figure 13: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Claire" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.

[Plot: PSNR [dB] (axis ticks 31 to 45) versus frame number (1 to 51); legend: 64 Kbits/sec, 28.8 Kbits/sec, 14.4 Kbits/sec.]

Figure 14: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Salesman" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.

[Plot: PSNR [dB] (axis ticks 27 to 39) versus frame number (1 to 51); legend: 64 Kbits/sec, 28.8 Kbits/sec, 14.4 Kbits/sec.]

Figure 15: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 50 frames of the image sequence "Foreman" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.

[Plot: PSNR [dB] (axis ticks 26 to 32) versus frame number (1 to 21); legend: 64 Kbits/sec, 28.8 Kbits/sec, 14.4 Kbits/sec.]

Figure 16: Performance of the proposed rate-distortion optimized hybrid coder with transmission of the prediction error (RDHCE) for the coding of the first 20 frames of the image sequence "Tunnel" at 64 Kbits/sec, 28.8 Kbits/sec and 14.4 Kbits/sec.

Method              Percentage of Selection (%)
No motion           37.11
Predicted motion    12.86
Block-Matching      30.38
Affine 2D           13.1
3D motion           6.55

Table 1: Average percentage of selection of the candidate motion estimation methods used for the coding of "Miss America", "Claire", "Salesman", "Foreman" and "Tunnel".

Figure 17: Original frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e) and 30 (f) of "Salesman".

Figure 18: Reconstructed frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e) and 30 (f) of "Salesman" coded at 64 Kbits/sec.

Figure 19: Original frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e) and 30 (f) of "Claire".

Figure 20: Reconstructed frames 5 (a), 10 (b), 15 (c), 20 (d), 25 (e) and 30 (f) of "Claire" coded at 64 Kbits/sec.

Figure 21: Segmentation and motion estimator index maps of (a), (b) the tenth frame of "Salesman" and (c), (d) the eighth frame of "Foreman". White corresponds to no motion and predicted motion, dark gray to affine 2D motion, light gray to block matching, and black to 3D motion.

Figure 22: Original frames 5 (a), 10 (b) and 15 (c) of "Tunnel". Reconstructed frames 5 (d), 10 (e) and 15 (f) of "Tunnel" coded at 64 Kbits/sec.

Figure 23: (a) Segmentation map of the fifth frame of "Tunnel" interleaved with the image. (b) The corresponding motion estimator index map. White corresponds to no motion and predicted motion, dark gray to affine 2D motion, light gray to block matching, and black to 3D motion.
