Motion and Disparity Field Estimation using Rate-Distortion Optimization

Dimitrios Tzovaras and Michael G. Strintzis, Senior Member, IEEE
Electrical and Computer Engineering Department, Information Processing Laboratory
Aristotle University of Thessaloniki, Thessaloniki 54006, Greece
phone: (+30-31) 996-359, fax: (+30-31) 996-398
e-mail: [email protected]

Abstract

A rate-distortion framework is used to define a displacement vector-field estimation technique for use in video coding. This technique achieves maximum reconstructed image quality under the constraint of a target bitrate for the coding of the vector sequence. The technique may be adapted so as to limit its smoothing effect to homogeneous areas and avoid highly textured areas and edges. Use of this technique is evaluated for two application areas in which the need for high compression of displacement vector fields is particularly acute. The first is motion-field coding for very low bit rate image sequence transmission, as in videophone applications. The second application area is coding for the transmission of dense disparity fields. This is needed for the generation at the receiver of intermediate viewpoints through spatial interpolation. It is also needed in a number of other applications requiring accurate depth knowledge, including 3D medical data transmission and transmission of scenes to be postprocessed using depth-keyed segmentation. Experimental results illustrating the performance of the proposed technique in these application areas are presented and evaluated.

Subject terms: Very low bit-rate coding; rate-distortion theory; vector field coding; depth map coding.

This work was supported by the EU CEC Project PANORAMA (ACTS project 092) and the COST 211ter project. 

List of Figures

1 Average MSE versus average bitrate (in bits/vector) for the coding of the first 50 frames of "Claire" using motion compensation.
2 Average MSE versus average bitrate (in bits/vector) for the coding of the first 25 frames of "Tunnel" using disparity compensation.
3 MSE versus bitrate (in bits/vector) for the block-based coding of the fifth frame of "Miss America".
4 MSE versus bitrate (in bits/vector) for the block-based coding of the third frame of "Claire".
5 MSE versus bitrate (in bits/vector) for the block-based coding of the second frame of "Sergio".
6 MSE versus bitrate (in bits/vector) for the coding of the dense disparity field corresponding to the second frame of "Tunnel".
7 Adaptive versus non-adaptive versions of the proposed RDOVFE algorithm in terms of MSE versus bitrate (in bits/vector) for the block-based coding of the fifth frame of "Miss America".
8 (a) Original frame 1 of "Miss America". (b) Original frame 5 of "Miss America". (c) Original motion vector field estimated with the block matching algorithm. (d) Motion vector field estimated with the rate-distortion algorithm at 1.58 bits/vector. (e) The output of the edge extractor (homogeneous areas marked white, edges marked grey and highly textured areas marked black). (f) Computed motion vector field using the adaptive smoothing algorithm.
9 (a) Original frame 1 of "Claire". (b) Original frame 3 of "Claire". (c) Original motion vector field estimated with the block matching algorithm. (d) Reconstructed frame 3 of "Claire" using the computed vector field at 1.1 bits/vector. (e) The output of the edge extractor. (f) Computed motion vector field using the adaptive smoothing algorithm.
10 (a) Original left channel image "Sergio" (frame 2). (b) Original right channel image "Sergio" (frame 2). (c) Block-based estimate of disparity. (d) The output of the edge extractor. (e) Pixel-based estimate of depth. (f) Intermediate image generated using the computed depth.
11 (a) Original left channel image "Tunnel" (frame 2). (b) Original right channel image "Tunnel" (frame 2). (c) Block-based estimate of disparity. (d) The output of the edge extractor. (e) Pixel-based estimate of depth. (f) Intermediate image generated using the computed depth.

I INTRODUCTION

The transmission of full motion video through limited capacity channels is critically dependent on the ability of the compression schemes to achieve target bit rates while still maintaining acceptable visual quality [1]. In order to achieve this, motion estimation and motion compensated prediction are frequently used, so as to reduce temporal redundancies in image sequences [2]. Similarly, in the coding of stereo and multiview images, prediction may be based on disparity compensation [3] or the best of motion and disparity compensation [4]. While much attention has been devoted to the coding of the intraframe and prediction error images, the displacement vector fields are usually coded losslessly using DPCM/Huffman coding, resulting in limited compression. The reason for this is that digital video coding systems for many applications have at their disposal rates ranging from 1 Mbit/sec to 25 Mbits/sec. At such rates, only a minor part of the global rate is devoted to the transmission of the vector fields, hence the bitrate overhead produced by lossless encoding of the vector fields is negligible.

In many emerging application areas however, lossy compression of the vector fields is often highly desirable, and sometimes unavoidable. For example, mobile videophone or multimedia transmission channels are often limited to capacities of 4.8 - 64 kbps. In such cases, it is clearly desirable to reduce as much as possible the bitrate needed to transmit the motion vector fields, provided that this reduction does not produce intolerable distortion in the reconstructed image. It is also desirable to allocate the bitrate devoted to the coding of motion fields adaptively, depending on the complexity of the sequence and also on the overall bitrate availability when the latter varies with time. High compression of vector fields is also needed in many emerging applications requiring the transmission of disparity fields for stereo image communication.
Sparse disparity fields are used to predict one image of a stereoscopic pair from another, within a coding scheme using disparity compensated prediction [3, 4] or joint motion and disparity compensated prediction [5]. The need for compression is particularly acute when dense depth or disparity fields must be transmitted in order to permit multiview image generation at the receiver, through spatial interpolation [4]. With a multiview display, this allows the observer to watch the scene from varying optical angles. In other applications, the generation of intermediate images is needed even with simple monoscopic displays at the receiver. For example, simulated eye-contact is known to enhance the "telepresence" which is desirable in advanced videoconferencing schemes [1]. Further, dense disparity estimation and transmission is necessary in many other applications, permitting for example distance-to-the-camera keyed segmentation for background/foreground mixing, and quality control with depth models. Lossy or lossless techniques for the transmission of such dense disparity fields were investigated in [6, 7]. Lossless transmission was estimated to require bitrates as high as 1 Mbps for 720 × 576 pixel sized image sequences [6]. Such requirements exceed the capacity of many practical communication channels.

Reduction of the bit rate needed for the coding of either motion or disparity vector fields may be achieved by optimal and adaptive bit allocation to already determined vector fields [11, 12, 13] or by appropriate smoothing [8]. Such smoothing is often necessary quite independently of data compression purposes. For example, in disparity estimation for object-based coding of stereo sequences [3, 4, 5, 9, 10], improved results are obtained by a disparity estimation procedure based on dynamic programming with a smoothing constraint. In general, however, care must be taken that smoothing of vector fields for compression purposes does not impair the efficiency of the resulting displacement compensated image prediction. Thus, a compromise must be found between minimizing the entropy of the displacement vectors and minimizing the displaced frame difference between temporally or spatially adjacent frames. An elegant framework for the definition of such a strategy is provided by the classical rate-distortion constrained minimization procedure. This has been recently used in many coding applications including bit allocation for vector quantization [14], wavelet packet image coding [15] and quadtree still image coding [16].
The present paper, extending the preliminary results reported in [10], investigates the use of this methodology for block-based motion/disparity field estimation under the constraint of a target bitrate for the coding of the vector information. The entropy of the displacement vectors is used as a measure of the bit rate needed for their lossless transmission. An adaptive variant of the vector field estimation is specifically applied, which limits the effects of the smoothing algorithm to the homogeneous areas only, avoiding highly textured areas and edges.

Experimental results are given for the coding of the typical videophone QCIF image sequences "Miss America" and "Claire" and the stereoscopic image sequences "Sergio" and "Tunnel". The paper is organized as follows. In Section II the general vector field estimation problem is defined in terms of a rate-distortion minimization approach. Section III describes the application of the algorithm for the purposes of motion, disparity and depth field estimation. Section IV describes an adaptive version of the general estimation algorithm which avoids smoothing of the vector fields in areas such as edges or object boundaries. Experimental results given in Section V illustrate the performance of the proposed methods. Finally, conclusions are drawn in Section VI.

II THE GENERAL VECTOR FIELD ESTIMATION METHOD

A vector field estimation technique that is optimal in the rate-distortion sense is developed in this section. The estimator will be used for motion estimation between two temporally consecutive frames, or for disparity estimation between the left and right camera images, which are the two specific applications examined in this paper. Let $v_i = (v_x^{(i)}, v_y^{(i)}) \in V$ be the vector corresponding to block $i$ of the image, where $V$ is the set of all possible displacement vectors determined by the search area $S$ of the block matching algorithm. The general vector field estimation algorithm aims to minimize the distortion $D$ of the reconstructed image sequence, under a constraint $R_{budget}$ on the rate for the transmission of the vector field information. This corresponds to the following constrained optimization problem:

$$\min_{v_i \in V} \sum_{i=1}^{N_b} D_i(v_i), \tag{1}$$

subject to

$$\sum_{i=1}^{N_b} R_i(v_i) \le R_{budget}, \tag{2}$$

where $N_b$ is the total number of blocks in the image, $D_i(v_i)$ is the contribution of $v_i$ to the distortion function and $R_i(v_i)$ is the contribution of the vector $v_i$ to the total rate or cost of the transmission of the motion vectors.

The methodology in [17] permits the transformation of the above into an unconstrained optimization problem. In fact, as shown in [17] (the proof is also contained in [14]), the solution $\{v_i^*(\lambda),\ i = 1, \ldots, N_b\}$ of the problem of unconstrained minimization of

$$J(\lambda) = \sum_{i=1}^{N_b} J_i(v_i(\lambda)) = \sum_{i=1}^{N_b} D_i(v_i(\lambda)) + \lambda \sum_{i=1}^{N_b} R_i(v_i(\lambda)), \tag{3}$$

is also a solution of (1) if

$$R_{budget} = \sum_{i=1}^{N_b} R_i(v_i^*(\lambda)). \tag{4}$$
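The reason an unconstrained minimizer also solves the constrained problem can be seen in two lines. The following is a sketch of the standard argument, assuming $\lambda \ge 0$; the shorthand $D(v)$ and $R(v)$ for the distortion and rate summed over all blocks of a field $v$ is introduced here only for brevity:

$$
\begin{aligned}
& D(v^*) + \lambda R(v^*) \le D(v) + \lambda R(v) \quad \text{for every field } v, \\
& \Rightarrow \; D(v^*) - D(v) \le \lambda \, [R(v) - R(v^*)] = \lambda \, [R(v) - R_{budget}] \le 0 \quad \text{whenever } R(v) \le R_{budget}.
\end{aligned}
$$

Hence no field that meets the rate budget can have a lower distortion than $v^*(\lambda)$.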

The problem therefore reduces to ensuring that (4) has a solution for $v_i(\lambda)$ and determining this solution. This was investigated from a general viewpoint in [14], where it was shown that $R_i(v(\lambda))$ and $D_i(v(\lambda))$ are monotonic functions of the Lagrange multiplier $\lambda$, which may be interpreted as a quality index, with values ranging from $0$ (highest rate, lowest distortion) to $\infty$ (lowest rate, highest distortion). Further investigation in [15] proved that the solution of (4) may be obtained using any fast convex search algorithm such as the bisection algorithm [18]. One such algorithm, which gave very good results in both [15] and [16], is also adopted in the present paper. This algorithm proceeds as follows. First, two values $\lambda_l < \lambda_u$ of $\lambda$ are found so that

$$\sum_{i=1}^{N_b} R_i(v_i(\lambda_u)) \le R_{budget} \le \sum_{i=1}^{N_b} R_i(v_i(\lambda_l)). \tag{5}$$

For the coding of a sequence of frames, these values are chosen to be $\lambda_l = 0$, $\lambda_u = \infty$ for the initial frame and $\lambda_l = 0.8\lambda^*$, $\lambda_u = 1.2\lambda^*$ for subsequent frames, where $\lambda^*$ is the solution of (4) for the previous frame. The bracketing interval is then successively decreased in size by the following procedure:

Step 1. For each block $i$, $i = 1, \ldots, N_b$, compute $D_i(v_i(\lambda_l))$ and $D_i(v_i(\lambda_u))$ and the corresponding $R_i(v_i(\lambda_l))$ and $R_i(v_i(\lambda_u))$.

Step 2. Set

$$\lambda_{new} = \left| \frac{\sum_{i=1}^{N_b} [D_i(v_i(\lambda_l)) - D_i(v_i(\lambda_u))]}{\sum_{i=1}^{N_b} [R_i(v_i(\lambda_l)) - R_i(v_i(\lambda_u))]} \right| + \epsilon, \tag{6}$$

where $\epsilon$ is a vanishingly small positive number.

Step 3. For each $i = 1, 2, \ldots, N_b$, determine iteratively the displacement vectors $v_i(\lambda_{new})$ so as to minimize

$$D_i(v_i(\lambda_{new})) + \lambda_{new} R_i(v_i(\lambda_{new})). \tag{7}$$

Then compute the corresponding $\{R_i(v_i(\lambda_{new}))\}_i$ and $\{D_i(v_i(\lambda_{new}))\}_i$.

Step 4. If $\sum_{i=1}^{N_b} R_i(v_i(\lambda_{new})) = \sum_{i=1}^{N_b} R_i(v_i(\lambda_u))$, then stop, with $\lambda^* = \lambda_u$.
Else if $\sum_{i=1}^{N_b} R_i(v_i(\lambda_{new})) > R_{budget}$, set $\lambda_l \leftarrow \lambda_{new}$ and go to step 2.
Else set $\lambda_u \leftarrow \lambda_{new}$ and go to step 2.

Obviously therefore, if the above algorithm converges, a solution of (4) will have been obtained and the equivalent problem of the constrained minimization of (1) will have been solved. Note that the distortion corresponding to each motion vector in a specific search area is computed only once, at the first iteration of the algorithm. Thus the computational load of the algorithm consists of updating the entropy of the vector field and finding the minimum $J(\lambda)$.
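The bracketing procedure of steps 1 to 4 can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions that are not part of the method described above: each block carries a hypothetical table `blocks[i]` of candidate `(D, R)` pairs whose rates are fixed and independent of the other blocks (whereas the entropy-based rate used in this paper depends on the whole field), and a large finite `lam_hi` stands in for an unbounded upper multiplier. The names `pick` and `rd_search` are illustrative.

```python
def pick(blocks, lam):
    """Per block, choose the (D, R) candidate minimising D + lam * R."""
    sel = [min(cands, key=lambda dr: dr[0] + lam * dr[1]) for cands in blocks]
    return sel, sum(d for d, _ in sel), sum(r for _, r in sel)


def rd_search(blocks, r_budget, lam_hi=1e6, eps=1e-9, max_iter=50):
    """Shrink [lam_lo, lam_hi] until the rate at lam_new equals the rate at lam_hi."""
    lam_lo = 0.0
    sel_lo, d_lo, r_lo = pick(blocks, lam_lo)
    if r_lo <= r_budget:                 # budget is not binding
        return lam_lo, sel_lo
    sel_hi, d_hi, r_hi = pick(blocks, lam_hi)
    if r_hi > r_budget:                  # no bracketing pair exists, cf. (5)
        raise ValueError("rate budget cannot be met")
    for _ in range(max_iter):
        # Step 2: slope of the chord between the two operating points, eq. (6).
        lam_new = abs((d_lo - d_hi) / (r_lo - r_hi)) + eps
        # Step 3: per-block minimisation of D + lam_new * R, eq. (7).
        sel_new, d_new, r_new = pick(blocks, lam_new)
        # Step 4: stop, or replace one end of the bracket.
        if r_new == r_hi:
            break
        if r_new > r_budget:
            lam_lo, d_lo, r_lo = lam_new, d_new, r_new
        else:
            lam_hi, sel_hi, d_hi, r_hi = lam_new, sel_new, d_new, r_new
    return lam_hi, sel_hi


# Two blocks, three candidate vectors each, given as (distortion, rate in bits).
blocks = [
    [(0, 4), (2, 2), (10, 0)],
    [(0, 4), (1, 2), (6, 1)],
]
lam_star, chosen = rd_search(blocks, r_budget=5)
```

In this toy example the budget of 5 bits is not met with equality: only operating points on the convex hull of the rate-distortion curve are reachable, and the search settles on the nearest one that does not exceed the budget.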

III APPLICATION OF THE VECTOR FIELD ESTIMATION ALGORITHM TO MOTION, DISPARITY AND DEPTH ESTIMATION

The specific way the vector field affects the quality of the reconstructed image will determine the distortion index $D_i(v_i(\lambda))$. A number of such distortion measures have been proposed in the literature. In the case of block-based motion estimation, the simplest and most commonly used is the temporally displaced frame difference

$$D_i(v_i) = \sum_{k=0}^{b_x} \sum_{l=0}^{b_y} \left| im_t(m+k, n+l) - im_{t-1}(m+k+v_x^{(i)}, n+l+v_y^{(i)}) \right|, \tag{8}$$

where $(m, n)$ are the upper left hand corner coordinates of block $i$, $im_t(\cdot)$ and $im_{t-1}(\cdot)$ are the images at time instants $t$ and $t-1$, respectively, and $b_x, b_y$ are the dimensions of the block. The corresponding distortion index for block-based disparity estimation is the spatially displaced frame difference

$$D_i(v_i) = \sum_{k=0}^{b_x} \sum_{l=0}^{b_y} \left| im_l(m+k, n+l) - im_r(m+k+v_x^{(i)}, n+l+v_y^{(i)}) \right|, \tag{9}$$

where again $(m, n)$ are the upper left hand corner coordinates of block $i$, $im_l$ and $im_r$ are the left and right images at time $t$, respectively, and $b_x, b_y$ are the dimensions of the block. If depth is evaluated from disparity using methods such as those given in [5], or from motion using the method in [23], a pixel-wise distortion index may be used, such as

$$D_i(v_i) = \sum_{k=-b_x}^{b_x} \sum_{l=-b_y}^{b_y} \left| im_l(m+k, n+l) - im_r(m+k+v_x^{(i)}, n+l+v_y^{(i)}) \right|, \tag{10}$$

where now $(m, n)$ are the coordinates of the working pixel, $b_x, b_y$ are the dimensions of a rectangular window centered at $(m, n)$ and $v_x^{(i)}, v_y^{(i)}$ are the components of the disparity vector computed from depth information. Alternately, if the depth field is modeled by a wireframe [5], only the node information is transmitted, hence an appropriate distortion measure would be

$$D_i(v_i) = \sum_{(m,n) \in \text{block } i} \left( \hat{z}(m, n) - z(m, n) \right)^2, \tag{11}$$

where $\hat{z}(m, n)$ is the wireframe modeled depth and $z(m, n)$ is the depth computed from the disparity field $v_i$.

The transmission cost $R(v_i(\lambda))$ will also depend on the specific method used for the coding of the vector fields. Assume first that entropy coding (e.g. Huffman or arithmetic coding) is used, with an adaptive probability model. In this case, the transmission cost or rate is the entropy

$$R = -\sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy} \log_2(p_{xy}), \tag{12}$$

where $m_1, m_2$ are the maximum allowed x- and y-components, respectively, of the displacement vectors, and

$$p_{xy} = \frac{1}{N_b} \sum_{k=1}^{N_b} d_{xy}(v_k), \tag{13}$$

where

$$d_{xy}(v_k) = \begin{cases} 1 & \text{if } (v_x^{(k)} = x) \text{ and } (v_y^{(k)} = y) \\ 0 & \text{otherwise.} \end{cases} \tag{14}$$

The above rate $R$ may be rewritten in the form required by (3),

$$R = \sum_{i=1}^{N_b} R_i(v_i), \tag{15}$$

with the definition, for $i > 1$,

$$R_i(v_i) = -\sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy}^{(i)} \log_2(p_{xy}^{(i)}) + \sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy}^{(i-1)} \log_2(p_{xy}^{(i-1)}), \tag{16}$$

and

$$R_1(v_1) = -\sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy}^{(1)} \log_2(p_{xy}^{(1)}), \tag{17}$$

with

$$p_{xy}^{(i)} = \frac{1}{i} \sum_{k=1}^{i} d_{xy}(v_k). \tag{18}$$
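The decomposition (15)-(18) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and a vector field is represented simply as a list of `(vx, vy)` tuples in block scan order. It computes the entropy (12) from the empirical distribution (13) and the per-block contributions (16)-(17) as differences of successive entropies, which sum back to the total entropy.

```python
import math
from collections import Counter


def field_entropy(vectors):
    """Entropy (bits/vector) of the empirical vector distribution, eqs. (12)-(13)."""
    n = len(vectors)
    counts = Counter(vectors)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rate_contributions(vectors):
    """R_i of eqs. (16)-(17): entropy of the first i vectors minus that of the first i-1."""
    contribs = []
    previous = 0.0
    for i in range(1, len(vectors) + 1):
        current = field_entropy(vectors[:i])   # uses p_xy^(i) of eq. (18)
        contribs.append(current - previous)
        previous = current
    return contribs


field = [(0, 0), (0, 0), (1, 0), (0, 0)]
total = field_entropy(field)
parts = rate_contributions(field)
```

A perfectly smooth field (all vectors equal) has zero entropy, which is why minimizing the rate term drives the estimator towards smoothing.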

The algorithm in Section II may be applied using (16) for the determination of the contribution of $v_i$ to the rate increase. Alternately, since the second term of the right part of (16) is independent of $v_i$, step 3 of the algorithm may be equivalently carried out by minimizing for each $i$ the sum

$$D^*(v_i(\lambda_{new})) + \lambda_{new} R^*(v_i(\lambda_{new})), \tag{19}$$

where $D^*(v_i)$ is the total distortion of the first $i$ blocks of the image and is given by

$$D^*(v_i) = \sum_{k=1}^{i} D_k(v_k(\lambda_{new})), \tag{20}$$

and $R^*(v_i)$ is equal to the first term in (16), thus to the total entropy of the first $i$ blocks of the image:

$$R^*(v_i) = -\sum_{x=-m_1}^{m_1} \sum_{y=-m_2}^{m_2} p_{xy}^{(i)} \log_2(p_{xy}^{(i)}), \tag{21}$$

where $p_{xy}^{(i)}$ is given by (18). Note also that (18) is equivalent to the following efficient formula for the incremental computation of $p_{xy}(v_i(\lambda))$:

$$p_{xy}^{(i+1)} = \frac{i}{i+1}\, p_{xy}^{(i)} + \frac{1}{i+1}\, d_{xy}(v_{i+1}(\lambda)). \tag{22}$$

Alternately, DPCM coding of the vector field may be assumed, followed by appropriate entropy coding. In this case the coding rate is expressed by

$$R = -\sum_{x=-2m_1}^{2m_1} \sum_{y=-2m_2}^{2m_2} q_{xy} \log_2(q_{xy}), \tag{23}$$

where $q_{xy}$ is the probability that the difference between the vector minimizing the index $J(\lambda)$ and the vector corresponding to the previous block, $dv = v_i - v_{i-1}$, satisfies $dv_x = x$ and $dv_y = y$. The probability $q_{xy}$ is computed using equation (18) or (22) with $v_k$ replaced by $dv_k$.

A more computationally efficient approach, which does not involve incremental computation of the probability density of the vector field or of the first order vector field differences, is to assume a model for this probability density function. Specifically, the assumption of a Gauss-Markov Random Field to describe both motion [19] and disparity [20] vector differences could be used so as to accelerate the rate-distortion minimization procedure.
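As a concrete illustration of the block-based distortion indices used in this section, the following sketch tabulates the displaced frame difference of a single block for every candidate vector in a square search area. It is written under assumptions that go beyond the text: images are plain nested lists indexed as `img[m][n]`, a block is taken to contain `bx` by `by` pixels, the per-pixel difference is accumulated as an absolute value, and the function names are hypothetical.

```python
def block_distortion(cur, ref, m, n, bx, by, vx, vy):
    """Sum of absolute differences between a block of `cur` and its displaced match in `ref`."""
    return sum(
        abs(cur[m + k][n + l] - ref[m + k + vx][n + l + vy])
        for k in range(bx)
        for l in range(by)
    )


def full_search(cur, ref, m, n, bx, by, s):
    """Tabulate the distortion for every vector in the search area [-s, s] x [-s, s]."""
    table = {
        (vx, vy): block_distortion(cur, ref, m, n, bx, by, vx, vy)
        for vx in range(-s, s + 1)
        for vy in range(-s, s + 1)
    }
    best = min(table, key=table.get)   # plain block matching: minimum distortion only
    return best, table


# Synthetic pair in which the current frame equals the reference shifted by (1, 0).
ref = [[6 * r + c for c in range(6)] for r in range(6)]
cur = [[6 * (r + 1) + c for c in range(6)] for r in range(6)]
best, table = full_search(cur, ref, m=2, n=2, bx=2, by=2, s=1)
```

Because the table does not depend on the multiplier, it can be computed once per block and reused in every iteration of the search over the multiplier, which matches the remark in Section II that the distortions are evaluated only at the first iteration.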

IV ADAPTIVE VECTOR FIELD ESTIMATION

In the general procedure outlined in the preceding section, the mechanism for the minimization of the rate constraint index is based on limiting the entropy, which, as expected and as shown by the experimental results, is attained by smoothing of the displacement vector fields. Uniform smoothing of motion or disparity vector fields may have very undesirable effects and may in fact result in severe loss of motion or depth information, especially in the neighborhood of edges, and thus in an increased reconstruction error. An adaptive application of the proposed algorithm is therefore necessary so as to limit the smoothing only to the homogeneous areas and avoid highly textured areas or edges. In this way, the discontinuities of both motion and disparity [21] are preserved and the reliable vector fields corresponding to highly textured areas are excluded from smoothing. Furthermore, the use of edge information reduces the well known "corona effect" which is one of the drawbacks of block-based matching methods. Adaptive vector field estimation may be accomplished by the minimization of the following index:

$$J(\lambda) = \sum_{i=1}^{N_b} D_i(v_i(\lambda)) + \lambda \sum_{i=1}^{N_b} E(v_i(\lambda)) R_i(v_i(\lambda)), \tag{24}$$

where $E(v_i(\lambda))$ is a bitmap defined as:

$$E(v_i(\lambda)) = \begin{cases} 0 & \text{if } v_i \text{ is reliable} \\ 1 & \text{otherwise.} \end{cases} \tag{25}$$

The motion vector $v_i$ corresponding to block $i$ may be considered reliable whenever the block contains edges or highly textured areas. For the detection of edges and textured areas a variant of the technique in [22] was used, based on the observation that highly textured areas exhibit a high local intensity variance in all directions while on edges the intensity variance is higher across the direction of the edge. Accordingly, for each pixel in the image, two parameters were computed. The first parameter, $\hat{\sigma}^2$, is the unbiased estimate of the local variance. If the variance exceeds a threshold $\sigma_c^2$, the pixel is considered to be a part of either an edge or a highly textured area. The second parameter was used to differentiate highly textured areas from edges. For this purpose the local variances in eight directions (multiples of $\pi/4$) were computed and the second parameter $p$ was defined by

$$p = \frac{\max_i \hat{\sigma}_i^2}{\min_i \hat{\sigma}_i^2}. \tag{26}$$

Since $p$ is close to 1 in areas with uniform texture, and much higher in edge areas, pixels were assigned to edges whenever $p$ exceeded a threshold $p_c$. Finally, blocks containing edges or consisting in large part (over 50%) of highly textured area were assumed to produce reliable $v_i$.

However, with the adaptive version of the algorithm, there is an increased probability that no solution of the constrained optimization problem (3), (4) will exist. In this case it will be impossible to determine an initial $\lambda_l$ satisfying (5) and the algorithm (5)-(7) will be inapplicable. If this happens, the restriction on smoothing may be dropped for the highly textured areas and retained only for blocks containing edge information. If this still fails to produce a rate equal to $R_{budget}$ according to (4), all restrictions on smoothing are lifted and the adaptive version of the algorithm reverts to the non-adaptive one. Both the non-adaptive (3) and the adaptive (24) versions of the vector field estimation techniques were experimentally applied for the estimation of motion, disparity and depth fields, using appropriate distortion measures to evaluate the quality of the reconstructed images.
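The pixel classification and the block reliability rule described above can be sketched as follows. This is a rough illustration rather than the variant of [22] actually used: it assumes that the local variance is taken over a small square window, that each of the eight directional variances is computed from the samples met when walking a few pixels away from the working pixel in that direction, and that a tiny floor guards the ratio (26) against a zero denominator. The window size, ray length, default thresholds and function names are all assumptions made for the example.

```python
from statistics import variance   # unbiased sample variance

# Eight directions at multiples of pi/4, as (row step, column step).
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def classify_pixel(img, r, c, var_thresh=40.0, p_thresh=10.0, half=1, ray=2):
    """Label pixel (r, c) as 'homogeneous', 'edge' or 'texture'."""
    window = [img[r + i][c + j]
              for i in range(-half, half + 1)
              for j in range(-half, half + 1)]
    if variance(window) <= var_thresh:
        return "homogeneous"
    # Variance of the samples along a short ray in each of the eight directions.
    dir_var = [variance([img[r + s * dr][c + s * dc] for s in range(ray + 1)])
               for dr, dc in DIRECTIONS]
    p = max(dir_var) / max(min(dir_var), 1e-9)   # eq. (26), guarded against zero
    return "edge" if p > p_thresh else "texture"


def block_is_reliable(labels):
    """Reliable blocks (E = 0 in (25)) contain an edge or are over 50% textured."""
    textured = sum(1 for lab in labels if lab == "texture")
    return "edge" in labels or textured > 0.5 * len(labels)


flat = [[50.0] * 9 for _ in range(9)]                                      # uniform area
step = [[0.0 if c < 4 else 100.0 for c in range(9)] for r in range(9)]     # vertical edge
weave = [[25.0 * ((r + 2 * c) % 5) for c in range(9)] for r in range(9)]   # texture
labels = [classify_pixel(img, 4, 4) for img in (flat, step, weave)]
```

On the vertical step the variance along the edge is zero while the variance across it is large, so the ratio is very high; on the synthetic texture all eight directional variances are of the same order and the ratio stays small.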

V EXPERIMENTAL RESULTS

In order to evaluate the performance of the proposed approach for the low-bitrate coding of videophone and videoconferencing image sequences, the proposed Rate Distortion Optimization Vector Field Estimation (RDOVFE) algorithm was applied first to the typical QCIF sequences "Claire" and "Miss America". Further, the disparity estimation algorithm was tested on the stereo image sequences "Sergio" of dimension 256 × 256 and "Tunnel"¹ of dimension 360 × 288.

First, the RDOVFE algorithm was sequentially applied for the coding of the first 50 frames of "Claire" using motion compensation. On an R4400 Silicon Graphics machine it required an execution time of about 20 seconds compared to 4 seconds for the exhaustive block-matching search algorithm. The average bitrate versus the average mean square error (MSE) is shown in Figure 1. As seen, the proposed technique may achieve very considerable bitrate savings with very modest corresponding increases in the mean square error. For example, Figure 1 indicates that coding of the first 50 frames of the "Claire" sequence may be achieved with 1.1 bits/vector (0.01718 bits/pixel) and an average mean square error of 2.64. This contrasts with 3.23 bits/vector (0.05046 bits/pixel) required for lossless coding with a mean square error of 2.53. That is, a bitrate decrease by a factor of 2.93 is achieved at the cost of a mean square error increase of the order of 4%, with no visible image deterioration. Entropy coding, without DPCM, was used to compress the vector fields.

Next, full search block matching motion estimation was performed between the first and fifth frame of "Miss America" (see Figures 8a and 8b) and the first and the third frame of "Claire" (see Figures 9a and 9b), in order to evaluate the quality of the computed vector fields. A block size of 8 × 8 pixels and a search area of −8, ..., 8 pixels for both the x- and y-components of the motion vector field were used. The rate distortion minimization algorithm was run for various values of $R_{budget}$ and was seen to converge in fewer than 6 iterations. Figures 3 and 4 show the MSE versus the bitrate (in bits/vector) needed for the coding of the motion vectors corresponding to the fifth frame of "Miss America" and the third frame of "Claire", respectively. The value of $\lambda$ corresponding to each of the operating points of the algorithm is also shown. The original vector field computed using an exhaustive search block matching algorithm is shown in Figures 8c and 9c for the two sequences, respectively. The vector field coded at 1.58 bits/vector for "Miss America", computed using the proposed algorithm, is shown in Figure 8d. The motion compensated estimate of the third frame of "Claire" corresponding to the motion field coded at 1.1 bits/vector is shown in Figure 9d.

Edge extraction was subsequently performed using the technique proposed in Section IV with $\sigma_c^2 = 40$ and $p_c = 10$, with results shown in Figures 8e and 9e. In these figures homogeneous areas are shown in white, edges in grey and highly textured areas in black. The adaptive version of the algorithm described in Section IV was then applied to the two test sequences and produced the vector fields shown in Figures 8f and 9f. In this case, the adaptive smoothing technique further reduced the bitrate (to 1.202 bits/vector) by eliminating the scattered motion vectors corresponding to the background.

The performances of the adaptive and non-adaptive versions of the RDOVFE algorithm were also compared. Figure 7 shows the rate-distortion curves for the adaptive and non-adaptive versions applied to the coding of "Miss America". In curve RDOVFE-(T+E) of Figure 7 the no-smoothing constraint of the adaptive version of the proposed algorithm is used in both highly textured areas and blocks containing sufficient edge information. In curve RDOVFE-T of Figure 7 the no-smoothing constraint is relaxed only in the highly textured areas in order to obtain lower bit rates. If, again, the $R_{budget}$ constraint is not satisfied, the no-smoothing constraint is relaxed for all blocks in the image, and the adaptive version of the algorithm reduces to the non-adaptive one (curve RDOVFE in Figure 7).

The displacement vector field estimation algorithm was then applied for the coding of the first 25 frames of "Tunnel" using pixel-based disparity compensation. Figure 2 shows the average MSE versus the average bitrate for the coding of depth information of the first 25 frames of the sequence "Tunnel". Accurate depth information (with negligible increase in MSE) can be extracted at 2.61 bits/pixel.

¹ The sequence "Tunnel", which is an MPEG-4 stereo test sequence, was prepared by the Centre d'Etudes de Telediffusion et Telecommunications (CCETT) for use in the DISTIMA RACE and the PANORAMA ACTS projects.
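The figures quoted earlier in this section for the first 50 frames of "Claire" can be cross-checked with a few lines of arithmetic. The sketch below assumes one motion vector per 8 × 8 block for that experiment, which is consistent with the reported bits/pixel values; the variable names are illustrative.

```python
BLOCK_PIXELS = 8 * 8   # assumed: one motion vector per 8 x 8 block

rate_rd, mse_rd = 1.1, 2.64                 # proposed technique (bits/vector, MSE)
rate_lossless, mse_lossless = 3.23, 2.53    # lossless vector coding (bits/vector, MSE)

bits_per_pixel_rd = rate_rd / BLOCK_PIXELS
bits_per_pixel_lossless = rate_lossless / BLOCK_PIXELS
rate_factor = rate_lossless / rate_rd
mse_increase = (mse_rd - mse_lossless) / mse_lossless

print(f"bits/pixel: {bits_per_pixel_rd:.5f} vs {bits_per_pixel_lossless:.5f}")
print(f"rate factor: {rate_factor:.2f}, MSE increase: {100 * mse_increase:.1f}%")
```

The computed values agree with those reported in the text to within rounding of the last digit.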
The original left and right second frames of "Sergio" and "Tunnel" are shown in Figures 10a and 10b and Figures 11a and 11b, respectively. Block-based disparity estimation with a block size of 8 × 8 was performed first. The computed x-component of the block-based estimated disparity field is shown in Figures 10c and 11c. Figure 5 shows the MSE versus the bitrate (in bits/vector) needed for the coding of the block-based disparity field corresponding to the second frame of "Sergio". The dense disparity field was estimated using a window of 9 × 9 pixels and a search area of −16, ..., 16 and −1, ..., 1 pixels for the x- and y-components of the disparity field, respectively. Figure 6 shows the MSE versus the bitrate needed for the coding of the depth map corresponding to the second frame of "Tunnel".

The output of the edge extraction procedure applied to the right channel second frames of "Sergio" and "Tunnel" is shown in Figures 10d and 11d, respectively. The depth maps computed using the adaptive algorithm are illustrated in Figures 10e and 11e. The depth map is shown quantized into 256 levels and has the same resolution as the original image (since it is computed from a dense disparity field). Brighter areas represent those closer to the cameras. As seen, the smoothing properties of the depth estimation method result in realistic depth-map estimates. Spatial interpolation was also performed using the technique in [6] for the generation of the intermediate images of "Tunnel" and "Sergio", using the information of the left camera image and the computed dense disparity field. The resulting intermediate images are shown in Figures 10f and 11f.

VI CONCLUSIONS

A rate-distortion framework was used to define a vector field estimation technique which achieves maximum reconstructed image quality under the constraint of a target bitrate for the coding of the vector sequence. An adaptive version of this technique limits the smoothing effect to homogeneous areas and avoids highly textured areas and edges. Applications of this technique were investigated for the estimation of motion vectors in very low bitrate image sequence coding. In this case, the proposed algorithm can be combined with an appropriate rate control strategy to optimize the coding of the motion vectors corresponding to all frames of an image sequence. The technique was also evaluated for the estimation and coding of dense disparity vector fields. The latter are needed to enable a multiview receiver to generate intermediate images by use of spatial interpolation, and also in other applications requiring accurate depth knowledge, such as scenes to be postprocessed using depth keying. Experimental results were presented and evaluated for both of the above application areas.


References [1] H. Li, A. Lundmark, and R. Forchheimer, \Image Sequence Coding at Very Low Bitrates - A Review," IEEE Trans. on Image Processing, vol. 3, pp. 589{609, Sep. 1995. [2] H. G. Musmann, P. Pirsch, and H. J. Grallert, \Advances in Picture Coding," Proc. IEEE, vol. 73, pp. 523{548, Apr. 1985. [3] D. Tzovaras, M. G. Strintzis, and H. Sahinoglou, \Evaluation of Multiresolution Techniques for Motion and Disparity Estimation," Signal Processing : Image Communication, vol. 6, pp. 59{67, Mar. 1994. [4] M. Ziegler, \Digital Stereoscopic Imaging and Application, A Way Towards New Dimensions, The RACE II project DISTIMA," in IEE Colloq. on Stereoscopic Television, (London), 1992. [5] D. Tzovaras, N. Grammalidis, and M. G. Strintzis, \Disparity Field and Depth Map Coding for Multiview 3D Image Generation," Signal Processing (Image Communication), accepted for publication. [6] M. G. Strintzis, D. Tzovaras, and N. Grammalidis, \Depth Map and Disparity Field Coding for the Communication of Multiview Images," in Proc. 35th Int'l Conf. on Digital Signal Processing '95, (Limassol, Cyprus), June 1995. [7] D. Tzovaras, N. Grammalidis, and M. G. Strintzis, \Object-Based Coding of Stereo Image Sequences using Joint 3-D Motion/Disparity Compensation," IEEE Trans. on Circuits and Systems for Video Technology, to appear 1996. [8] G. Adiv, \Determining Three-Dimensional Motion and Structure from Optical Flow Generated by Several Moving Objects," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 7, pp. 384{401, Jul. 1985. [9] N. Grammalidis, S. Malassiotis, D. Tzovaras, and M. G. Strintzis, \Stereo image sequence coding based on 3D motion estimation and compensation," Signal Processing : Image Communication, vol. 7, No. 2, pp. 129{145, Aug. 1995. [10] D. Tzovaras and M. G. Strintzis, \Motion Estimation Using Rate Distortion Theory for Very Low Bit Rate Image Sequence Coding," in Proc. Int'l Conf. Telecommunications '96, (Istanbul, Turkey), Apr. 1996. [11] J. 
Ribas-Corbera and D. L. Neuho , \Optimal Bit Allocations for Lossless Video Coders : Motion Vectors vs. Di erence Frames", in ICIP' 95, pp. 180-183, Sep. 1995.


[12] J. Ribas-Corbera and D. L. Neuhoff, "Optimal Motion Vector Accuracy for Block-based Motion-Compensated Video Coders," SPIE Electronic Imaging 1996, Digital Video Compression: Algorithms and Technologies 1996, pp. 302-314, Jan. 1996.

[13] J. Ribas-Corbera and D. L. Neuhoff, "Reducing Rate/Complexity in Video Coding by Motion Estimation with Block Adaptive Accuracy," VCIP '96, pp. 615-624, Mar. 1996.

[14] Y. Shoham and A. Gersho, "Efficient Bit Allocation for an Arbitrary Set of Quantizers," IEEE Trans. on Acoust., Speech, Signal Processing, vol. 36, pp. 1445-1453, Sep. 1988.

[15] K. Ramchandran and M. Vetterli, "Best Wavelet Packet Bases in a Rate-Distortion Sense," IEEE Trans. on Image Processing, vol. 2, pp. 160-175, Apr. 1993.

[16] G. J. Sullivan and R. Baker, "Efficient Quadtree Coding of Images and Video," IEEE Trans. on Image Processing, vol. 3, pp. 327-331, May 1994.

[17] H. Everett, "Generalized Lagrange Multiplier Method for Solving Problems of Optimum Allocation of Resources," Operations Res., vol. 11, pp. 399-417, 1963.

[18] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing. Cambridge, U.K.: Cambridge Univ. Press, 1988.

[19] J. Konrad and E. Dubois, "Bayesian Estimation of Motion Vector Fields," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 14, pp. 910-927, Sep. 1992.

[20] S. Malassiotis and M. G. Strintzis, "Joint Motion/Disparity MAP Estimation for Stereo Image Sequences," IEE Proceedings: Vision, Image & Signal Processing, vol. 143, no. 2, pp. 101-108, Apr. 1996.

[21] L. Falkenhagen, "3D Object-based Depth Estimation From Stereoscopic Image Sequences," in Proc. Int'l Workshop on Stereoscopic and 3D Imaging '95, (Santorini, Greece), pp. 81-86, Sep. 1995.

[22] O. Egger, W. Li, and M. Kunt, "High Compression Image Coding Using an Adaptive Morphological Subband Decomposition," Proc. IEEE, vol. 83, pp. 272-287, Feb. 1995.

[23] J. Weng, T. Huang, and N. Ahuja, "Motion and Structure from Two Perspective Views: Algorithms, Error Analysis and Error Estimation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 11, pp. 451-476, May 1989.

[Plot "Claire": MSE (vertical axis) versus Bitrate (horizontal axis).]

Figure 1: Average MSE versus average bitrate (in bits/vector) for the coding of the first 50 frames of "Claire" using motion compensation.

[Plot "Tunnel": MSE (vertical axis) versus Bitrate (horizontal axis).]

Figure 2: Average MSE versus average bitrate (in bits/vector) for the coding of the first 25 frames of "Tunnel" using disparity compensation.

[Plot "Miss America": MSE (vertical axis) versus Bitrate (horizontal axis); marked points carry the parameter labels 50000, 20776, 6228, 3092, 2029, and 0.]

Figure 3: MSE versus bitrate (in bits/vector) for the block-based coding of the fifth frame of "Miss America".

[Plot "Claire": MSE (vertical axis) versus Bitrate (horizontal axis); marked points carry the parameter labels 30598, 12502, 7181, 3732, 2064, and 0.]

Figure 4: MSE versus bitrate (in bits/vector) for the block-based coding of the third frame of "Claire".

[Plot "Sergio": MSE (vertical axis) versus Bitrate (horizontal axis); marked points carry the parameter labels 200000, 85125, 60831, 32474, 22859, and 0.]

Figure 5: MSE versus bitrate (in bits/vector) for the block-based coding of the second frame of "Sergio".

[Plot "Tunnel": MSE (vertical axis) versus Bitrate (horizontal axis); marked points carry the parameter labels 600000, 328618, 44365, 29862, 14354, 5163, and 0.]

Figure 6: MSE versus bitrate (in bits/vector) for the coding of the dense disparity field corresponding to the second frame of "Tunnel".

[Plot: MSE (vertical axis) versus Bitrate (horizontal axis); curves "RDOVFE", "RDOVFE+T", and "RDOVFE+T+E".]

Figure 7: Adaptive versus non-adaptive versions of the proposed RDOVFE algorithm in terms of MSE versus bitrate (in bits/vector) for the block-based coding of the fifth frame of "Miss America".

Figure 8: (a) Original frame 1 of "Miss America". (b) Original frame 5 of "Miss America". (c) Original motion vector field estimated with the block matching algorithm. (d) Motion vector field estimated with the rate-distortion algorithm at 1.58 bits/vector. (e) The output of the edge extractor (homogeneous areas marked white, edges marked grey, and highly textured areas marked black). (f) Computed motion vector field using the adaptive smoothing algorithm.

Figure 9: (a) Original frame 1 of "Claire". (b) Original frame 3 of "Claire". (c) Original motion vector field estimated with the block matching algorithm. (d) Reconstructed frame 3 of "Claire" using the computed vector field at 1.1 bits/vector. (e) The output of the edge extractor. (f) Computed motion vector field using the adaptive smoothing algorithm.

Figure 10: (a) Original left channel image "Sergio" (frame 2). (b) Original right channel image "Sergio" (frame 2). (c) Block-based estimate of disparity. (d) The output of the edge extractor. (e) Pixel-based estimate of depth. (f) Intermediate image generated using the computed depth.

Figure 11: (a) Original left channel image "Tunnel" (frame 2). (b) Original right channel image "Tunnel" (frame 2). (c) Block-based estimate of disparity. (d) The output of the edge extractor. (e) Pixel-based estimate of depth. (f) Intermediate image generated using the computed depth.
