Proceedings of ACIVS 2002 (Advanced Concepts for Intelligent Vision Systems), Ghent, Belgium, September 9-11, 2002

BACKWARD-FORWARD MOTION COMPENSATED PREDICTION

Kenneth Andersson (1), Peter Johansson (2), Robert Forchheimer (2) and Hans Knutsson (1)

(1) Dept. of Biomedical Engineering, (2) Image Coding Group, Linköping University, Linköping, Sweden
[email protected]

ABSTRACT

This paper presents new methods for using dense motion fields for motion compensation of interlaced video. The motion is estimated using previously decoded field-images. An initial motion compensated prediction is produced under the assumption that the motion is predictable in time. The motion estimation algorithm is phase-based and uses two or three field-images to achieve motion estimates with sub-pixel accuracy. To handle non-constant motion and the specific characteristics of the field-image to be coded, the initially predicted image is refined using forward motion compensation based on block-matching. Tests show that this approach achieves higher PSNR than forward block-based motion estimation when the residual is coded with the same coder. The subjective performance is also better.

1. INTRODUCTION

Standardized hybrid coders such as MPEG-1, MPEG-2, MPEG-4, H.261 and H.263 use block-based motion estimation and motion compensated prediction. The motion parameters and the prediction residual are coded and transmitted to the decoder. The simplicity and relatively good performance of block-based motion compensation have made it the standard approach to motion compensation. To deal with the unnatural motion compensated prediction inherent in block-based algorithms, a dense motion field has been estimated and used in a forward motion compensated scheme [1]. To retain a dense motion field while avoiding the overhead of coding it, backward video coding has been introduced, using the assumption of predictability in space [2], in time [3] or in spatial frequency [4]. In backward video coding, motion estimation is performed at both the encoder and the decoder, so only the coded prediction residual needs to be transmitted. In Lim et al. [5], backward video coding using temporal prediction of a dense motion field is selected for a block when its motion compensated prediction error is lower than that of forward video coding using H.263. Yang et al. [6] estimate motion for blocks of size 4 × 4 pixels on previously decoded images at a low-to-high frequency scale. To improve the motion estimates at low bit rates, with increasing quantization noise, the backward motion estimates are refined using forward motion estimation. In this approach, block artifacts can still be present since the motion estimation is block-based.

We propose a new backward-forward motion compensation scheme. Initially, a dense motion field is estimated on previously decoded field-images using a new phase-based motion estimator. The motion field is temporally predicted and used for motion compensated prediction of the current field-image. To handle non-constant motion and the specific characteristics of the field-image to be coded, the initial backward motion compensated prediction is refined using block-based forward motion compensation.

2. FEATURE EXTRACTION

In the present work the initial feature extraction has two main purposes. In addition to producing local velocity estimates for the video coder, it is essential that the feature extraction supports a video quality estimator adapted to the human visual system. This opens up the possibility of controlling video coding using video quality estimation. Quadrature filters are used as a basis for feature extraction in computer vision, and there are also indications that the visual cortex contains cells with similar behaviour [7]. The large amount of image data imposes requirements on computationally efficient filtering. For this reason the quadrature filter responses are generated using a combination of simple sequential 1D filter kernels, i.e. a filter net [8, 9]. The filter net is furthermore designed for interlaced video, the most common format for video sequences. An ideal set of frequency- and orientation-selective quadrature filters is designed in the Fourier domain as the target of the filter optimization. The purpose of the optimization is to design a corresponding set of filter kernels, using a limited number of kernel coefficients, such that the difference between the frequency responses of the ideal and the optimized filters is minimized. The low-frequency filters use sparse filter kernels, which enables direct interaction between scales without interpolation. The first stages in the filter net generate bandpass filters from differences of lowpass filters. The last stages produce orientation-selective complex-valued quadrature kernels. The optimized filter network requires 141 real-valued multiplications per pixel, which corresponds to 15-16 multiplications for each quadrature filter q_m. The resulting optimized quadrature filters are shown in Figure 1.

Figure 1: Contour plot of the frequency response of the generated quadrature filters at half amplitude. [The nine filters q1-q9 tile the (fx, fy) plane from −π/2 to π/2.]

3. MOTION ESTIMATION

Phase was first introduced in [10], using the Fourier phase for signal registration, i.e. phase correlation. Other approaches based on phase differences are [11, 12, 13]. Phase-based methods have been shown to perform well in comparison with other methods, see [14]. Phase variations are near linear with respect to displacements, a property which makes it possible to obtain sub-pixel accuracy without explicit interpolation. Phase is furthermore relatively insensitive to small variations in luminance, scale and rotation. The approach in this paper is to use the optical flow constraint equation on phase, Eq. 1, as done previously in [12, 15]:

φ_x v_x + φ_y v_y + φ_t = 0.    (1)

The phase differences φ_x, φ_y and φ_t and the corresponding magnitudes m_x, m_y and m_t are computed as the argument and the magnitude of the following functions:

qq_x = q(x − (1, 0)^T, t) · q(x, t)* + q(x, t) · q(x + (1, 0)^T, t)*,    (2)

qq_y = q(x − (0, 1)^T, t) · q(x, t)* + q(x, t) · q(x + (0, 1)^T, t)*,    (3)

qq_t = q(x, t − 1) · q(x, t)* + q(x, t) · q(x, t + 1)*,    (4)

where q refers to the response of a quadrature filter, * denotes the complex conjugate, x denotes the spatial coordinates and t denotes the time sample. Using phase differences from quadrature filter responses sensitive to structures in different orientations, we can determine the velocity uniquely if the neighbourhood consists of a structure with multiple orientations. The velocity can be determined by minimizing the cost function

E_v = Σ_{i=1..N} (c_i^T ṽ)² = ṽ^T ( Σ_{i=1..N} c_i c_i^T ) ṽ = ṽ^T T ṽ,    (5)

where N is the number of filter directions around a specific center frequency, ṽ = (v_x, v_y, 1)^T, T is the estimated 3D orientation tensor [16] and c_i are the weighted phase differences from a specific quadrature filter:

c_i = c_im · c_if · c_iv · c_it · (φ_ix, φ_iy, φ_it)^T,    (6)

with the weights, or reliabilities, of the phase estimates defined as

c_im = (m_ix · m_iy · m_it)^(1/2),    (7)

c_if = 1 when n̂^T (φ_ix, φ_iy)^T > 0, else 0,    (8)

c_iv = 1 when ||v_i|| < v_max, else 0,    (9)

c_it = cos²(φ_it / 2),    (10)

where c_im favours filters according to their magnitude, reflecting the local structure and the temporal similarity. c_if removes points with negative spatial phase differences, i.e. negative local frequencies, as they are completely unreliable [17, 13]. c_iv avoids the use of phase estimates that correspond to a larger normal velocity than the filter can handle. c_it limits problems with the phase wrapping around, i.e. changing sign, for temporal phase differences, by promoting small phase differences over larger ones [11].

The components of the 3D orientation tensor can be written as

    [ t1  t4  t5 ]
T = [ t4  t2  t6 ] .    (11)
    [ t5  t6  t3 ]

Minimization of Eq. 5 with respect to v gives

v = −T_2D⁻¹ (t5, t6)^T,    (12)

where T_2D (2 × 2) is the 2D orientation tensor describing the spatial structure, found as the upper left part of T.
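As an illustration, the phase differences of Eqs. 2-4 can be computed directly from the complex filter responses. The following sketch is not the paper's implementation: the filter responses are assumed given, and the helper `shift` with edge replication at the borders is our own choice.

```python
import numpy as np

def phase_differences(q_prev, q_cur, q_next):
    """Spatial and temporal phase differences (Eqs. 2-4) for one
    quadrature-filter response.

    q_prev, q_cur, q_next: complex responses q(x, t-1), q(x, t), q(x, t+1),
    each of shape (H, W). Returns (phi_x, phi_y, phi_t) and (m_x, m_y, m_t),
    the arguments and magnitudes of the products qq_x, qq_y, qq_t.
    """
    def shift(a, dy, dx):
        # b[y, x] = a[y + dy, x + dx], with edge replication at the borders
        p = np.pad(a, ((1, 1), (1, 1)), mode='edge')
        return p[1 + dy:1 + dy + a.shape[0], 1 + dx:1 + dx + a.shape[1]]

    # Eq. 2: qq_x = q(x-(1,0),t) q(x,t)* + q(x,t) q(x+(1,0),t)*
    qqx = shift(q_cur, 0, -1) * np.conj(q_cur) + q_cur * np.conj(shift(q_cur, 0, 1))
    # Eq. 3: same construction along y
    qqy = shift(q_cur, -1, 0) * np.conj(q_cur) + q_cur * np.conj(shift(q_cur, 1, 0))
    # Eq. 4: qq_t = q(x,t-1) q(x,t)* + q(x,t) q(x,t+1)*
    qqt = q_prev * np.conj(q_cur) + q_cur * np.conj(q_next)
    return (np.angle(qqx), np.angle(qqy), np.angle(qqt)), \
           (np.abs(qqx), np.abs(qqy), np.abs(qqt))
```

For a complex plane wave travelling with velocity v_x, the resulting phase differences satisfy the constraint of Eq. 1 at interior pixels.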

S00-2


From this it is evident that the 3D orientation tensor can be determined from the 2D orientation tensor as

T = [ T_2D        T_2D v ]
    [ (T_2D v)^T  t3     ] .    (13)

If multiple orientations do not exist in the local neighbourhood, we have to rely on the normal velocity

v = −(e1 e1^T / λ1) (t5, t6)^T,    (14)

where e1 is the eigenvector of the largest eigenvalue, λ1, of T. Lowpass filtering of the estimated tensor makes it more likely to find the true motion, i.e. a solution to Eq. 12. Using a larger neighbourhood will, however, increase the possibility that velocities of different objects are mixed in the analysis, giving a poor estimate. This trade-off can be dealt with by performing multiple hierarchical motion estimation [18], presented in section 3.1.

3.1. Multiple Hierarchical Motion Estimation

The smooth motion field of hierarchical motion estimation leaks motion to the background near moving objects. Motion compensation of an object according to such a motion field will move the background along with the object, which gives a perceptually annoying distortion. Our method increases the spatial resolution of the estimated motion field by using several parallel hierarchical motion estimators starting at different scales [18]. The multiple hierarchical motion estimator (MHME) consists of one global and three local hierarchical motion estimators starting at different resolutions. To handle camera panning, i.e. a global shift of the whole image, the local motion estimators are initially compensated according to a global hierarchical motion estimate. Areas not following the global motion are corrected by the local motion estimators. The quadrature filters are divided into motion estimation units: q1, q2, q3 (fine), q4, q5, q6, q7 (medium) and q6, q7, q8, q9 (coarse). The estimated tensor from each motion estimation unit is integrated with a corresponding Gaussian lowpass filter, with σ = 1.2 (fine), σ = 2.4 (medium) and σ = 3.6 (coarse), before the velocity is extracted using Eq. 12. The global motion estimator uses integration over the whole image. The hierarchical motion estimator which uses all scales can estimate velocities up to about 10 pixels per field-image. The algorithm produces four motion estimates with sub-pixel accuracy in each pixel: one global and three local estimates. The motion estimate which, at each pixel, gives the least local motion compensated prediction error is selected as the output of the MHME:

v = arg min_{v_i} ( g ∗ | Î_{v_i,t−1}(x) − Ĩ_{t−1}(x) | ),    (15)

where g is a Gaussian filter of size 5 × 5 with σ = 1.2, and Ĩ_{t−1}(x) is a previously reconstructed field-image. The motion compensated prediction Î_{v_i,t−1}(x), used in Eq. 15, is defined in Eq. 19. It should be noted that this selection of a motion estimate in each pixel is based on previously decoded field-images and is made at both the encoder and the decoder. The motion is then temporally predicted and used for motion compensated prediction of the field-image to be coded, see section 4.

3.2. Representation of Uncertainty in the Velocity Estimates

Uncertainty in the velocity estimates is due to errors in magnitude and direction. The uncertainty can be described with the covariance matrix

C(x) = Σ_x (v(x) − m(x)) (v(x) − m(x))^T + εI,    (16)

where m(x) is the local mean velocity, determined with a Gaussian lowpass filter, and εI is an isotropic part that limits the range of the inverse of the uncertainty, i.e. the certainty. At points where the velocity is larger than what can be estimated with the filters used, the diagonal of the covariance matrix is forced to be huge. The inverse of the uncertainty matrix reflects the certainty of a motion estimate. In a local neighbourhood we can weight the velocity estimates according to this uncertainty using normalized convolution, first introduced in [19]:

v(x) = ( Σ_i g ∗ C_i⁻¹ )⁻¹ ( Σ_i g ∗ C_i⁻¹ v_i(x) ),    (17)

where g is a separable Gaussian filter of size 5 × 5 with σ = 1.2, and i indexes the velocity estimates to be combined. This can be used to combine velocity estimates from different scales of the MHME, but is here only used to refine the estimated motion from Eq. 12.

4. BACKWARD-FORWARD MOTION COMPENSATED PREDICTION

We use backward motion compensated prediction based on the assumption that the motion is predictable in time, as previously done in [3, 5]. The motion is estimated on previously decoded field-images. The prediction is then performed in two steps: first, prediction of v_t using v_{t−∆t},

v̂_t(x) = v_{t−∆t}(x + ∆t · v_{t−∆t}),    (18)

and then prediction of I_t using v̂_t and Ĩ_{t−∆t},

Î_t(x) = Ĩ_{t−∆t}(x − ∆t · v̂_t(x)).    (19)
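The two-step prediction of Eqs. 18 and 19 can be sketched as follows. This is a simplified illustration, not the paper's scheme: it uses nearest-neighbour sampling and clamping at the image borders, whereas the paper handles irregular sampling with continuous normalized convolution.

```python
import numpy as np

def backward_predict(I_prev, v_prev, dt=1):
    """Backward prediction sketch (Eqs. 18-19), nearest-neighbour version.

    I_prev : previously decoded field-image, shape (H, W)
    v_prev : dense motion field at t - dt, shape (H, W, 2) as (vy, vx)
    """
    H, W = I_prev.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Eq. 18 (gather approximation): v_hat_t(x) = v_{t-dt}(x + dt * v_{t-dt})
    ys_f = np.clip(np.rint(ys + dt * v_prev[..., 0]).astype(int), 0, H - 1)
    xs_f = np.clip(np.rint(xs + dt * v_prev[..., 1]).astype(int), 0, W - 1)
    v_hat = v_prev[ys_f, xs_f]
    # Eq. 19: I_hat_t(x) = I_{t-dt}(x - dt * v_hat_t(x))
    ys_b = np.clip(np.rint(ys - dt * v_hat[..., 0]).astype(int), 0, H - 1)
    xs_b = np.clip(np.rint(xs - dt * v_hat[..., 1]).astype(int), 0, W - 1)
    return I_prev[ys_b, xs_b]
```

For a constant motion field of one pixel per field to the right, the prediction is the previous image shifted right by one pixel (with the left border replicated by the clamping).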

Some potential problems with backward prediction in this case are:




1. Irregular sampling. The positions of the predicted signals and the sampling pattern of the desired output are not aligned. There can also be 'holes' at the desired output positions, because no prediction ends up there. An example is regions uncovered by object motion.

2. Multiple motions. Many predictions can end up in the same output neighbourhood, i.e. occlusion. A simple example is a moving object on a stationary background.

3. Non-constant motion. The motion of an object at times t − ∆t and t is not the same.

4. Varying image characteristics. Examples are changing lighting conditions and quantization noise.

Continuous normalized convolution (CNC) [20], an extension of normalized convolution [19], is used to deal with the first two problems. A continuous filter is approximated using a discrete filter whose shape is tuned using a spline approximation of the continuous filter. The idea is to integrate knowledge of the signals and their certainties, arbitrarily sampled, to get an estimate of the signal at an arbitrary output position, assuming a local signal model. To completely solve the second problem above, the motion of foreground objects that remain visible at the next time sample should have certainty one, while certainty zero should be given to the motion of objects that end up in the background. Currently we use certainty one for all motion estimates.

The initial motion compensated prediction moves the image characteristics of a previous field-image using a dense motion field. To deal with the problems of a non-constant motion field and varying image characteristics, the initially predicted field-image is refined using forward motion compensation. This new backward-forward motion compensation scheme (BFMC) is shown in Figure 2. To achieve a low rate overhead for the forward motion compensation, it is based on sub-pixel block-matching with a small search window. A small search window, typically less than 1 × 1 pixel, also reduces problems with blocking artifacts such as those in a standard forward motion compensation scheme. The forward compensation refinement is only used when it gives a lower reconstruction error than backward compensation alone. One bit per field-image is transmitted to the decoder to signal the use of forward refinement.

Motion estimation based on three decoded field-images, Ĩ_{t−3}, Ĩ_{t−2} and Ĩ_{t−1}, as in Eq. 4, gives an estimate for the center image Ĩ_{t−2}. If the motion is constant in time, the motion can be accurately predicted in time. To keep the spatial accuracy in the vertical direction (of the interlaced image sequence) when the motion is constant in time, we use ∆t = 2 in Eq. 18 and Eq. 19 for the motion compensated prediction of I_t. If the motion is non-constant in time, a better estimate is achieved if only the two previously decoded field-images, Ĩ_{t−2} and Ĩ_{t−1}, are used in the motion estimation. In that case the current field-image is predicted using Ĩ_{t−1} and v_{t−1}. The approach with the least motion compensated prediction error is selected.

Figure 2: Backward-forward motion compensation scheme (BFMC). [Block diagram: previously decoded field-images feed the motion estimation; the estimated motion drives backward motion compensation, giving a backward predicted field-image; motion compensated refinement against the reference field-image produces the backward-forward predicted field-image; the refinement motion field and the prediction residual are coded.]

5. EXPERIMENTAL RESULTS

The BFMC is compared against a block-based forward motion compensation scheme (BMA), both using the SPIHT coder [21] for encoding the intra field as well as the motion compensated residuals. We also make a comparison with an MPEG-2 coder [22]. All coding is performed on luminance images only. The first field-image is intra coded (I picture); all other field-images are inter coded (P pictures). The coding performance is evaluated using the peak signal-to-noise ratio (PSNR). The motion compensation is performed on fields of interlaced video.

The forward refinement in the BFMC is based on block-matching. The block size (y × x) is 8 × 16 pixels and the search has quarter-pel accuracy within a ±0.75 pixel displacement, using bi-linear interpolation. The BMA uses a block size of 8 × 16 pixels. We use an integer search within a (±8, ±16) displacement, followed by half-pel refinement within ±1 half-pel displacement from the best integer match, and finally quarter-pel refinement within ±1 quarter-pel displacement from the best half-pel match, using bi-linear interpolation. The motion estimation is performed both between the previously decoded field-image and the current field-image to be coded, and between the second previously decoded field-image and the current field-image; the displacement with the least MSE is selected. As for the BFMC, this requires a transmission cost of one bit per field-image. The coding rate of the velocity field of both the BFMC and the BMA is estimated using Huffman coding of each field. The residual coding rate is reduced by the rate for the velocity field coding. To have the same starting point of both


the BFMC and the BMA, the first three field-images are coded the same way: the first is intra field coded, followed by two BMA-predicted fields. The same intra and inter coding rates are used for the BFMC and the BMA.

The Yosemite Sequence

A commonly used test sequence in the evaluation of optical flow algorithms is the Yosemite sequence. The sequence was generated using a digital terrain map and includes velocities up to about 4 pixels/frame. The progressive Yosemite sequence has been lowpass filtered and downsampled vertically to produce an interlaced sequence. The performance on this sequence is often measured as the angular error [14]

∆ψ = arccos( v̂_est^T v̂_true ),    (20)

where v̂ = (v_x, v_y, 1)^T / √(v_x² + v_y² + 1) is the velocity represented as a 3D vector, for the 2D velocity v = (v_x, v_y)^T.

First we compare the backward motion compensated prediction based on estimates from our motion estimator with motion compensated prediction based on the reference motion field, see Figure 3.

Figure 3: Motion compensation accuracy versus coding rate, for the field-image corresponding to frame 10 of the Yosemite sequence. Based on prediction of the estimated motion of frame 9 (solid). Based on prediction of the reference motion of frame 9 (dash-dot). [PSNR 26-33 dB over bit rates 0.2-1.4 bpp.]

For rates larger than 1.1 bpp, our motion estimator gives a motion compensated prediction with higher average PSNR than that obtained using the reference motion field. This may seem surprising at first; however, the best motion field for prediction is not necessarily the reference motion field. The reference motion field is the true velocity at the time instant of the previous field-image, i.e. the field-image used for prediction, whereas the best motion field for prediction should correspond to some time instant in between the predicted field-image and the field-image used for prediction. Motion estimates from decoded images at coding rates between 0.1 bpp and 1.1 bpp give a prediction with lower average PSNR than using the reference motion field, but the prediction remains quite stable with decreasing coding rate. The effect of coding on the accuracy of the motion estimates is shown in Figure 4.

Figure 4: Motion estimation accuracy in angular error versus coding rate, for the field-image corresponding to frame 9 of the Yosemite sequence. With 100% density (solid). Excluding the sky (dash-dot). [Angular error 5-25 deg over bit rates 0.2-1.4 bpp.]

The result for our motion estimator for the field-image corresponding to frame 9, estimated on two reference field-images, with 100% density, is 10.7 deg angular error. The best result reported at 100% density, in Barron et al. [14], is about 10 deg angular error. Without the sky, which is often removed in evaluations, we get 6.3 deg angular error. Common to all approaches which perform well on the Yosemite sequence is the use of relatively large spatio-temporal support. Our motion estimator is designed with more complex motion fields in mind and uses only two or three fields of interlaced video in the motion estimation.

In Figure 5 the rate-distortion performance on the Yosemite sequence is shown. The BFMC has up to 1.6 dB better average PSNR than MPEG-2 and up to 0.4 dB better average PSNR than the BMA. The BMA is also better than the MPEG-2 coder, due to the use of quarter-pel accuracy and partly due to the use of the SPIHT coder. However, for rates at and below 0.25 bpp the MPEG-2 coder is better than both the BFMC and the BMA. Note that the MPEG-2 coder is a complete coder with features such as intra coding on selected blocks.

The Mobile Calendar Sequence

The Mobile Calendar sequence is interlaced video at 25 frames per second (PAL). The sequence contains relatively slow motion but includes highly detailed moving areas. The results on the Mobile Calendar sequence in terms of rate-distortion are shown in Figure 6. The BFMC has 0.4-1.7 dB better average PSNR than the MPEG-2 coder for rates at or above 0.5 bpp. For rates at and below 0.25 bpp the MPEG-2 coder is better than both the BFMC and the BMA. The BFMC has up to 0.25 dB better average PSNR than the BMA.
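The angular error measure of Eq. 20, used in the Yosemite evaluation above, is straightforward to compute (the function name is ours):

```python
import numpy as np

def angular_error_deg(v_est, v_true):
    """Angular error of Eq. 20 between two 2D velocities (vx, vy), in degrees."""
    def to3d(v):
        # v_hat = (vx, vy, 1)^T / sqrt(vx^2 + vy^2 + 1)
        v3 = np.array([v[0], v[1], 1.0])
        return v3 / np.linalg.norm(v3)
    cosang = np.clip(np.dot(to3d(v_est), to3d(v_true)), -1.0, 1.0)
    return np.degrees(np.arccos(cosang))
```

The 3D embedding makes the measure well defined even for zero velocities; e.g. a zero estimate against a unit-speed true motion gives 45 degrees.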

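The integer-search stage of the block-matching used by the BMA (and, with a much smaller window, by the forward refinement) can be sketched as below. This is a simplified illustration with assumed parameter names; the coders described above additionally refine the best integer match to half- and quarter-pel accuracy using bi-linear interpolation.

```python
import numpy as np

def block_match(cur, ref, by, bx, bh=8, bw=16, sy=8, sx=16):
    """Integer-accuracy block-matching sketch.

    Finds the (dy, dx) displacement within (+/-sy, +/-sx) minimizing the MSE
    between the block at (by, bx) in `cur` and the displaced block in `ref`.
    """
    H, W = ref.shape
    block = cur[by:by + bh, bx:bx + bw]
    best, best_d = np.inf, (0, 0)
    for dy in range(-sy, sy + 1):
        for dx in range(-sx, sx + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bh > H or x + bw > W:
                continue  # candidate block falls outside the reference image
            mse = np.mean((block - ref[y:y + bh, x:x + bw]) ** 2)
            if mse < best:
                best, best_d = mse, (dy, dx)
    return best_d
```

If the current block is an exact copy of a displaced reference block, the search returns that displacement.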


Figure 5: Rate distortion performance on the Yosemite sequence. Reconstruction using BFMC (solid), BMA (dotted), MPEG-2 (dash-dot) and intra coding (dashed). The average PSNRs are 35.1 dB, 34.9 dB, 34.1 dB and 31.2 dB.

Figure 6: Rate distortion performance on the first 20 frames of the Mobile Calendar sequence. Reconstruction using BFMC (solid), BMA (dotted), MPEG-2 (dash-dot) and intra coding (dashed). The average PSNRs are 30.8 dB, 30.7 dB, 30.2 dB and 26.1 dB.

The coding rate for the velocity field is about 0.006 bpp for the BFMC and about 0.06 bpp for the BMA.

The Rugby Sequence

The Rugby sequence is interlaced video at 25 frames per second (PAL). It contains both fast motion of objects and global motion. To focus the evaluation on the moving objects, we use the central 288 × 360 part of the Rugby sequence. The results in terms of rate-distortion are shown in Figure 7, and specifically for the rate 0.75 bpp in Figure 8. The BFMC has 0.9-1.1 dB better average PSNR than the MPEG-2 coder for rates between 0.25 and 1.5 bpp. At the rate 0.125 bpp the MPEG-2 coder is better than both the BFMC and the BMA. The BFMC has up to 0.5 dB better average PSNR than the BMA.

Figure 7: Rate distortion performance on the central 288 × 360 part of the first 20 frames of the Rugby sequence. Reconstruction using BFMC (solid), BMA (dotted), MPEG-2 (dash-dot) and intra coding (dashed). The average PSNRs are 32.9 dB, 32.6 dB, 32.0 dB and 30.8 dB.

As shown in Figure 8, the BFMC gives a motion compensated prediction with slightly lower PSNR than that of the BMA. However, for velocity field coding, the BFMC uses only 0.035 bpp compared to 0.066 bpp for the BMA. A comparison of coded subjective quality on the Rugby sequence is shown in Figure 9. The BFMC is free from blocky artifacts, which leads to more pleasant subjective performance around the edges of moving objects. However, relatively flat regions should be coded more carefully than regions of high contrast, since flat regions are subjectively more sensitive to coding errors. The MPEG-2 coder accounts for this by scaling the quantization step size using a measure of spatial activity for each 8 × 8 block, and thus achieves better subjective performance in relatively flat regions.

6. CONCLUSIONS

We have presented new motion estimation and motion compensation techniques intended for video coding. The motion estimation is phase-based and designed for interlaced images. The approach uses backward motion compensation to achieve pixel-based prediction. This initial prediction is then refined using forward motion compensation, to compensate for non-constant motion and to account for the specific characteristics of the image to be coded. Using block-matching with a small search area, this can be achieved at a lower coding cost than standard forward motion compensation. Experimental results show that this new approach to motion compensation gives up to 1.6 dB higher PSNR than MPEG-2 and up to 0.5 dB higher than block-based motion estimation using the same residual coder, for coding of interlaced video. Perhaps most importantly, the subjective performance of our approach is also better.


Figure 8: Coding at 0.75 bpp on the central 288 × 360 part of the first 20 frames of the Rugby sequence. Top: Reconstruction using BFMC (solid), BMA (dotted) and MPEG-2 (dash-dot). The average PSNRs are 34.1 dB, 33.7 dB and 33.2 dB. Bottom: Motion compensated prediction for BMA (dotted), the proposed coder without forward refinement (dashed) and with forward refinement (BFMC) at 0.035 bpp (solid). The BMA uses 0.066 bpp for coding of the velocity field. [PSNR per field, fields 5-40.]

7. ACKNOWLEDGMENTS

This work was supported by the Swedish Foundation for Strategic Research (SSF) and the program Visual Information Technology (VISIT).

8. REFERENCES

[1] R. Krishnamurthy, P. Moulin, and J. W. Woods, "Optical flow techniques applied to video coding," in Proc. IEEE Int. Conf. Image Processing, Washington, USA, 1995, vol. I, pp. 570-573.

[2] A. N. Netravali and J. D. Robbins, "Motion compensated television coding: Part 1," Bell System Technical Journal, vol. 58, pp. 631-670, 1979.

[3] T. Naveen and J. W. Woods, "Motion compensated multiresolution transmission of high definition video," IEEE Trans. on Circuits and Systems for Video Technology, vol. 4, pp. 29-41, February 1994.

[4] H. Li, D. C. Escudero, and R. Forchheimer, "Motion compensated multiresolution transmission of digital video signals," in Proceedings of the 1995 International Workshop on Very Low Bit-rate Video, H. Harashima, Ed., November 1995.

[5] K. P. Lim, M. N. Chong, and A. Das, "Low-bit-rate video coding using dense motion field and uncovered background prediction," IEEE Transactions on Image Processing, vol. 10, pp. 164-166, 2001.

[6] X. Yang and K. Ramchandran, "Scalable wavelet video coding using aliasing-reduced hierarchical motion compensation," IEEE Transactions on Image Processing, vol. 9, no. 5, pp. 778-791, 2000.

[7] D. A. Pollen and S. F. Ronner, "Spatial computation performed by simple and complex cells in the visual cortex of the cat," Vision Research, vol. 22, pp. 101-118, 1982.

[8] H. Knutsson, M. Andersson, and J. Wiklund, "Advanced filter design," in Proceedings of the Scandinavian Conference on Image Analysis, Greenland, June 1999, SCIA.

[9] M. Andersson, J. Wiklund, and H. Knutsson, "Filter networks," in Proceedings of Signal and Image Processing (SIP'99), Nassau, Bahamas, October 1999, IASTED. Also as Technical Report LiTH-ISY-R-2245.

[10] C. Kuglin and D. Hines, "The phase correlation image alignment method," in Proceedings IEEE Int. Conf. Cybern. Soc., 1975, pp. 163-165.

[11] R. Wilson and H. Knutsson, "A multiresolution stereopsis algorithm based on the Gabor representation," in 3rd International Conference on Image Processing and Its Applications, Warwick, Great Britain, July 1989, IEE, pp. 19-22.

[12] D. J. Fleet and A. D. Jepson, "Computation of component image velocity from local phase information," Int. Journal of Computer Vision, vol. 5, no. 1, pp. 77-104, 1990.

[13] C-J. Westelius, Focus of Attention and Gaze Control for Robot Vision, Ph.D. thesis, Linköping University, SE-581 83 Linköping, Sweden, 1995. Dissertation No. 379, ISBN 91-7871-530-X.

[14] J. L. Barron, D. J. Fleet, and S. S. Beauchemin, "Performance of optical flow techniques," Int. J. of Computer Vision, vol. 12, no. 1, pp. 43-77, 1994.

[15] M. Hemmendorff, Motion Estimation and Compensation in Medical Imaging, Ph.D. thesis, Linköping University, SE-581 85 Linköping, Sweden, 2001. Dissertation No. 703, ISBN 91-7373-060-2.

[16] H. Knutsson, "Representing local structure using tensors," in The 6th Scandinavian Conference on Image Analysis, Oulu, Finland, June 1989, pp. 244-251. Also as Report LiTH-ISY-I-1019, Computer Vision Laboratory, Linköping University, Sweden, 1989.

[17] D. J. Fleet and A. D. Jepson, "Stability of phase information," in Proceedings of the IEEE Workshop on Visual Motion, Princeton, USA, October 1991, IEEE Society Press, pp. 52-60.

[18] K. Andersson and H. Knutsson, "Multiple hierarchical motion estimation," in Proceedings of Signal Processing, Pattern Recognition, and Applications (SPPRA'02), Crete, Greece, June 2002.

[19] H. Knutsson and C-F. Westin, "Normalized and differential convolution: Methods for interpolation and filtering of incomplete and uncertain data," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 1993, pp. 515-523.

[20] K. Andersson and H. Knutsson, "Continuous normalized convolution," in Proceedings of the International Conference on Multimedia and Expo (ICME'02), Lausanne, Switzerland, August 2002. To appear.

[21] A. Said and W. A. Pearlman, "A new fast and efficient image codec based on set partitioning in hierarchical trees," IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, pp. 243-250, 1996.

[22] MPEG Software Simulation Group, MPEG-2 Encoder/Decoder, Version 1.2, July 1996.



Figure 9: Comparison of coded subjective quality at 0.75 bpp on the central 288 × 360 part of the first 20 frames of the Rugby sequence. A part of field-image 8 is shown. Top: Reference. 2nd: BFMC (PSNR 34.6 dB). 3rd: BMA (PSNR 34.1 dB). Bottom: MPEG-2 (PSNR 33.6 dB).
