Real Time Global Motion Estimation for an MPEG-4 Video Encoder

H. Richter†, A. Smolic††, B. Stabernack††, E. Müller†

†) University of Rostock, Institute of Communications and Information Electronics, Richard-Wagner-Strasse 31, 18119 Rostock, Germany
e-mail: {buggs, erika.mueller}@ntie.e-technik.uni-rostock.de

††) Heinrich-Hertz-Institut Berlin, Image Processing Department, Einsteinufer 37, 10587 Berlin, Germany
e-mail: {smolic, stabernack}@hhi.de
ABSTRACT

This contribution presents a powerful method for real-time capable global motion estimation. To the authors' knowledge, no other real-time solutions exist to date. Global motion estimation is a valuable tool in the MPEG-4 coding process for improving overall visual quality. The main disadvantage of the reference implementation in the current Verification Model is its unacceptably low computation speed. By means of consistently speed-optimized algorithms, a 600 MHz Intel Pentium has been shown to be sufficient for the required task.
1 INTRODUCTION

The term global motion describes the 2D image motion caused by camera motion. Several mathematical models are available for a parametric description of motion in images. In the case of MPEG-4, the 6-parameter affine model is very attractive, since it handles all kinds of linear distortions at low computational complexity. Current state-of-the-art approaches for estimating global motion [1][4][5] combine a rough, wide-range initial step with a subsequent differential refinement stage. In the initial stage, a set of features is extracted from the given images. By tracking these features, a set of motion vectors is obtained, to which a global model is fitted. The feature selection process is based on a confidence measure introduced by Shi and Tomasi [2]. A feature is defined as a small window containing an image part that can be tracked reliably by a block matching
algorithm. By adapting the search range, the computation time spent here is kept to a minimum. At the beginning of the estimation process, and after a scene change, no information about the motion is available, so a relatively large search range has to be checked. Since camera motion usually changes slowly, the range can be lowered in consecutive frames by using previous estimation results as a predictor. Because it is unknown whether the tracked features belong to foreground or background, local object motion might have an undesirable impact on the global measure, and this influence has to be minimized. A suitable post-processing option is given by the maximum-likelihood operator, also known as the robust M-estimator. Typically, 5-7 iterations of a robust M-estimator are sufficient to distinguish between reliable measures and outliers. Due to the limited number of involved features and motion vectors, this type of post-processing does not noticeably affect the overall computation performance. The refinement stage of the proposed estimation hierarchy is based on an optical flow strategy. In [5], the iterative descending Gauss-Newton algorithm is proposed, which delivers reliable measures at the cost of excessive computational requirements. An approach to reduce estimation complexity was presented in [3]: by rejecting unstructured areas from the estimation, the number of candidate pixels decreases and thus fewer operations are required. These excluded areas do not carry meaningful motion information and would disturb the estimation process anyway (aperture problem). Another drawback of the MPEG-4 reference implementation is the use of first-order gradient operators for calculating the required spatial gradients. In textured image areas where high frequencies dominate, the first-order gradient operator is unable to estimate spatial gradients correctly. A solution in the form of polynomial filters is explained in [6].
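The initial matching stage outlined above (features tracked by full-search block matching, with the search range narrowed by a motion predictor) can be sketched as follows. This is an illustrative Python sketch, not the authors' optimized implementation; the function names and the SAD cost are our assumptions.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def track_feature(cur, ref, x, y, block=8, search=7, pred=(0, 0)):
    """Full-search block matching for one feature window.

    (x, y) is the top-left corner of a block x block feature in `cur`.
    The search in `ref` is centred on the predictor `pred`; once camera
    motion is roughly known, a small `search` range suffices, which is
    how an adaptive search range saves computation.
    Returns the motion vector (dx, dy) with minimum SAD.
    """
    tmpl = cur[y:y + block, x:x + block]
    best_cost, best_mv = None, (0, 0)
    for dy in range(pred[1] - search, pred[1] + search + 1):
        for dx in range(pred[0] - search, pred[0] + search + 1):
            xx, yy = x + dx, y + dy
            if xx < 0 or yy < 0 or yy + block > ref.shape[0] or xx + block > ref.shape[1]:
                continue  # candidate window leaves the reference image
            cost = sad(tmpl, ref[yy:yy + block, xx:xx + block])
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv
```

After a scene change one would call this with a large `search`; in consecutive frames the previous global translation serves as `pred` and `search` can be kept small.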
2 METHOD

A block diagram of the proposed algorithm is shown in Fig. 1.

Fig. 1: Block diagram of the proposed algorithm for global motion estimation and compensation ($I_0$: current image; $I_1$: reference image; $\Xi_T$, $\Xi_A$: motion parameters (translation, affine); $I_P$: predicted image; stages: feature matching, differential affine estimation, warping)

2.1 FEATURE MATCHING

By definition, reliable tracking of features is possible in well-structured image areas. Therefore the feature selection is performed after highpass filtering one of the two images involved in each estimation. The Laplace operator with its FIR filter coefficients (1, -2, 1) in both directions is a suitable low-complexity filter. It can be implemented without multiplications, allows heavily parallel processing, and concentrates the filter output in the low value range, leaving only a minor number of peaks. This effect leads to a simplified selection stage: a one-pass binary classification in conjunction with small ring buffers for each binary class suffices to find the desired features. Furthermore, the image is divided into several tiles to ensure that features are obtained from every image area. Fig. 2 illustrates this process.

Fig. 2: Feature selection: a) original, b) filtered image, c) selected features

To track the features, a full-search 8x8 block matching is applied. As already mentioned, the search range is kept low by using estimates between former frames as a predictor.

The necessity, introduced above, of eliminating the disturbing influence of differently moving foreground objects is addressed with a robust M-estimator. A difference $\varepsilon^m$ between the mean translation $(a_1, b_1) \in \Xi$ and each individual measure $(v_x^m, v_y^m)$ is calculated:

$$\varepsilon^m = \left\| \begin{pmatrix} v_x^m \\ v_y^m \end{pmatrix} - \begin{pmatrix} a_1 \\ b_1 \end{pmatrix} \right\| \quad (2.1)$$

Then the mean error is:

$$\mu_\varepsilon = \frac{1}{M} \sum_{m=1}^{M} \varepsilon^m \quad (2.2)$$

As weighting function for each individual measure, Tukey's biweight was chosen, where $c$ is a tuning constant affecting the convergence behaviour:

$$w(\varepsilon) = \begin{cases} \left[ 1 - \left( \frac{\varepsilon}{c\,\mu_\varepsilon} \right)^2 \right]^2 & \varepsilon \le c\,\mu_\varepsilon \\ 0 & \varepsilon > c\,\mu_\varepsilon \end{cases} \quad (2.3)$$

With these prerequisites, the following iteration function for $(a_1, b_1) \in \Xi$ is applied:

$$\Xi_{k+1} = \frac{1}{\sum_m w^m} \begin{pmatrix} \sum_m w^m v_x^m \\ \sum_m w^m v_y^m \end{pmatrix} \quad (2.4)$$
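The robust averaging of the feature motion vectors (eqs. 2.1-2.4) can be sketched in Python as follows. This is an illustrative sketch of the stated equations; the function name, default tuning constant, and the early-exit shortcuts are our assumptions.

```python
import numpy as np

def robust_translation(vectors, c=2.0, iterations=7):
    """Iteratively re-weighted mean translation (eqs. 2.1-2.4).

    `vectors` is an (M, 2) array of feature motion vectors (vx, vy).
    Each iteration measures the distance eps_m of every vector to the
    current mean translation (a1, b1), derives Tukey-biweight weights
    from the mean error mu_eps, and recomputes the weighted mean;
    outliers caused by foreground objects end up with weight 0.
    """
    v = np.asarray(vectors, dtype=np.float64)
    t = v.mean(axis=0)                              # initial (a1, b1)
    for _ in range(iterations):
        eps = np.linalg.norm(v - t, axis=1)         # eq. 2.1
        mu = eps.mean()                             # eq. 2.2
        if mu == 0.0:                               # all vectors agree
            break
        r = eps / (c * mu)
        w = np.where(r <= 1.0, (1.0 - r ** 2) ** 2, 0.0)  # eq. 2.3
        if w.sum() == 0.0:
            break
        t = (w[:, None] * v).sum(axis=0) / w.sum()  # eq. 2.4
    return t
```

With a handful of consistent background vectors and one differently moving foreground vector, the outlier is suppressed after the first iteration.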
The shown algorithm is known for its fast convergence; typically around 5-7 iterations are required.

2.2 DIFFERENTIAL AFFINE ESTIMATION

After predicting the global translation, a higher-order refinement takes place. In principle, the goal of motion compensation is to make the transformed image as similar as possible to the actual image [3]. With $I_0$ as the current image and $I_1$ as the reference image, an image point in $I_0$ or $I_1$ is defined as $p = (x, y)$. The transformation $T(p, \Xi)$ of each image point is controlled by the parameter vector $\Xi$. In the case of the affine 6-parameter model with $(a, b) \in \Xi$, the following transformation is applied:

$$T(p, \Xi) = \begin{pmatrix} a_1 + a_2 x + a_3 y \\ b_1 + b_2 x + b_3 y \end{pmatrix} \quad (2.5)$$

The similarity at a certain point is measured in terms of the residual:

$$\varepsilon(p, \Xi) = I_1(p) - I_0(T(p, \Xi)) \quad (2.6)$$

In an M-estimator approach, a function $\rho$ of the residuals with a scale $\mu$ is used to minimize $\varepsilon$:

$$\min_\Xi \sum_{p \in R} \rho(\varepsilon(p, \Xi), \mu) \quad (2.7)$$
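The affine transformation (eq. 2.5) and the pointwise residual (eq. 2.6) translate directly into code. A minimal sketch, with nearest-neighbour sampling standing in for proper sub-pixel interpolation and helper names that are ours:

```python
import numpy as np

def affine_transform(points, xi):
    """Eq. 2.5: affine 6-parameter transform T(p, Xi).

    points: (N, 2) array of (x, y); xi = (a1, a2, a3, b1, b2, b3).
    """
    x, y = points[:, 0], points[:, 1]
    a1, a2, a3, b1, b2, b3 = xi
    return np.stack([a1 + a2 * x + a3 * y,
                     b1 + b2 * x + b3 * y], axis=1)

def residual(i1, i0, p, xi):
    """Eq. 2.6: eps(p, Xi) = I1(p) - I0(T(p, Xi)).

    Nearest-neighbour sampling is used here for brevity; a real
    implementation would interpolate the sub-pixel position T(p, Xi).
    """
    tx, ty = affine_transform(np.array([p], dtype=np.float64), xi)[0]
    return float(i1[p[1], p[0]]) - float(i0[int(round(ty)), int(round(tx))])
```

For the identity parameters xi = (0, 1, 0, 0, 0, 1) the transform leaves every point unchanged and the residual between identical images is zero.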
This is solved using an iterative Gauss-Newton minimization. The weighting of each point contributing to the estimation can be expressed as $w(\varepsilon) = (d\rho / d\varepsilon) / \varepsilon$. Using the simplified method introduced in [3], the weighting function is reduced to a single binary decision, i.e. a point either participates fully, as in a non-robust solution, or not at all:

$$w(\varepsilon) = \begin{cases} 1 & \varepsilon^2 \le \mu \\ 0 & \varepsilon^2 > \mu \end{cases} \quad (2.8)$$

The value of $\mu$ is determined as the mean square of all $N$ considered points within the region $R$, which allows automatic scaling of the weighting function.

To reduce the influence of unstructured areas, we perform an a-priori evaluation of the spatial gradients $(I_x, I_y)$ and use only those pixels ($v \ne 0$) in the estimation which satisfy the following condition:

$$v = \begin{cases} 1 & |I_x| \ge d \;\vee\; |I_y| \ge d \\ 0 & \text{otherwise} \end{cases} \quad (2.9)$$

A threshold $d$ between 10 and 20 leaves only those pixels which lead to fast convergence of the Gauss-Newton algorithm. We achieved a relatively constant number of pixels involved in the estimation by adapting $d$ to the pixel count of former estimations.

Due to the computation-intensive nature of the chosen Gauss-Newton algorithm (18 coefficients of a matrix plus 6 coefficients of an error vector have to be updated for every pixel involved in the estimation), the number of iterations must be kept low in order to reach the required computation speed. We present two methods for further improving the convergence behaviour. First, instead of applying the lowpass and derivative filter pair to each pixel in each iteration, as proposed in [6], we chose to pre-filter both images before the estimation takes place. The lowpass filter solves two important issues: it counters the already mentioned statistical errors in calculating spatial gradients, and it reduces noise. As a tradeoff between accuracy and speed, a simple 3-tap Gaussian filter with the coefficients (1/4, 1/2, 1/4) is used. The second improvement targets the interaction between the feature matching and differential stages. In scenes where only minor changes of global motion are apparent, the result of the last differential estimation has proven to be the faster-converging start vector, whereas the estimated translation acts as a good start vector under quickly changing motion.

3 IMPLEMENTATION

The explained algorithms were implemented first in portable ANSI-C. All time-critical parts have been rewritten in hand-optimized assembly language, specifically designed for use with Intel Pentium III and compatible processors. The main targets in this step were the efficient use of the available SIMD engines in conjunction with a minimized load on the memory bus. By interleaving code that uses all three concurrently available engines (Integer, MMX, ISSE) at the same time, a potentially high scalability could be achieved.
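The a-priori pixel selection (eq. 2.9) and the simplified binary weighting (eq. 2.8) can be sketched as follows. Note that plain central differences stand in here for the polynomial gradient filters recommended in [6], and the function names are illustrative:

```python
import numpy as np

def select_pixels(img, d=15):
    """Eq. 2.9: keep only pixels with |Ix| >= d or |Iy| >= d.

    Central differences serve as the gradient operator for brevity;
    border pixels are left unselected. Adapting d to the pixel count
    of former estimations keeps the selected set roughly constant.
    """
    im = img.astype(np.float64)
    ix = np.zeros_like(im)
    iy = np.zeros_like(im)
    ix[:, 1:-1] = (im[:, 2:] - im[:, :-2]) / 2.0
    iy[1:-1, :] = (im[2:, :] - im[:-2, :]) / 2.0
    return (np.abs(ix) >= d) | (np.abs(iy) >= d)

def binary_weights(residuals):
    """Eq. 2.8: w = 1 where eps^2 <= mu, else 0.

    mu is the mean square of all considered residuals, which scales
    the weighting function automatically.
    """
    r2 = np.asarray(residuals, dtype=np.float64) ** 2
    mu = r2.mean()
    return (r2 <= mu).astype(np.float64)
```

A flat image yields an empty selection mask, while a step edge passes the threshold; in the weighting, a single large residual among small ones is excluded.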
4 EXPERIMENTAL RESULTS

The proposed algorithm has been tested with several sequences, for instance the well-known Mobile, Horse and Stefan sequences, which are often used to study global motion algorithms. Fig. 3 shows the impact of the proposed lowpass filtering on the estimation, measured as the background PSNR between the original and predicted images. Estimating the global motion between filtered frames results in remarkably better prediction accuracy.

Fig. 3: Background PSNR with and without pre-filtering (robust, max. 4 iterations), Mobile sequence (352x240 pels, YUV 4:2:0)

To measure the performance of this algorithm in direct comparison to the reference MPEG-4 encoder, the developed routines have been integrated into MS-PDAM(1). Fig. 4 and Fig. 5 show the achieved compression ratio and PSNR for the Stefan and Horse sequences.

Fig. 4: Rate-distortion curves for the Stefan sequence (352x240 pels, YUV 4:2:0): without GMC, GMC with reference implementation, GMC with proposed algorithm

Fig. 5: Rate-distortion curves for the Horse sequence (352x288 pels, Y only): without GMC, GMC with reference implementation, GMC with proposed algorithm

The results obtained with the proposed high-speed algorithm are excellent and match those of the MPEG-4 reference implementation. In the case of the Stefan sequence, the newly developed algorithm proves superior in tracking the quick camera motion apparent in the last thirty frames. The whole optimized estimation process needs around 12-15 ms per picture in CIF format(2). In contrast to the reference implementation, which needs about 3.5 seconds under similar conditions, the developed solution works 230 times faster. Including the necessary motion compensation routines, the overall cost of integrating the proposed algorithms into an MPEG-4 encoder solution is around 18-22 milliseconds on common mid-class PC systems. In conjunction with an optimized MPEG-4 encoder, an 800 MHz Pentium III/E has been shown to be sufficient to encode GMC frames at 25 FPS.

5 CONCLUSION

We presented a robust global motion estimator fast enough for real-time operation on common hardware. By concentrating on reducing unnecessary operations and disturbing side effects, the usual accuracy penalties of high-speed applications could be successfully avoided. The proposed algorithm performs excellently in direct comparison to the MPEG-4 reference implementation.

(1) MPEG-4 Verification Model, reference encoder by Microsoft
(2) 600 MHz Intel Pentium III with Katmai core, Linux 2.2.16, gcc 2.95, nasm 0.98

REFERENCES

[1] A. Smolic, T. Sikora and J.-R. Ohm. Long-Term Global Motion Estimation and its Application for Sprite Coding, Content Description and Segmentation. IEEE Trans. on CSVT, Vol. 9, No. 8, pp. 1227-1242, December 1999
[2] J. Shi and C. Tomasi. Good Features to Track. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1994
[3] A. Smolic and J.-R. Ohm. Robust Global Motion Estimation using a Simplified M-Estimator Approach. Proceedings of the IEEE International Conference on Image Processing, 2000, Vancouver, Canada
[4] H. Schwarz and E. Müller. Hypothesis-based Motion Segmentation for Object-based Video Encoding. Proceedings of the Picture Coding Symposium, 1999, Portland, Oregon
[5] ISO/IEC JTC1/SC29/WG11. MPEG-4 Video VM 16.0, Doc. No. N3312, Noordwijkerhout, Netherlands, 2000
[6] E. P. Simoncelli. Distributed Representation and Analysis of Visual Motion. PhD Thesis, MIT, USA, 1993