Automatic Extraction of Motion Trajectories in Compressed Sports Videos

Haoran Yi, Deepu Rajan and Liang-Tien Chia
Center for Multimedia & Network Technology, School of Computer Engineering
Nanyang Technological University, Singapore 639798
{pg03763623, asdrajan, asltchia}@ntu.edu.sg

ABSTRACT

This paper presents an algorithm for automatically extracting significant motion trajectories in sports videos. Our approach consists of four stages: global motion estimation, motion blob detection, trajectory evolution and trajectory refinement. Global motion is estimated from the motion vectors in the compressed video using an iterative algorithm with robust outlier rejection. A statistical hypothesis test is carried out within the Block Rejection Map (BRM), which is a by-product of the global motion estimation, for the detection of motion blobs. Trajectory evolution is the process in which the motion blobs are either appended to an existing trajectory or are considered to be the beginning of a new trajectory, based on their distance to an adaptive trajectory description. Finally, the extracted motion trajectories are refined using a Kalman filter. Experimental results on both indoor and outdoor sports videos demonstrate the effectiveness and efficiency of the proposed method.

Categories and Subject Descriptors: I.4.8 [Computing Methodologies]: Image Processing and Computer Vision
General Terms: Algorithms
Keywords: MPEG-7, motion descriptors, motion trajectory, Kalman filter.

1. INTRODUCTION

Semantic understanding of video content is highly dependent on the utilization of contextual information and domain rules. The proliferation of sports videos has created a need for sports video content analysis and management. Processing of sports video includes detection of important events, following a specific player's actions and summarization. Thus, the results of processing should be semantically meaningful in that they should be expressed in terms of high level descriptors such as those provided by the MPEG-7 standard. Motion trajectory is a very important cue for video content characterization, especially for sports video. It can be used for content retrieval, video classification, video hyperlinking for rich interactive viewing etc. This paper addresses the problem of extracting trajectories from MPEG compressed sports videos.

The problems of trajectory extraction and tracking are closely related. However, while the goal of tracking is to follow video objects in the scene and to update their 2D shape from frame to frame [1], trajectory extraction deals with determining the spatiotemporal positions of one or more moving objects in a sequence in terms of one representative point of the object(s). Unlike in tracking, where some property of a bounding box is searched for in subsequent frames, we view trajectory extraction as a process of assigning moving blobs to a dynamic list of trajectories that evolves as the video progresses. Our trajectory extraction approach is also different from the traditional background subtraction method [2]. Typical applications of such methods, like surveillance, assume that the camera is stationary. This assumption is not always valid, especially in the case of the sports videos discussed in this paper. This makes trajectory extraction in sports video more challenging. The trajectory extraction approach presented here takes into account the global motion associated with the movement of the camera and compensates for its effect to get a trajectory of objects having significant size and local motion. Hence, our method can be viewed as a generalization of the tracking problem on two accounts: first, it can handle the case of both stationary and moving cameras and second, it caters both to a specific predefined object and to all objects with significant motion.

The paper is organized as follows. Section 2 describes the 4 steps for motion trajectory extraction: global motion estimation, motion blob detection, trajectory evolution and trajectory refinement. Section 3 presents the experimental results that evaluate the performance of the proposed approach. Finally, Section 4 concludes the paper.

2. MOTION TRAJECTORY EXTRACTION

The proposed algorithm for motion trajectory extraction consists of four stages: (1) global motion estimation, (2) motion blob detection, (3) trajectory evolution and (4) trajectory refinement.

2.1 Global Motion Estimation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MM’04, October 10-16, 2004, New York, New York, USA. Copyright 2004 ACM 1-58113-893-8/04/0010 ...$5.00.

Most sports videos, especially those of outdoor sports, contain significant global motion associated with movement of the camera. In order to extract accurate trajectories of objects while the camera is in motion, we need to estimate the global motion and compensate for it. Model based motion estimation has been reported extensively in the literature [4]. In [5], the affine parameter estimation problem is formulated as a nonlinear minimization problem which is solved using an iterative algorithm. This method is semi-automatic because the user needs to identify at least 3 corresponding feature points in two frames. Our global motion estimation algorithm is an iterative algorithm with robust outlier rejection. The affine parameters are chosen so as to fit the block based motion vectors between two frames, which are available from the MPEG compressed video stream. We model the global motion as

$$mvx_i = p_1 x_i + p_2 y_i + p_3, \qquad mvy_i = p_4 x_i + p_5 y_i + p_6 \tag{1}$$

where $mvx_i$ and $mvy_i$ are the components of the motion vector for a particular macroblock (MB), $x_i$ and $y_i$ are the coordinates of the center of the MB and the $p_i$'s are the affine parameters that we call motion vector affine parameters. The classical affine parameter model is given by

$$x'_i = m_1 x_i + m_2 y_i + m_3, \qquad y'_i = m_4 x_i + m_5 y_i + m_6. \tag{2}$$

The motion vector affine parameters of equation (1) are related to the classical affine parameters of equation (2) by

$$\begin{pmatrix} m_1 & m_2 & m_3 \\ m_4 & m_5 & m_6 \end{pmatrix} = \begin{pmatrix} 1 + p_1 & p_2 & p_3 \\ p_4 & 1 + p_5 & p_6 \end{pmatrix} \tag{3}$$

We define a co-ordinate row vector $c_i$ for block $i$ as $c_i = (x_i, y_i, 1)$. Next, the co-ordinate matrix $C$ is formed by vertically concatenating the row vectors $c_i$ for all blocks which are not marked as outliers. $C$ is, then, an $N \times 3$ matrix, where $N$ is the number of macroblocks not marked as outliers. The vectors $V_x$ and $V_y$ are formed by collecting all the $mvx_i$ and $mvy_i$ respectively, for the MBs not marked as outliers. Lastly, the motion vector affine parameters are grouped together as $p_x = (p_1, p_2, p_3)^T$ and $p_y = (p_4, p_5, p_6)^T$. From these definitions, we can write $V_x = C p_x$ and $V_y = C p_y$, which are then solved for $p_x$ and $p_y$ using the pseudo-inverse matrix of $C$:

$$p_x = (C^T C)^{-1} C^T V_x, \qquad p_y = (C^T C)^{-1} C^T V_y \tag{4}$$

After each iteration, we calculate the residual motion vector $Rmv_i$ as the absolute difference between the actual motion vector and the estimated motion vector, i.e.,

$$Rmv_i = |mvx_i - \widehat{mvx}_i| + |mvy_i - \widehat{mvy}_i| \tag{5}$$

where $\widehat{mvx}_i$ and $\widehat{mvy}_i$ are the estimated components of the motion vector for macroblock $i$. We propose an adaptive threshold mechanism to reject outliers in the residual motion vectors. The threshold $T$ is decided by comparing the mean of the residual motion vectors over all MBs with a small constant $\alpha$ and choosing the maximum of the two, i.e.,

$$T = \max(\mathrm{mean}(Rmv_i), \alpha) \tag{6}$$

The role of $\alpha$ is to prevent the rejection of a large number of motion vectors if the mean of the residuals is very small. We choose $\alpha$ to be equal to 0.5. The algorithm is initialized by labelling all macroblocks as inliers.

2.2 Motion Blob Detection

Having determined the frame by frame global motion, we now describe our algorithm for detecting the motion blobs. We use a statistical model based approach to blob detection which is motivated by a similar model used by Aach et al. for detecting changes in video [6]. The observed background image pixel is modelled to be affected by the addition of Gaussian noise with zero mean and standard deviation $\sigma$, $N(0, \sigma)$, i.e.,

$$\hat{I}(i, j) = I(i, j) + \eta \tag{7}$$

where $I(i, j)$ is the intensity at pixel position $(i, j)$, $\hat{I}(i, j)$ is the observed intensity and $\eta$ is the Gaussian noise. Given the hypothesis $H_0$ that the observed pixel belongs to the background, the temporal adjacent frame difference

$$d_n(i, j) = \hat{I}_n(i, j) - \hat{I}^{warp}_{n-1}(i, j) \tag{8}$$

is a random variable with Gaussian distribution $N(0, \sqrt{2}\sigma)$. Thus, the following conditional pdf holds:

$$f(d_n(i, j) \mid H_0) = \frac{1}{2\sigma\sqrt{\pi}} \exp\left\{ -\frac{d_n^2(i, j)}{4\sigma^2} \right\} \tag{9}$$

For a more robust detection, we consider a group of pixels belonging to the spatio-temporal window $\Omega \equiv w \times w \times l$, where $w$ is the spatial width and $l$ is the temporal length of the window. We define $\Delta$ as $\Delta^2(i, j) = \sum_{(k, l, n) \in \Omega} d_n^2(k, l)$, where $\Omega$ is the spatio-temporal window centered on $(i, j, n)$ and containing $w^2 l$ pixels. If $H_0$ is valid for all the pixels in the window $\Omega$, the corresponding random variable $\Delta$ is the summation of $w^2 l$ random variables whose conditional density with respect to hypothesis $H_0$ follows a $\chi^2$ distribution [3], $f\left(\frac{\Delta}{\sqrt{2}\sigma} \mid H_0\right) = \chi^2(w^2 l - 1)$. The moving pixel is detected by evaluating the probability

$$P(\Delta^2 \le \Delta^2(i, j) \mid H_0) = \frac{\Gamma\left(w^2 l / 2,\ \Delta^2(i, j) / (2\sigma^2)\right)}{\Gamma(w^2 l / 2)} \tag{10}$$

If the probability $P(\Delta^2 \le \Delta^2(i, j) \mid H_0) < T$, we declare that the pixel belongs to the background, else the pixel is set to be a moving pixel.

Recall that in the camera motion estimation algorithm (described in the previous section), the iterative process successively rejected those MBs whose residual motion vectors (equation (5)) were greater than an adaptive threshold. An example motion block rejection map is shown in Figure 1. The statistical hypothesis testing algorithm described above is applied only on the rejected MBs in the BRM, since it is observed that the moving objects lie predominantly in the rejected MBs. This also makes the algorithm faster and further improves its robustness. Instead of fully decoding all the MBs of the frame, only the rejected MBs are decoded. This saves both processing and decoding time for each frame. In our experiments, on average fewer than half of the MBs in each frame are decoded. It may be mentioned here that $\sigma$ in equation (10) is estimated from those pixels that do not lie in the rejected MBs, using

$$\sigma^2 = \frac{\sum_{(i, j) \notin \{\text{rejected regions}\}} d^2(i, j)}{2N}$$

where $d(i, j)$ is the difference image between the first two frames and $N$ is the number of pixels lying in the MBs that are not rejected. Finally, for the detected motion blobs, we merge two blobs if the distance between them is less than a threshold and remove blobs whose sizes are less than a threshold.
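The iterative fit of Section 2.1, whose rejected macroblocks form the Block Rejection Map, can be illustrated with a minimal sketch. It is written under several assumptions that are not stated in the text: the residual is taken as the sum of absolute component differences, a macroblock stays rejected once rejected, and the loop stops when no further macroblock is rejected or after a hypothetical `max_iter` iterations. The names `solve3`, `fit_affine` and `estimate_global_motion` are illustrative only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(a[r]) + [b[r]] for r in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][k] * x[k] for k in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine(centers: Sequence[Point], mvs: Sequence[Point],
               inlier: Sequence[bool]) -> Tuple[List[float], List[float]]:
    """Least-squares fit p = (C^T C)^-1 C^T V over the inlier macroblocks."""
    rows = [(x, y, 1.0) for (x, y), keep in zip(centers, inlier) if keep]
    kept = [mv for mv, keep in zip(mvs, inlier) if keep]
    ctc = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    ctvx = [sum(r[i] * mv[0] for r, mv in zip(rows, kept)) for i in range(3)]
    ctvy = [sum(r[i] * mv[1] for r, mv in zip(rows, kept)) for i in range(3)]
    return solve3(ctc, ctvx), solve3(ctc, ctvy)


def estimate_global_motion(centers: Sequence[Point], mvs: Sequence[Point],
                           alpha: float = 0.5, max_iter: int = 20):
    """Return (p_x, p_y, block rejection map) for one pair of frames."""
    inlier = [True] * len(centers)          # all macroblocks start as inliers
    px, py = [0.0] * 3, [0.0] * 3
    for _ in range(max_iter):               # max_iter is an assumed safety cap
        px, py = fit_affine(centers, mvs, inlier)
        residuals = [
            abs(mx - (px[0] * x + px[1] * y + px[2]))
            + abs(my - (py[0] * x + py[1] * y + py[2]))
            for (x, y), (mx, my) in zip(centers, mvs)
        ]
        # Adaptive threshold: mean residual over all MBs, but never below alpha.
        threshold = max(sum(residuals) / len(residuals), alpha)
        updated = [keep and r <= threshold for keep, r in zip(inlier, residuals)]
        if updated == inlier or sum(updated) < 3:   # converged, or too few MBs left
            break
        inlier = updated
    return px, py, [not keep for keep in inlier]
```

The third return value plays the role of the BRM: `True` marks a macroblock that the fit rejected and that would be decoded for the hypothesis test. The sketch does not guard against degenerate layouts, for example all inlier centers lying on one line.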

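The window test of Section 2.2 can be sketched in the same spirit. This is an illustration under stated assumptions rather than the exact procedure: it uses the textbook result that the sum of k squared independent N(0, s^2) variables divided by s^2 is chi-square distributed with k degrees of freedom (with s^2 = 2σ^2 here), and it adopts the usual significance-test reading in which a window is labelled as moving when the tail probability of the observed statistic under H0 falls below the threshold. The normalisation may therefore differ in detail from equation (10). All function names are hypothetical.

```python
import math
from typing import Sequence


def regularized_gamma_p(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    if x > 700.0:
        # Assumption: for the small windows considered here (a of order 10),
        # such a value is far in the tail; this also avoids overflow below.
        return 1.0
    term = 1.0 / a
    total = term
    n = 0
    while n < 10000:
        n += 1
        term *= x / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    log_prefix = a * math.log(x) - x - math.lgamma(a)
    return min(1.0, total * math.exp(log_prefix))


def estimate_sigma(background_diffs: Sequence[float]) -> float:
    """sigma from frame differences of non-rejected pixels: sigma^2 = sum(d^2) / (2N)."""
    n = len(background_diffs)
    return math.sqrt(sum(d * d for d in background_diffs) / (2.0 * n))


def window_tail_probability(window_diffs: Sequence[float], sigma: float) -> float:
    """P(statistic >= observed | H0) for the differences d_n inside one window."""
    k = len(window_diffs)                         # w * w * l pixels in the window
    stat = sum(d * d for d in window_diffs) / (2.0 * sigma * sigma)
    return 1.0 - regularized_gamma_p(k / 2.0, stat / 2.0)


def is_moving(window_diffs: Sequence[float], sigma: float,
              threshold: float = 0.1) -> bool:
    """Label the window centre as moving when H0 is unlikely (assumed convention)."""
    return window_tail_probability(window_diffs, sigma) < threshold
```

In use, `estimate_sigma` would be fed the differences from macroblocks outside the BRM, and `is_moving` would be evaluated only for pixels inside rejected macroblocks, which is where the saving in decoding time comes from.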

Figure 1: Block Rejection Map for example frame of 'Soccer' sequence.

2.3 Trajectory Evolution

The detected motion blobs are used to develop the trajectory, which evolves with each successive frame. Let $T_u$ be a trajectory and $Bl_v$ be a motion blob having center of mass at $(x_v, y_v)$. The color of blob $Bl_v$ is represented by the 3-tuple $(R_v, G_v, B_v)$, calculated as the mean of the R, G and B components of all pixels belonging to the blob. For each trajectory, we formulate an adaptive trajectory description $D^u$ which consists of the mean color component $(R, G, B)$ and the center of mass component $(x, y)$ of the moving blobs. The description is initialized using the first blob's mean color and center of mass. Let $D^u_t$ represent the current description for trajectory $T_u$. The center of mass and color component of $D^u_t$ are represented by $(x^u_t, y^u_t)$ and $(R^u_t, G^u_t, B^u_t)$ respectively. We define the distance between blob $Bl_v$ and trajectory $T_u$ as the distance between $Bl_v$ and $D^u_t$, given by

$$d(Bl_v, D^u_t) = \sqrt{(x_v - x^u_t)^2 + (y_v - y^u_t)^2} + \beta \sqrt{(R_v - R^u_t)^2 + (G_v - G^u_t)^2 + (B_v - B^u_t)^2} \tag{11}$$

where $\beta$ is a weight for the colour coherence part of the distance function and is chosen to be 0.1. The objective is to assign $Bl_v$ to the trajectory that is closest to it in terms of the distance given by equation (11).

The algorithm is initialized by assigning each blob as the beginning of a trajectory. For each successive frame, the distances of the blobs from each trajectory description are computed and stored in a matrix $M$ of order $U \times V$, where $U$ is the number of trajectories and $V$ is the number of blobs. For each row, if the minimum distance is less than a threshold, then the blob with that minimum distance is appended to the trajectory represented by that row. If the minimum distance is greater than the threshold, it means that the corresponding blob belongs to a new trajectory. If there are unassigned blobs remaining, they are marked as the beginning of a new trajectory. In this way, with each frame the trajectory evolves by comparison of the blobs in that frame with the trajectory description for each trajectory.

When new blobs are appended to the trajectories, each component of the trajectory description is updated using the adaptive weighting equation

$$D^u_{t,c} = (1 - w_c) \cdot D^u_{t-1,c} + w_c \cdot D_{v,c} \tag{12}$$

where the subscript $c$ stands for each component and $D_{v,c}$ is the component of the blob newly appended to the current trajectory $D^u_t$. The weight $w_c$ for each component is calculated as

$$w_c = \frac{1}{\sqrt{2\pi}\,\zeta_c} \exp\left\{ -\frac{(M^u_{t-1} - M_v)^2}{2\zeta_c^2} \right\} \tag{13}$$

We use two values for $\zeta_c$, one for the color components and the other for the center of mass components. The choice of Gaussian weights allows us to control the amount of update, especially in error-prone situations like occlusion of blobs.

2.4 Trajectory Refinement

Trajectory refinement consists of removing spurious trajectories and smoothing the remaining ones. After removing trajectories whose lengths are less than a threshold, the remaining trajectories are deemed to be corrupted by white noise, and hence the refinement entails removal of noise. We propose to use a Kalman filter for noise removal since it provides optimal estimates of the system parameters, given measurements and knowledge of the system's behaviour. The state vector is chosen to consist of the position and the velocity of a blob, while the measurement vector is chosen to be the position. Thus, the system can be defined by the following equations:

$$\begin{pmatrix} x_i \\ y_i \\ dx_i \\ dy_i \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_{i-1} \\ y_{i-1} \\ dx_{i-1} \\ dy_{i-1} \end{pmatrix} + \eta_i, \qquad \begin{pmatrix} x_i \\ y_i \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_i \\ y_i \\ dx_i \\ dy_i \end{pmatrix} + \nu_i \tag{14}$$

where $\eta_i$ and $\nu_i$ are white noise.

3. EXPERIMENTAL RESULT

In this section, we demonstrate the results of our algorithm for trajectory extraction in three sports videos, viz., badminton, soccer and tennis. The frame structure of the MPEG compressed videos is $I_1 B_2 B_3 P_4 B_5 B_6 P_7 \ldots$, out of which we process only the I and P frames. This reduces the computational load by two thirds.

The badminton sequence considered in this paper has no camera motion. Figures 2(a) and 2(b) show one of the frames from the sequence and its corresponding motion blobs detected by the proposed algorithm. The threshold on the probability used to decide whether a pixel belongs to the background or the foreground was set to 0.1. We consider only those blobs which have more than 150 pixels. If the Euclidean distance between two blobs is less than 10 pixels, they are merged. Due to lack of space, we are unable to illustrate the evolution of the trajectory. However, we should mention that in the initial part of the sequence, we encounter some spurious trajectories due to some amount of motion among the spectators. These spurious trajectories are removed and two smooth trajectories for 369 frames are generated after refinement by the Kalman filter, as shown in the top row of Figure 2(c).

Next, we consider 300 frames of the soccer sequence, which is part of the MPEG-7 test data set. In this sequence, the camera pans from left to right, while the players move from right to left. The detected motion blobs in frame 18 are shown in Figure 2(e). Generally, the threshold on the probability is found empirically to lie in the range [0.0001, 0.1]. The extracted trajectories are shown in Figure 2(f). Since this sequence contains camera motion, the frames need to be warped to a reference frame, which we have taken to be the first frame in the sequence.
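The assignment and update rules of Section 2.3 (distance to the adaptive description, threshold test, Gaussian-weighted update) can be sketched as follows. Several details are assumptions made only for illustration: the quantity inside the exponent of the weight is taken to be the description component itself, each blob is assigned to at most one trajectory, and the values of `dist_threshold`, `zeta_pos` and `zeta_col` are hypothetical, since they are not reported. The class and function names are likewise illustrative.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Blob:
    center: Tuple[float, float]            # center of mass (x_v, y_v)
    color: Tuple[float, float, float]      # mean (R_v, G_v, B_v)


@dataclass
class Trajectory:
    center: Tuple[float, ...]              # description: center of mass component
    color: Tuple[float, ...]               # description: mean color component
    points: List[Tuple[float, float]] = field(default_factory=list)


def distance(blob: Blob, traj: Trajectory, beta: float = 0.1) -> float:
    """Spatial distance plus beta-weighted color distance to the description."""
    return math.dist(blob.center, traj.center) + beta * math.dist(blob.color, traj.color)


def gaussian_weight(previous: float, new: float, zeta: float) -> float:
    """Update weight; small when the new value is far from the description."""
    return math.exp(-((previous - new) ** 2) / (2.0 * zeta * zeta)) / (
        math.sqrt(2.0 * math.pi) * zeta)


def blend(desc, observed, zeta):
    """Component-wise adaptive update (1 - w) * old + w * new."""
    return tuple((1.0 - gaussian_weight(p, q, zeta)) * p + gaussian_weight(p, q, zeta) * q
                 for p, q in zip(desc, observed))


def evolve(trajectories: List[Trajectory], blobs: List[Blob], dist_threshold: float,
           zeta_pos: float = 10.0, zeta_col: float = 20.0,
           beta: float = 0.1) -> List[Trajectory]:
    """Process the blobs of one frame against the current trajectory list."""
    unassigned = set(range(len(blobs)))
    for traj in trajectories:
        if not unassigned:
            break
        best_d, best_v = min((distance(blobs[v], traj, beta), v) for v in unassigned)
        if best_d < dist_threshold:
            blob = blobs[best_v]
            traj.points.append(blob.center)
            traj.center = blend(traj.center, blob.center, zeta_pos)
            traj.color = blend(traj.color, blob.color, zeta_col)
            unassigned.discard(best_v)
    for v in sorted(unassigned):            # leftover blobs start new trajectories
        trajectories.append(Trajectory(blobs[v].center, blobs[v].color, [blobs[v].center]))
    return trajectories
```

Calling `evolve` once per processed frame grows the trajectory list in place. Because the weight shrinks quickly with the gap between the description and the new blob, a blob that is briefly corrupted, for instance by occlusion, moves the description only slightly.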

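The Kalman refinement applied to the surviving trajectories can be illustrated with a small constant-velocity filter that uses the transition and measurement matrices of equation (14). The noise levels `q` and `r` and the initial covariance `p0` are hypothetical values chosen for the sketch, as none are reported, and only the forward filtering pass is shown. The helper names are illustrative.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

# State (x, y, dx, dy): position plus per-frame velocity.
F: Matrix = [[1.0, 0.0, 1.0, 0.0],
             [0.0, 1.0, 0.0, 1.0],
             [0.0, 0.0, 1.0, 0.0],
             [0.0, 0.0, 0.0, 1.0]]
# Measurement: the blob position (x, y).
H: Matrix = [[1.0, 0.0, 0.0, 0.0],
             [0.0, 1.0, 0.0, 0.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def add(a: Matrix, b: Matrix, sign: float = 1.0) -> Matrix:
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scaled_identity(n: int, s: float) -> Matrix:
    return [[s if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse_2x2(m: Matrix) -> Matrix:
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def kalman_refine(points: Sequence[Tuple[float, float]], q: float = 0.01,
                  r: float = 4.0, p0: float = 100.0) -> List[Tuple[float, float]]:
    """Filter a trajectory of (x, y) points; returns the filtered positions."""
    x: Matrix = [[points[0][0]], [points[0][1]], [0.0], [0.0]]
    p = scaled_identity(4, p0)
    q_mat, r_mat = scaled_identity(4, q), scaled_identity(2, r)
    refined = [(points[0][0], points[0][1])]
    for zx, zy in points[1:]:
        # Predict with the constant-velocity model.
        x = matmul(F, x)
        p = add(matmul(matmul(F, p), transpose(F)), q_mat)
        # Correct with the measured blob position.
        innovation = [[zx - x[0][0]], [zy - x[1][0]]]
        s = add(matmul(matmul(H, p), transpose(H)), r_mat)
        gain = matmul(matmul(p, transpose(H)), inverse_2x2(s))
        x = add(x, matmul(gain, innovation))
        p = add(p, matmul(matmul(gain, H), p), sign=-1.0)
        refined.append((x[0][0], x[1][0]))
    return refined
```

A larger `r` relative to `q` trusts the motion model more and yields smoother trajectories, at the cost of slower response to genuine changes of direction.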

Finally, we extract the trajectories from a tennis sequence of 312 frames. Here too, there is global motion due to the camera. The motion blobs detected in frame 89 and the trajectories in the entire sequence are shown in Figure 2(h) and (i), respectively. All the result video clips are available from the website (http://www.ntu.edu.sg/home5/pg03763623/video/badminton.mpg, soccer.mpg, tennis.mpg).

Figure 2: (a,d,g) A sample frame, (b,e,h) the corresponding motion blobs detected in that frame and (c,f,i) the trajectories extracted (panels titled "Smooth Trajectory List") from Top row: Badminton sequence, Middle row: Soccer sequence and Bottom row: Tennis sequence.

To demonstrate the efficiency of the proposed approach, we calculate the decoded MB ratios (DMBRs), defined as the ratio of the number of MBs lying in the Block Rejection Map (BRM) to the total number of MBs in the frame. Figure 3 shows the DMBRs for the 3 video sequences used in the above experiment. The average DMBRs for the 'badminton', 'soccer' and 'tennis' sequences are 0.158, 0.404 and 0.481 respectively. The average DMBRs for the 'soccer' and 'tennis' sequences are larger than for the 'badminton' sequence. This is because the 'soccer' and 'tennis' sequences contain significant camera motion, so more iterations are required during the global motion estimation and more blocks are rejected, while the camera remains stationary for the 'badminton' sequence, so fewer iterations are required for computation of the camera motion and fewer blocks are rejected. In any case, the average DMBRs are less than 0.5, which means that less than half of the MBs are fully decoded.

Figure 3: Decoded MB ratio for the experiment sequence: 'soccer', 'tennis', 'badminton'. (Horizontal axis: Frame Number; vertical axis: Decoded MB Ratio.)

4. CONCLUSION

We have presented an approach for automatic extraction of trajectories in sports videos which works irrespective of whether the camera is still or moving. The trajectories are extracted through a four stage process of (1) global motion estimation, (2) motion blob detection, (3) trajectory evolution and (4) trajectory refinement. Experimental results have demonstrated the effectiveness and efficiency of the proposed method. The method can extract an arbitrary number of trajectories for multiple objects. Moreover, the method is applied directly on the MPEG compressed video without fully decoding each frame. Extraction of trajectories may not be an end in itself. For example, the features extracted from the trajectories can be used to classify sports videos and for analysis of a specific player during a particular game. The trajectories extracted in this paper are quite promising for achieving these goals.

5. REFERENCES

[1] A. Cavallaro, O. Steiger, and T. Ebrahimi. Multiple video object tracking in complex scenes. In Proc. of ACM Multimedia 2002, pages 523–532, December 2002.
[2] A. Monnet, A. Mittal, N. Paragios, and V. Ramesh. Background modeling and subtraction of dynamic scenes. In IEEE Intl. Conf. on Computer Vision (ICCV), Nice, France, October 2003.
[3] S. G. Wilson. Digital Modulation and Coding. Prentice Hall, 1996.
[4] C. Stiller and J. Konrad. Estimating motion in image sequences. IEEE Signal Processing Magazine, 16(4):70–91, July 1999.
[5] R. Szeliski. Video mosaics for virtual environments. IEEE Computer Graphics and Applications, 16(2):22–30, March 1996.
[6] T. Aach, A. Kaup, and R. Mester. Statistical model-based change detection in moving video. Signal Processing, 31:165–180, 1993.
