Automatic Generation of MPEG-7 Compliant XML Document for Motion Trajectory Descriptor in Sports Video Yi Haoran
Deepu Rajan
Chia Liang-Tien
School of Computer Engineering Nanyang Technological University Singapore 639798
[email protected]
[email protected]
[email protected]
ABSTRACT
The MPEG-7 standard is a step towards standardizing the description of multimedia content so that quick and efficient identification of relevant content can be facilitated, together with efficient management of information. The description definition language (DDL) is a schema language to represent valid MPEG-7 descriptors and description schemes. MPEG-7 instances are XML documents that conform to a particular MPEG-7 schema, as expressed in the DDL, and that describe audiovisual content. In this paper, we pick one of the visual descriptors related to motion in a video sequence, viz., motion trajectory. It describes the displacements of objects in time, where an object is defined as a spatiotemporal region or set of spatiotemporal regions. We present a method of automatically extracting trajectories from video sequences and generating an XML document that conforms to the MPEG-7 schema. We use sports videos in particular because their trajectories are highly irregular, which lets us demonstrate the robustness of our algorithm. The MPEG-7 XM software is a working tool that enables the checking of the relative performance of various algorithms. Insofar as the motion trajectory is concerned, the current version of XM (version 5.5) takes the formatted key-point list of an already segmented object as input and outputs the trajectory as an XML document. We go a step further by developing an automatic trajectory extraction algorithm that produces motion blobs, which are then tracked to generate the trajectory, which in turn is output as an XML document.

General Terms
Algorithms

Keywords
MPEG-7, XML, motion descriptors, motion trajectory, motion blobs, Kalman filter
1. INTRODUCTION The increased availability and usage of digital video have created a need for automated video content analysis and multimedia database management techniques. The vast amount of content information brings a great need for effective and efficient techniques for finding, accessing, filtering and managing video data. Search engines and DBMS systems exist for text information. However, videos are much more information-rich than text documents. The problem of identifying content is the key problem for a wide range of applications, and the recent international standard MPEG-7 is aimed at addressing it. The MPEG-7 standard, formally known as "Multimedia Content Description Interface", provides a set of "Descriptors" and "Description Schemes", along with a "Description Definition Language", to describe multimedia content [1]. The scope of the standard covers various forms of media including audio, video and images, both in digital and analog format, although most applications involve only digital content. MPEG-7 allows a standard description of various aspects of multimedia material that can be used by MPEG-7 enabled applications aimed at end users, or by automatic systems. Video has both spatial and temporal dimensions. The motion features of a video sequence provide the easiest access to its temporal dimension and are hence of key significance in video indexing. MPEG-7 standardizes several motion descriptors. One of the descriptors is related to motion trajectory, in which the displacement of objects in time is described, an object being defined as any spatiotemporal region or set of spatiotemporal regions whose trajectory is relevant in the context in which it is used [8]. However, an important aspect of MPEG-7 is that it standardizes content description but does not specify how the description is produced. In this paper, we describe an automatic trajectory extraction method for sports video and create an MPEG-7
Categories and Subject Descriptors I.4.8 [Computing Methodologies]: Image Processing and Computer Vision—Scene Analysis [Tracking]
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MMDB’03, November 7, 2003, New Orleans, Louisiana, USA. Copyright 2003 ACM 1-58113-726-5/03/0011 ...$5.00.
compliant XML document suitable for usage such as browsing and retrieval. There is a rich literature in computer vision on tracking of objects in image sequences. The goal of tracking is to follow video objects in the scene and to update their 2D shape from frame to frame [3]. Here, the object to be tracked is predefined by segmenting a frame of the image sequence. Most successful tracking algorithms are either region based or contour based. In the former, tracking relies on information provided by an entire region, while the latter tracks only the contour of an object. Trajectory extraction, on the other hand, deals with determining the spatiotemporal positions of one or more moving objects in a sequence in terms of one representative point of the object(s). Unlike in tracking, where some property of a bounding box is searched for in subsequent frames, we view trajectory extraction as a process of assigning moving blobs to a dynamic list of trajectories that evolves as the video progresses. While object tracking is an end in itself, the trajectory extracted from a video sequence is used for content retrieval, video classification, video hyperlinking for rich interactive viewing, etc. Thus, MPEG-7 provides the Motion Trajectory descriptor to standardize the description of the trajectory. However, the extraction of the motion trajectory by the MPEG-7 XM software needs the formatted key-point lists of the moving objects as input [2]. Trajectory extraction in sports video is a challenging problem because of the presence of global motion, a multitude of trajectories and occlusion. In [12], object motion and trajectory are used for sports event detection, where the trajectory is marked manually. In [4], the authors track the referee and players in a soccer video using the colour of their shirts; however, the position of the object to be tracked must be manually entered in the first frame in the form of a bounding box. Pingali et al. develop a real time tracking system for a tennis match [11]. They use two cameras, each capturing half of the court, and track a predefined player. Their algorithm tracks local features, like maximum curvature points, in a colour segmented region. Gueziec [6] develops a system for tracking the pitch in a baseball video; however, the case of global motion is not considered since the camera is stationary. In [7], the authors use information from manually entered trajectories of the players to infer semantic events in the game. In [13], the authors developed algorithms for segmentation and analysis of structures in soccer video. Our method works for both stationary and moving cameras and can extract the trajectories of multiple moving objects. Furthermore, we output the extracted trajectory list as an MPEG-7 compliant XML document for Motion Trajectory descriptors. The proposed method consists of the following steps. We develop a new global motion estimation algorithm for global motion compensation and subsequently warp each frame to its previous frame. This completes the pre-processing for the actual extraction of the trajectory, which consists of detecting motion blobs followed by an evolution of the trajectories of the individual blobs. A further refinement of the trajectory is carried out as a post-processing step. Finally, the smoothed trajectory lists are output as an MPEG-7 compliant XML document. In the next section, we present our camera motion estimation algorithm using motion vectors from the MPEG compressed video stream. In Section 3, we present the algorithms for detecting moving blobs, associating them with a trajectory and refining the final trajectories. In Section 4, the standardized format for outputting the Motion Trajectory descriptor as an XML document is presented. Experimental results and conclusions are given in Sections 5 and 6, respectively.
2. CAMERA MOTION ESTIMATION In [10], the MPEG bit stream was used to determine the motion vectors related to P- and B-frames. However, they use the angle of the motion vectors to characterize camera motion, which gives only a rough characterization. Here, we present a simple yet effective camera motion estimation technique based on a six-parameter affine model to determine the amount of pan, tilt and zoom of the camera. The frame-by-frame optical flow field can be approximated by the motion vectors (MV) in the MPEG stream [9]. For an I-frame, the motion vector is the inverse of the backward prediction MV of the previous B-frame. For a P-frame, the MV is the difference of the forward prediction MVs of the current and the previous frames. For a B-frame, it is the average of the difference of the forward prediction MVs of the current and previous frames and the difference of the backward prediction MVs of the current and previous frames. Thus the MV for frame n is obtained as

MV_n = \begin{cases} -BK_{n-1} & \text{I-frame} \\ FW_n - FW_{n-1} & \text{P-frame} \\ (FW_n - FW_{n-1} + BK_n - BK_{n-1})/2 & \text{B-frame} \end{cases}   (1)

The MV field obtained is then smoothed using a spatial 3 × 3 × 1 median filter followed by a temporal 1 × 1 × 3 median filter. We model the camera motion as

mvx_i = p_1 x_i + p_2 y_i + p_3
mvy_i = p_4 x_i + p_5 y_i + p_6   (2)

where mvx_i and mvy_i are the components of the motion vector for a particular macroblock (MB), x_i and y_i are the coordinates of the center of the MB and the p_i's are the affine parameters that we call motion vector affine parameters. The classical affine parameter model is given by

x'_i = m_1 x_i + m_2 y_i + m_3
y'_i = m_4 x_i + m_5 y_i + m_6   (3)

Since the motion vectors relate the centers of the MBs of two consecutive frames through

x'_i = x_i + mvx_i
y'_i = y_i + mvy_i   (4)

we can relate the motion vector affine parameters of equation (2) and the classical affine parameters of equation (3) as

\begin{pmatrix} m_1 & m_2 & m_3 \\ m_4 & m_5 & m_6 \end{pmatrix} = \begin{pmatrix} 1 + p_1 & p_2 & p_3 \\ p_4 & 1 + p_5 & p_6 \end{pmatrix}   (5)

We define a co-ordinate row vector c_i for block i as c_i = (x_i, y_i, 1). Next, the co-ordinate matrix C is formed by vertically concatenating the row vectors c_i for all blocks which are not marked as outliers. C is then an N × 3 matrix, where N is the number of macroblocks not marked as outliers. The vectors V_x and V_y are formed by collecting all the mvx_i
3. TRAJECTORY EXTRACTION Having estimated the global motion in the video sequence, we now describe our algorithm for extracting the trajectory. Our trajectory extraction approach consists of four steps: (1) motion blob detection, (2) trajectory evolution, (3) trajectory refinement and (4) trajectory output.
3.1 Motion Blob Detection
Our blob detection algorithm is developed along the same lines as in [5]. However, in our case, there is a pre-processing step: if there is global motion, we use the motion vector based algorithm described in the previous section to estimate it and then warp each frame to its neighboring frame, thereby undoing the effect of global motion. Next, the frame difference image between the two adjacent frames is thresholded to get a binary image. This is followed by morphological closing and opening operations to form connected regions. Regions smaller than a threshold are removed; at the same time, two regions are merged if the Euclidean distance between them is less than a threshold. Lastly, the centers of mass of the regions are transformed back to the co-ordinate system of the reference frame using equation (9), where the M_i's are the affine transformation matrices between frames i and i − 1. The algorithm is outlined in Algorithm 2.
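The per-frame steps of this blob detector can be sketched as follows. This is a minimal illustration, assuming globally-motion-compensated input frames; scipy.ndimage stands in for whatever morphology implementation was actually used, the region-merging step is omitted for brevity, and the threshold values follow those reported in Section 5.

```python
import numpy as np
from scipy import ndimage

def detect_motion_blobs(frame_a, frame_b, diff_thresh=20, min_pixels=150):
    """Return centers of mass of motion blobs between two adjacent frames."""
    diff = np.abs(frame_b.astype(int) - frame_a.astype(int))
    binary = diff > diff_thresh                       # threshold difference image
    struct = ndimage.generate_binary_structure(2, 2)
    binary = ndimage.binary_closing(binary, struct)   # morphological closing ...
    binary = ndimage.binary_opening(binary, struct)   # ... then opening
    labels, n = ndimage.label(binary)                 # connected regions
    centers = []
    for region in range(1, n + 1):
        mask = labels == region
        if mask.sum() < min_pixels:                   # remove small regions
            continue
        ys, xs = np.nonzero(mask)
        centers.append((xs.mean(), ys.mean()))        # center of mass
    return centers
```

With camera motion, these centers would then be mapped back to the reference frame using the composed affine transforms of equation (9).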
Figure 1: Camera motion optical flow field for zooming and panning of the Table Tennis shot

and mvy_i respectively, for the MBs not marked as outliers. Lastly, the motion vector affine parameters are grouped together as p_x = (p_1, p_2, p_3)^T and p_y = (p_4, p_5, p_6)^T. From these definitions, we can write V_x = C p_x and V_y = C p_y, which are then solved for p_x and p_y as

p_x = (C^T C)^{-1} C^T V_x
p_y = (C^T C)^{-1} C^T V_y   (6)
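Equation (6) is an ordinary linear least-squares problem. A sketch in numpy (the function name is illustrative; `lstsq` is used instead of forming (C^T C)^{-1} explicitly, which is numerically safer):

```python
import numpy as np

def fit_affine_params(xs, ys, mvx, mvy):
    """Solve equation (6) for the motion vector affine parameters.

    xs, ys   : macroblock-center coordinates of the inlier MBs
    mvx, mvy : corresponding motion-vector components
    Returns (p_x, p_y), each a length-3 array as in equation (2).
    """
    C = np.column_stack([xs, ys, np.ones(len(xs))])  # rows c_i = (x_i, y_i, 1)
    px, *_ = np.linalg.lstsq(C, mvx, rcond=None)
    py, *_ = np.linalg.lstsq(C, mvy, rcond=None)
    return px, py
```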
After each iteration, we calculate the residual motion vector Rmv_i as the absolute difference between the actual motion vector and the estimated motion vector, i.e.,

Rmv_i = |mvx_i − \hat{mvx}_i| + |mvy_i − \hat{mvy}_i|   (7)

where \hat{mvx}_i and \hat{mvy}_i are the estimated components of the motion vector for macroblock i. We propose an adaptive threshold mechanism to reject outliers in the residual motion vectors. The threshold T is decided by comparing the mean of the residual motion vectors over all MBs with a small constant α and choosing the maximum of the two, i.e.,

T = max(mean(Rmv_i), α)   (8)

The role of α is to prevent the rejection of a large number of motion vectors when the mean of the residuals is very small. We choose α to be equal to 0.5. The iterative algorithm for global motion estimation is shown in Algorithm 1.

Algorithm 1 Motion Vector (MV) Based Camera Motion Estimation
Input: Smoothed motion vector field
Output: Affine transformation parameters p_i (i = 1 . . . 6)
1: Mark all the MVs as 'inliers'.
2: Estimate the motion vector affine parameters from the 'inliers' (equation (6)).
3: Calculate the residual motion vectors (equation (7)).
4: If the residual motion vector of an MB is greater than the threshold (equation (8)), mark that MB as an 'outlier'.
5: Go to step 2 until there are no new 'outliers' or more than half the MBs are rejected.
6: If more than half the MBs are rejected, set the calculated parameters to zero.

The composition of the frame-to-frame affine transformations over a span of frames is given by

M(a_m, a_{m+n}) = \prod_{i=m}^{m+n} M_i   (9)

where M_i is the affine transformation matrix between frame i and frame i − 1; equation (9) is used in step 12 of Algorithm 2 to map blob centers back to the reference frame.

Algorithm 2 Motion Blob Detection
Input: Two video frames F_m and F_{m+1}
Output: Centers of mass of the motion blobs
1: if there is camera motion then
2:   Use Algorithm 1 to estimate the 6 motion vector affine parameters p_i.
3:   Warp frame F_m to F_{m+1} and subtract F_{m+1} from the warped F_m to get the difference image I.
4: else
5:   Subtract F_m from F_{m+1} to get the difference image I.
6: end if
7: Threshold I to get a binary image.
8: Apply morphological closing and opening on the binary image to form connected regions.
9: Merge two regions if the distance between them is less than a threshold and remove regions whose sizes are less than a threshold.
10: Find the center of mass of each region.
11: if there is camera motion then
12:   Transform the centers of mass to the coordinate system of the reference frame.
13: end if
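Algorithm 1's iterate-and-reject loop might be sketched as follows. This is an illustrative reconstruction rather than the authors' code, with α = 0.5 as in the text; once a macroblock is rejected it stays rejected.

```python
import numpy as np

def estimate_camera_motion(xs, ys, mvx, mvy, alpha=0.5, max_iter=20):
    """Iterative affine fit with adaptive outlier rejection (Algorithm 1)."""
    n = len(xs)
    inlier = np.ones(n, dtype=bool)                   # step 1: all MBs are inliers
    C = np.column_stack([xs, ys, np.ones(n)])
    for _ in range(max_iter):
        # step 2: least-squares fit on current inliers (equation (6))
        px, *_ = np.linalg.lstsq(C[inlier], mvx[inlier], rcond=None)
        py, *_ = np.linalg.lstsq(C[inlier], mvy[inlier], rcond=None)
        # step 3: residual per macroblock (equation (7))
        r = np.abs(mvx - C @ px) + np.abs(mvy - C @ py)
        # step 4: adaptive threshold (equation (8))
        T = max(r[inlier].mean(), alpha)
        new_inlier = inlier & (r <= T)
        if new_inlier.sum() < n / 2:                  # step 6: too many rejected
            return np.zeros(3), np.zeros(3)
        if new_inlier.sum() == inlier.sum():          # no new outliers: converged
            return px, py
        inlier = new_inlier
    return px, py
```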
3.2 Trajectory Evolution The detected motion blobs are used to develop the trajectories, which evolve with each successive frame. We now describe the proposed method for trajectory evolution. Let T_u be a trajectory and Bl_v be a motion blob with center of mass (x_v, y_v). The colour of blob Bl_v is represented by the 3-tuple (R_v, G_v, B_v), calculated as the mean of the R, G and B components of the pixels belonging to the
The estimated optical flow field for zoom and pan in the Table Tennis sequence is shown in Figure 1.
blob. Let B_t^u represent the last blob in trajectory T_u. The center of mass and colour of B_t^u are represented by (x_t^u, y_t^u) and (R_t^u, G_t^u, B_t^u) respectively. We define the distance between blob Bl_v and trajectory T_u as the distance between Bl_v and B_t^u, given by
d(Bl_v, B_t^u) = \sqrt{(x_v − x_t^u)^2 + (y_v − y_t^u)^2} + β \sqrt{(R_v − R_t^u)^2 + (G_v − G_t^u)^2 + (B_v − B_t^u)^2}   (10)

where β is a weight for the colour coherence part of the distance function and is chosen to be 0.1. The objective is to assign Bl_v to the trajectory that is closest to it in terms of the distance given by equation (10). The algorithm is initialized by assigning each blob as the beginning of a trajectory. For each successive frame, the distances of the blobs from the last blob in each trajectory are computed and stored in a matrix D of order U × V, where U is the number of trajectories and V is the number of blobs. For each row, if the minimum distance over the blobs is less than a threshold, then the corresponding blob is appended to the trajectory represented by that row. If the minimum distance is greater than the threshold, the corresponding blob belongs to a new trajectory. Any blobs remaining unassigned are marked as the beginning of new trajectories. In this way, with each frame the trajectories evolve by comparing the blobs in that frame with the last blob in each trajectory. The proposed trajectory evolution technique is summarized in Algorithm 3.
4. XML DOCUMENT FOR MOTION TRAJECTORY DESCRIPTOR The extracted motion trajectory information consists of several lists of points. This information could be stored in various ways, but ad hoc storage would not allow search across different repositories and would complicate content exchange between systems using different description methods. These are interoperability issues, and creating a standard description is an appropriate way to address them. XML (Extensible Markup Language) is a simple, very flexible text format derived from SGML (ISO 8879). It is powerful and widely used to describe and exchange data, and XML Schema has been widely adopted as a schema language for constraining the structure and content of XML documents. After a detailed evaluation of XML and XML Schema, MPEG-7 chose to adopt and extend them as the description definition language (DDL) for MPEG-7 documents. Thus MPEG-7 documents are XML documents that conform to a particular MPEG-7 schema (expressed in the DDL) and that describe audiovisual content. The MPEG-7 schema defines a syntax and structure for motion trajectory representation, called the MotionTrajectory descriptor. Therefore, the extracted trajectory information shall be represented as the MotionTrajectory descriptor in XML format. The motion trajectory records the moving path of a specific object. It is a high-level feature associated with a moving region. As mentioned before, it is defined as a series of spatio-temporal localizations of one of the object's representative points. The core information of a trajectory is therefore a list of points with both spatial and temporal locations. In order to describe the spatial localization of the points, the spatial coordinate system must be specified. The MotionTrajectory descriptor thus consists of two sub-elements and one attribute. One element defines the coordinate system, which can either be defined anew or refer to an already existing one; this is differentiated by using CoordDef or CoordRef, respectively. MPEG-7 supports two kinds of 2D spatial coordinate systems: local and integrated. In a local coordinate system, the coordinates in the descriptor are mapped to the current image; for a video sequence, each frame uses its own coordinate system. In an integrated system, each image (frame) of a video sequence is mapped to the coordinate system specified by the first frame. The local system is used to specify the coordinate system of the first frame, while the integrated coordinate system can represent the coordinates on a mosaic. The spatialRef attribute of the coordinate system element CoordRef indicates which type of spatial coordinate
Algorithm 3 Trajectory Evolution
Input: Motion blobs Bl and trajectories T
Output: Trajectories
1: Initialization: assign each blob in the first frame to a new trajectory.
2: Calculate the distance matrix D using equation (10).
3: for each row u of D do
4:   v* = arg min_v D(u, v)
5:   if D(u, v*) ≤ Threshold then
6:     Append motion blob Bl_{v*} to trajectory T_u.
7:   end if
8: end for
9: for all the remaining unassigned blobs do
10:   Create a new trajectory.
11: end for
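Equation (10) and Algorithm 3 together amount to a greedy nearest-trajectory assignment, which can be sketched as below. The blob tuple layout and function names are illustrative; β = 0.1 follows the text and the 30-pixel threshold follows Section 5.

```python
import math

BETA = 0.1        # colour-coherence weight in equation (10)
THRESHOLD = 30    # maximum inter-frame displacement (see Section 5)

def blob_distance(blob, traj):
    """Equation (10): distance from a blob to a trajectory's last blob."""
    x, y, r, g, b = blob
    xt, yt, rt, gt, bt = traj[-1]
    return (math.hypot(x - xt, y - yt)
            + BETA * math.sqrt((r - rt) ** 2 + (g - gt) ** 2 + (b - bt) ** 2))

def evolve(trajectories, blobs):
    """One step of Algorithm 3: extend trajectories with this frame's blobs.

    trajectories : list of lists of blobs; blobs : list of (x, y, R, G, B).
    """
    unassigned = list(blobs)
    for traj in trajectories:
        if not unassigned:
            break
        best = min(unassigned, key=lambda bl: blob_distance(bl, traj))
        if blob_distance(best, traj) <= THRESHOLD:
            traj.append(best)
            unassigned.remove(best)
    for bl in unassigned:               # leftover blobs start new trajectories
        trajectories.append([bl])
    return trajectories
```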
3.3 Trajectory Refinement Trajectory refinement consists of removing spurious trajectories and smoothing the remaining ones. If the length of a trajectory extracted from Algorithm 3 is less than a threshold, it is considered spurious and removed. The remaining trajectories are deemed to be corrupted by white noise, and hence refinement entails removal of noise. We propose to use a Kalman filter for noise removal since it provides optimal estimates of the system parameters, given measurements and knowledge of the system's behaviour. The task then is to determine the state vector and the measurement vector. The state vector is chosen to consist of the position and the first derivative of the position (that is, the velocity) of a blob, while the measurement vector is the position. Thus, the system can be defined by the following equations:

\begin{pmatrix} x_i \\ y_i \\ dx_i \\ dy_i \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_{i-1} \\ y_{i-1} \\ dx_{i-1} \\ dy_{i-1} \end{pmatrix} + \eta_i

\begin{pmatrix} x_i \\ y_i \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_i \\ y_i \\ dx_i \\ dy_i \end{pmatrix} + \nu_i   (11)

where η_i and ν_i are white noise. The Kalman filter yields an optimal estimate of the trajectory while removing noise.
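A minimal constant-velocity Kalman filter matching the state/measurement model of equation (11) could look like the following sketch: state (x, y, dx, dy), measurement (x, y). The noise covariances Q and R are illustrative choices; the paper does not report its values.

```python
import numpy as np

A = np.array([[1, 0, 1, 0],      # state transition: position += velocity
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],      # measurement: observe position only
              [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)             # process-noise covariance (assumed value)
R = 4.0 * np.eye(2)              # measurement-noise covariance (assumed value)

def smooth_trajectory(points):
    """points: list of (x, y) blob centers; returns filtered (x, y) list."""
    x = np.array([points[0][0], points[0][1], 0.0, 0.0])
    P = np.eye(4)
    out = []
    for z in points:
        x = A @ x                               # predict
        P = A @ P @ A.T + Q
        S = H @ P @ H.T + R                     # update
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array(z, dtype=float) - H @ x)
        P = (np.eye(4) - K @ H) @ P
        out.append((x[0], x[1]))
    return out
```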
[Figure 2 content: an XML MotionTrajectory descriptor with Dimension 2, NumOfKeypoints 112, KeyTimePoints 18 21 ...... 366 369, KeyValues 166 146 167 145 ...... 197 116 196 126, and InterpolationFunction firstOrder]
Figure 2: Example of Motion Trajectory Descriptor

system is used; spatialRef set to 0 implies the integrated system, while spatialRef set to 1 implies the local system. The other element, Params, specifies the spatio-temporal keypoints and the interpolation function used to express the trajectory list. The overall time interval on which the trajectory is defined can be specified either by a constant time interval or by a list of KeyTimePoints. Dimension specifies the dimension of the trajectory (2D or 3D). NumOfKeypoints defines the number of key points used to define the trajectory. KeyTimePoint and KeyValue define the temporal and spatial locations of the key points. The InterpolationFunction specifies the type of interpolation used between key points. The extracted trajectory lists are used to fill the KeyTimePoint and KeyValue fields. The CameraFollows attribute of MotionTrajectory indicates whether the camera follows the moving object. An example of the motion trajectory descriptor is shown in Figure 2. The camera follows the moving object (CameraFollows=1). The descriptor refers to a coordinate system named ID1, which is defined elsewhere. An integrated coordinate system is indicated by spatialRef=0. The spatial dimension of the trajectory is 2 (Dimension=2) and its length is 112 (NumOfKeypoints=112). The KeyTimePointList defines the temporal locations of the 112 keypoints in the trajectory list, and the KeyValueList contains their 112 spatial locations. First order interpolation between the key points is indicated by InterpolationFunction='firstOrder'. The KeyTimePointList and KeyValueList are filled from the extracted trajectory lists to output the standardized XML document for motion trajectory description.

5. RESULTS

In this section, we demonstrate the results of our algorithm for trajectory extraction. In our experiments, the videos are MPEG compressed with the frame structure I1 B2 B3 P4 B5 B6 P7 . . . . In order to speed up the trajectory extraction process, we only process reference frames (I- and P-frames), which reduces the computational load by two thirds. As noted earlier, the trajectory extraction part consists of motion blob detection, trajectory evolution and trajectory refinement. Figure 3 shows the motion blobs detected from a sample of the frames in the badminton sequence. The threshold for generating the binary image was set to 20, while the structuring element for the morphological operations was a disk of radius 3. We consider only those blobs which have more than 150 pixels. If the Euclidean distance between two blobs is less than 10 pixels, they are merged. The threshold on the assignment distance (see step 5 of Algorithm 3) was chosen to be 30 in order to be consistent with the MPEG standards, which assume that motion between consecutive frames does not normally exceed 30 pixels. Figures 4 (a) and (b) show the evolution of the trajectories from frames 1 to 123 and from frames 1 to 246, respectively. We see some spurious trajectories due to a small amount of motion among the spectators. These spurious trajectories are removed, and a smoothed trajectory for the 369 frames is generated after refinement by the Kalman filter, as shown in Figure 4 (c). The output MPEG-7 compliant XML document for the trajectories is shown in Figure 5. MediaLocator indicates the source of the video. We use the DescriptionCollectionType, which is defined in the MPEG-7 Multimedia Description Schemes as a tool for collections of MPEG-7 descriptors, to encapsulate the MotionTrajectory descriptors. This XML document consists of two MotionTrajectory descriptors, each corresponding to one player's trajectory in the badminton match. The KeyTimePoints and KeyValuePoints of each MotionTrajectory descriptor are filled from the extracted trajectory lists. Next, we consider 300 frames of the soccer sequence, which is part of the MPEG-7 test data set. The detected motion blobs are shown for a sample of the frames in Figure 6. Only those blobs greater than 50 pixels are retained. One player in the second column and two players in the fourth column have not been detected because their motions are not significant. The evolution of the trajectories is illustrated in
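Filling the descriptor fields from an extracted trajectory list can be sketched with Python's xml.etree. The element and attribute names follow the descriptor structure described above, but the MPEG-7 namespaces and the enclosing DescriptionCollection wrapper are omitted, so this is illustrative rather than schema-validated MPEG-7.

```python
import xml.etree.ElementTree as ET

def trajectory_to_xml(times, points, coord_ref="ID1", spatial_ref="0"):
    """Serialize one extracted trajectory as a MotionTrajectory-style element.

    times  : list of frame numbers (key time points)
    points : list of (x, y) key values
    """
    d = ET.Element("MotionTrajectory", CameraFollows="1")
    # Reference an existing coordinate system; spatialRef="0" = integrated.
    ET.SubElement(d, "CoordRef", ref=coord_ref, spatialRef=spatial_ref)
    params = ET.SubElement(d, "Params", Dimension="2",
                           NumOfKeypoints=str(len(points)))
    ET.SubElement(params, "KeyTimePoint").text = " ".join(str(t) for t in times)
    ET.SubElement(params, "KeyValue").text = " ".join(
        "%d %d" % (x, y) for x, y in points)
    ET.SubElement(params, "InterpolationFunction").text = "firstOrder"
    return ET.tostring(d, encoding="unicode")
```

For example, `trajectory_to_xml([18, 21], [(166, 146), (167, 145)])` produces a descriptor fragment with the key-point values from Figure 2.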
Figure 3: Motion blob detection in sample frames of the badminton sequence
Figure 4: Trajectories extracted from (a) frames 1 to 123 and (b) frames 1 to 246 in the badminton sequence (c) Final trajectory extracted after refinement from 369 frames

[Figure 5 content: an XML document with MediaLocator badmintion.mpg and MotionTrajectory descriptors with Dimension 2, NumOfKeypoints 112, KeyTimePoints 18 21 ...... 366 369, KeyValues 166 146 167 145 ...... 197 116 196 126, InterpolationFunction firstOrder]
Figure 5: Motion trajectory descriptors for "Badminton video" in MPEG-7 compliant XML format
Figure 6: Motion blob detection in sample frames of the soccer sequence
Figure 7: Trajectories extracted from (a) frames 1 to 100 and (b) frames 1 to 200 in the soccer sequence (c) Final trajectory extracted after refinement from 300 frames

[Figure 8 content: an XML document with MediaLocator soccer.mpg and 21 MotionTrajectory descriptors with Dimension 2, e.g. NumOfKeypoints 87, KeyTimePoints 3 ...... 243, KeyValues 91 127 ...... -360 110, InterpolationFunction firstOrder]
Figure 8: Motion trajectory descriptors for "Soccer video" in MPEG-7 compliant XML format
Figure 7, where (a) and (b) are the trajectories extracted from frames 1 to 100 and from frames 1 to 200, respectively. If the number of blobs in a trajectory is less than 10, it is rejected. After refinement by the Kalman filter, the final trajectory is shown in Figure 7 (c). The output MPEG-7 compliant XML document for the trajectories is shown in Figure 8. As in the badminton video, we use the MediaLocator to indicate the source of the video and the DescriptionCollectionType as the container for the trajectory descriptors. This document consists of 21 MotionTrajectory descriptors for the players and the referees. As before, the KeyTimePoints and KeyValuePoints of those MotionTrajectory descriptors are filled from the 21 extracted trajectory lists.

6. CONCLUSION

A framework for automatic extraction of trajectories in sports videos has been presented. Global motion estimation helps to undo the effect of camera motion. The trajectories are extracted through a four stage process of motion blob detection, trajectory evolution, trajectory refinement and standardized output. Experimental results have demonstrated the effectiveness of our algorithm. The method can extract an arbitrary number of trajectories for multiple objects. Moreover, the global motion estimation method works directly on an MPEG compressed video, which avoids the heavy computation of motion field estimation, and with effective outlier rejection during each iteration the estimated global motion parameters are satisfactory. Finally, the motion trajectories are output as MPEG-7 descriptors in an XML document, which can then be put to use in browsing and sports video retrieval applications.
Trajectory extraction is not an end in itself. We propose to use features of the trajectories to classify sports videos. Clearly, at the top level, where one wants to distinguish indoor from outdoor sports videos, trajectories yield very distinguishing features. Other features extracted from the trajectory, such as crossover points, can determine the nature of the sport further, and the features could also be used for analysis of a specific player during a particular game. Some semantic information can be extracted from the trajectory combined with the rules of the type of sport; for example, a "block" or "touch down" can be inferred in an American football game by analyzing the players' trajectories.

7. REFERENCES
[1] Introduction to MPEG-7. ISO/IEC JTC1/SC29/WG11/N3545, July 2000.
[2] Multimedia content description interface - Part 8: Extraction and use of MPEG-7 descriptors. ISO/IEC 15938-8:2002, 2002.
[3] A. Cavallaro, O. Steiger, and T. Ebrahimi. Multiple video object tracking in complex scenes. In Proc. of ACM Multimedia 2002, pages 523-532, December 2002.
[4] A. Ekin and A. Tekalp. A framework for analysis and tracking of soccer video. In Visual Communications and Image Processing (VCIP), San Jose, CA, January 2002.
[5] R. C. Gonzalez and R. E. Woods. Digital Image Processing (2nd Edition). Addison-Wesley, 1992.
[6] A. Gueziec. Tracking pitches for broadcast television. IEEE Computer, 35(3):38-43, March 2002.
[7] S. Intille and A. Bobick. Recognizing planned, multi-person action. Computer Vision and Image Understanding, 81(3):414-445, March 2001.
[8] B. Manjunath, P. Salembier, and T. Sikora. Introduction to MPEG-7: Multimedia Content Description Interface. Wiley, New York, NY, USA, 2002.
[9] R. Milanese, F. Deguillaume, and A. Jacot-Descombes. Video segmentation and camera motion characterization using compressed data. In Multimedia Storage and Archiving Systems II (SPIE Proceedings), vol. 3229, 1997.
[10] N. V. Patel and I. K. Sethi. Video shot detection and characterization for video databases. Pattern Recognition, 30:607-625, April 1997.
[11] G. Pingali, Y. Jean, and I. Carlbom. Real time tracking for enhanced tennis broadcasts. In Proc. of the IEEE CVPR, pages 260-265, 1998.
[12] V. Tovinkere and R. Qian. Detection of semantic events in soccer games: towards a complete solution. In IEEE Intl. Conf. on Multimedia and Expo (ICME), August 2001.
[13] P. Xu, L. Xie, S.-F. Chang, A. Divakaran, A. Vetro, and H. Sun. Algorithms and system for segmentation and structure analysis in soccer video. In IEEE Conf. on Multimedia and Expo, Tokyo, August 2001.