ICICS-PCM 2003 15-18 December 2003 Singapore

2A2.6
Object Tracking in MPEG Compressed Video Using Mean-Shift Algorithm
Sung-Mo Park and Joonwhoan Lee
Department of Electronic and Telecommunication Engineering, Chonbuk National University, Korea
[email protected], [email protected]

Abstract. In this paper, we propose a new scheme for tracking an object in the MPEG compressed domain. In the scheme, the motion flow for each macro block is obtained from the motion vectors included in the MPEG video stream, and simple camera operation is robustly estimated using the generalized Hough transform. The global camera operation is then used to compensate the motion flow so that the object motions can be determined. The residual motion flow after compensation is treated as a feature of moving objects for tracking. We use a mean-shift algorithm based on the residual motion flow rather than on color information, as Meer did for uncompressed video. The experimental results show the validity of the proposed scheme.

1. Introduction
Nowadays MPEG video is ubiquitous. An MPEG video stream contains a great deal of information produced during the encoding process. In content-based video services, it is important to extract information from video in order to build indexes for retrieval or to interpret scenes. This is why much research has focused on the direct utilization of this information without decoding [1][2]. The direct use of information included in the MPEG video stream saves the time needed to extract important video features for analysis as well as the time needed for decoding.


MPEG-4 provides attractive features for content-based video services. However, some of this information is still unavailable because automatic object segmentation has not been solved. As a result, object-oriented encoding of rectangular shapes in MPEG-4 is similar to frame-based encoding in MPEG-2. That is why we work in the MPEG-2 domain rather than MPEG-4 in this paper.

The mean-shift algorithm can be used for mode seeking of a probability distribution in a greedy manner [4]. Meer successfully used the algorithm for object tracking in uncompressed video [5]. In that tracking, the algorithm seeks the location of the object center at which the spatial distribution of colors is as close as possible to that of the initially designated object. Our scheme uses the motion flow obtained from MPEG compressed video rather than color. An object is initially designated through a user interface with a bounding box that occupies several macro blocks. The histogram of motion flows of the object is then constructed, and the center of the object in successive frames is updated using the mean-shift algorithm.

We propose a new scheme for tracking an object in the MPEG-2 compressed domain. In the scheme, the motion flow for each macro block is obtained from the motion vectors included in the MPEG video stream, and simple camera operation is robustly estimated using the generalized Hough transform [3]. The global camera operation is then used to compensate the motion flow to determine the object motions in a frame. The residual motion after compensation can be treated as a two-dimensional feature of a moving object for tracking.

The experimental results show the validity of the proposed scheme, with some caveats. The object can only be designated and tracked coarsely, by units of a macro block rather than a pixel. The proposed tracking scheme shows that the motion vectors in an MPEG-2 video stream can be successfully utilized as a feature for tracking an object even when camera operation exists.




2. Extraction of Motion Flows
The motion vector in MPEG video is different from optical flow because it is calculated and inserted for the purpose of encoding efficiency. Moreover, frames are not arranged in sequential order in an MPEG-2 video stream. Therefore, the order of frames must be rearranged and the motion vectors must be handled carefully to extract motion flows. Because the extracted motion flows still contain much information irrelevant to the motions caused by camera and object movements, they must be properly filtered to remove unnecessary information. In addition, the combined motion appearing in a frame must be decomposed into the component caused by camera operation and the component caused by object movement. Fig. 1 shows the extraction process of object motion in the proposed scheme.

[Fig. 1 block diagram: MPEG-2 video stream -> construction of motion flow -> vector smoothing (vector median) -> analysis of camera operation (∆x, ∆y, α) -> extraction of object motions]
Fig. 1. Extraction process of object motion

2.1 Construction of Motion Flow
The motion vectors in MPEG video cannot be treated as optical flows, because they are not arranged in sequential order and they have a different magnitude scale from frame to frame. It is also impossible to obtain a sequential motion flow directly between compressed B-frames [3]. For these reasons, we have to adjust the motion vectors to a common magnitude scale and arrange them in sequential order. According to the order of frame types, there are four possible cases.

(a) I(t) + P(t+n) or P(t) + P(t+n)
(b) B(t) + B(t+1)
(c) B(t) + I(t+1) or B(t) + P(t+1)
(d) I(t) + B(t+1) or P(t) + B(t+1)

In the first arrangement of case (a), we can take the reverse direction of a motion vector as a motion flow; in the second arrangement, in which B-frames are inserted between two reference frames, we take the opposite direction of a motion vector and divide it by the number of inserted frames plus one to obtain a motion flow. In case (b), there are four possible arrangements according to the types of motion vectors: (forward, forward), (reverse, forward), (forward, reverse), and (reverse, reverse); one can refer to [3] for more details. Note that a motion flow cannot be defined in the (forward, reverse) arrangement. In case (c), the motion vector referenced by the B-frame is directly treated as a motion flow, and in case (d), the opposite direction of the motion vector of the B-frame is taken.
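As a small illustration of case (a) only (a sketch under the description above, not the authors' implementation; the function name and arguments are ours), the per-macro-block motion flow can be obtained by reversing the forward motion vector and normalizing it to a per-frame displacement:

    import numpy as np

    def motion_flow_case_a(mv, num_inserted_b_frames):
        """Case (a): I(t)+P(t+n) or P(t)+P(t+n).

        The forward motion vector of the P-frame points back to the reference
        frame, so the motion flow is its reverse direction, scaled down to a
        per-frame displacement when B-frames are inserted in between.
        """
        mv = np.asarray(mv, dtype=float)          # (dx, dy) in pixels
        return -mv / (num_inserted_b_frames + 1)  # reverse and normalize per frame

    # Example: a forward motion vector (-8, 4) with two inserted B-frames
    # yields a per-frame motion flow of (8/3, -4/3).
    flow = motion_flow_case_a((-8, 4), 2)

The other cases differ only in which motion vector is read and whether its direction is reversed, as described above.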

2.2 Spatiotemporal Filtering
Because motion vectors in MPEG-2 are estimated by an area-correlation method in order to enhance compression efficiency, the motion flows constructed from them contain much information irrelevant to the motions caused by camera and object movements. They must be properly filtered to remove this unnecessary information. For this purpose, we use a vector median filter in the spatial and temporal domain. The filter has a window of size 3*3*3 in the (x, y, t) domain. It not only preserves discontinuities between vectors but also removes noisy vectors.
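A minimal sketch of such a spatiotemporal vector median filter (an illustration of the 3*3*3 window described above, not the authors' code; array layout and names are ours) is:

    import numpy as np

    def vector_median_filter(flow):
        """Vector median over a 3*3*3 (x, y, t) window.

        flow: array of shape (T, H, W, 2) holding (dx, dy) motion flows per
        macro block. The output at each block is the window vector that
        minimizes the sum of distances to all other vectors in the window,
        so it is always one of the observed vectors and discontinuities are
        preserved while isolated noisy vectors are removed.
        """
        T, H, W, _ = flow.shape
        out = flow.copy()
        for t in range(T):
            for y in range(H):
                for x in range(W):
                    # gather the 3*3*3 neighborhood, clipped at the borders
                    win = flow[max(t - 1, 0):t + 2, max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
                    cand = win.reshape(-1, 2)
                    dists = np.linalg.norm(cand[:, None, :] - cand[None, :, :], axis=2).sum(axis=1)
                    out[t, y, x] = cand[np.argmin(dists)]
        return out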

2.3 Estimation of Camera Operation and Extraction of Object Motion
The motion flow reflects both camera operation and object movement across consecutive frames. In order to extract the motion flow due to object movement, it must be decomposed into two parts: the motion flow caused by camera operation and the motion flow caused by object movement. Because camera operation contributes globally to the motion flow while object motion influences only the part that the object occupies, the camera operation can be estimated. A robust estimator is needed because the motion flow is mixed with local object motions; in our scheme, we use the generalized Hough transform [3]. Because the 3D parameters of camera operation cannot be calculated from the information in an MPEG-2 video stream, this paper assumes a 2D pan-tilt-zoom model defined on consecutive image coordinates. The parameters of the pan-tilt-zoom model are ∆x0 (the amount of pan, or displacement in the x-direction), ∆y0 (the amount of tilt, or displacement in the y-direction), and α (the amount of zoom, or magnification ratio). ∆x0 and ∆y0 are determined by the modes of the histogram of motion flows. α can be identified indirectly from the rate at which the magnitudes of the motion flows increase or decrease with distance from the zoom center. Fig. 2 shows the definitions of ∆x0, ∆y0, and α with their signs and the ideal distributions of the parameters according to their frequencies. To estimate α, the zoom center must first be predicted, because α is calculated from the distance to the zoom center. In this paper, the zoom center is calculated by the same method as in [3].
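As a rough illustration of this parameter estimation (a simplified sketch under our own assumptions, not the generalized Hough transform of [3]), the pan and tilt can be read off as histogram modes of the flow components, and α as the slope of the residual flow magnitude against the distance from an assumed zoom center:

    import numpy as np

    def estimate_pan_tilt_zoom(flow, positions, zoom_center):
        """flow: (N, 2) motion flows of macro blocks; positions: (N, 2) block
        centers in image coordinates; zoom_center: assumed (x0, y0).
        Returns (dx0, dy0, alpha)."""
        # pan/tilt: modes of the histograms of the x and y flow components
        hx, ex = np.histogram(flow[:, 0], bins=32)
        hy, ey = np.histogram(flow[:, 1], bins=32)
        dx0 = 0.5 * (ex[np.argmax(hx)] + ex[np.argmax(hx) + 1])
        dy0 = 0.5 * (ey[np.argmax(hy)] + ey[np.argmax(hy) + 1])
        # zoom: slope of the signed residual magnitude vs. distance from the zoom center
        residual = flow - np.array([dx0, dy0])
        radial = positions - np.asarray(zoom_center, dtype=float)
        r = np.linalg.norm(radial, axis=1)
        mag = np.linalg.norm(residual, axis=1) * np.sign(np.einsum('ij,ij->i', residual, radial))
        alpha = float(np.polyfit(r, mag, 1)[0]) if len(r) > 1 else 0.0
        return dx0, dy0, alpha

Unlike the Hough-based estimator of [3], this least-squares slope is not robust to outliers caused by object motion; it is shown only to make the roles of the three parameters concrete.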

[Figure: definitions of ∆x0, ∆y0, and α with their signs, and the ideal distributions of the parameters according to their frequencies]
Fig. 2. Definition of parameters

To extract the motion flow due to pure object motions, the original motion flow of the previous section must be compensated with the parameters ∆x0, ∆y0, and α. In eq. (2), (x0, y0) and (∆xcom, ∆ycom) denote the zoom center and the compensated motion flow, respectively.

\Delta x_{com} = \Delta x - \Delta x_0 - \alpha (x - x_0)
\Delta y_{com} = \Delta y - \Delta y_0 - \alpha (y - y_0)        (2)

Fig. 3 shows the objects to be traced after compensation. Fig. 3(a) is obtained from a frame during a zoom-in operation and Fig. 3(b) from a frame during a pan-tilt operation. To obtain these results, we keep the macro blocks whose compensated motion flow has a magnitude greater than a threshold.

[Figure: (a) object after compensating a zoom-in operation; (b) object after compensating a pan-tilt operation]
Fig. 3. Objects to be traced after compensation
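Eq. (2) and the thresholding step translate directly into code (a sketch; the function name, array layout, and the threshold value are ours):

    import numpy as np

    def compensate_flow(flow, positions, dx0, dy0, alpha, zoom_center, threshold=1.0):
        """Apply eq. (2) and keep macro blocks whose residual magnitude
        exceeds a threshold.

        flow, positions: (N, 2) arrays of motion flows and block coordinates.
        Returns (mask of candidate object blocks, compensated flow)."""
        x0, y0 = zoom_center
        comp = np.empty_like(flow, dtype=float)
        comp[:, 0] = flow[:, 0] - dx0 - alpha * (positions[:, 0] - x0)
        comp[:, 1] = flow[:, 1] - dy0 - alpha * (positions[:, 1] - y0)
        mask = np.linalg.norm(comp, axis=1) > threshold
        return mask, comp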

3. Object Tracking Using Mean-Shift Algorithm
The mean-shift algorithm has been used successfully for segmentation, clustering, and filtering problems [4]. Object tracking by means of mean shift originates from Meer [5], in which the color distribution in the object region was used for uncompressed video. Instead of color, we use the distribution of motion flows. Object tracking with the mean-shift algorithm is achieved by maximizing the Bhattacharyya similarity measure between two normalized histograms. The measure is defined by

\hat{\rho}(y) \equiv \rho[\hat{p}(y), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y)\, \hat{q}_u}        (3)


where \hat{p}_u(y) and \hat{q} are the histogram of motion flows in the candidate object region centered at y and the reference histogram in the object region to be traced, respectively. In order to maximize \rho[\hat{p}(y), \hat{q}], we have to maximize

\tilde{\rho}[\hat{p}(y), \hat{q}] \equiv \frac{C_h}{2} \sum_{i=1}^{n_h} w_i\, k\!\left( \left\| \frac{y - x_i}{h} \right\|^2 \right).        (4)

This maximization can be achieved iteratively by the mean shift, using the shadow kernel g(x) = -k'(x) of k(), as

\hat{y}_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\!\left( \left\| \frac{\hat{y}_0 - x_i}{h} \right\|^2 \right)}{\sum_{i=1}^{n_h} w_i\, g\!\left( \left\| \frac{\hat{y}_0 - x_i}{h} \right\|^2 \right)}.

In the proposed object tracking, the object to be traced is designated by a bounding box enclosing several macro blocks, because the motion flow is assigned to a macro block rather than a pixel. That means \hat{q} is the histogram of motion flows for the macro blocks in the bounding box. The Gaussian kernel profile k(x) = exp(-x) is chosen to emphasize macro blocks near the object center. Fig. 4 shows the proposed tracking scheme. In general, motion vectors are not defined in an I-frame; during the spatiotemporal median filtering, however, the motion flow is interpolated and filled in for the I-frame, so the algorithm can be applied to frames of any type.
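The following sketch puts eqs. (3)-(5) together for a single-frame update. It is an illustration, not the authors' code: the histogram bin count and range, the bandwidth h, and all names are ours, and the weight definition w_i = sqrt(q_u / p_u(y0)) for the bin u of block i is taken from the standard formulation of [5], since it is not spelled out in this paper.

    import numpy as np

    def flow_histogram(flows, bins, rng):
        """Normalized 2D histogram of (dx, dy) motion flows, flattened."""
        h, _, _ = np.histogram2d(flows[:, 0], flows[:, 1], bins=bins, range=rng)
        return (h / max(h.sum(), 1e-12)).ravel()

    def bin_index(flow, bins, rng):
        """Flattened histogram bin of a single flow vector."""
        ix = int((flow[0] - rng[0][0]) / (rng[0][1] - rng[0][0]) * bins)
        iy = int((flow[1] - rng[1][0]) / (rng[1][1] - rng[1][0]) * bins)
        return np.clip(ix, 0, bins - 1) * bins + np.clip(iy, 0, bins - 1)

    def mean_shift_track(y0, block_pos, block_flow, q_hat, h=3.0, bins=8,
                         rng=((-16, 16), (-16, 16)), iters=20, eps=1e-3):
        """One-frame mean-shift update of the object center (macro-block units).

        The Gaussian profile k(x) = exp(-x) has shadow g(x) = exp(-x), so the
        same exponential weights the spatial term in the update of eq. (5)."""
        y = np.asarray(y0, dtype=float)
        for _ in range(iters):
            d2 = np.sum(((block_pos - y) / h) ** 2, axis=1)
            in_win = d2 <= 1.0
            if not np.any(in_win):
                break
            p_hat = flow_histogram(block_flow[in_win], bins, rng)
            u = np.array([bin_index(f, bins, rng) for f in block_flow[in_win]])
            w = np.sqrt(q_hat[u] / np.maximum(p_hat[u], 1e-12))   # weights, as in [5]
            g = np.exp(-d2[in_win])                               # shadow kernel
            y_new = (block_pos[in_win] * (w * g)[:, None]).sum(axis=0) / (w * g).sum()
            if np.linalg.norm(y_new - y) < eps:
                return y_new
            y = y_new
        return y

In an actual tracker, q_hat would be built once from the macro blocks of the user-designated bounding box, and the center returned for each frame would seed the search in the next frame.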

[Figure]
Fig. 4. Proposed object tracking scheme

4. Experiments
The proposed scheme was implemented with Visual C++ on a Windows PC. Table 1 shows the overall accuracy of the proposed tracking scheme for 4 MPEG-2 video sequences. Each MPEG-2 sequence has 15 frames per GOP. In the table, the object ratio is the number of macro blocks occupied by the object relative to the macro blocks of the bounding box (mask). If tracking is accurate, the ratio should remain unchanged. We can conclude that the tracking is successful over about 60 frames of a sequence.

Table 1. Results of tracking for 4 test sequences

                                       1st I Frame    2nd I Frame    3rd I Frame    4th I Frame
Test 1  Object Ratio (Object/Mask)     10/10 (100 %)  10/10 (100 %)  7/10 (70 %)    7/10 (70 %)
        Error (Macro Block)            -              0 (0 %)        -3 (-30 %)     0 (0 %)
Test 2  Object Ratio (Object/Mask)     29/30 (96 %)   28/30 (93 %)   25/30 (83 %)   -
        Error (Macro Block)            -              -1 (-3 %)      -3 (-10 %)     -
Test 3  Object Ratio (Object/Mask)     10/18 (55 %)   11/18 (61 %)   10/18 (55 %)   -
        Error (Macro Block)            -              +1 (+6 %)      -1 (-6 %)      -
Test 4  Object Ratio (Object/Mask)     19/24 (79 %)   15/24 (62 %)   15/24 (62 %)   16/24 (66 %)
        Error (Macro Block)            -              -4 (-17 %)     0 (0 %)        +1 (+6 %)

Error Ratio: (Error Macro Block) / (Object Macro Block of the preceding I-Frame); +, -: increase (decrease) of object macro blocks.

Fig. 5 shows a sample result for Test 2 in Table 1, in which the camera pans while tracking a running man. It shows that the proposed scheme can successfully trace an object regardless of severe camera operation. This may not be possible if the camera operation is not compensated, as in the method proposed by Favalli [6].

[Figure: tracking results at Frame 90, Frame 103, and Frame 113]
Fig. 5. Results of object tracking with camera operation

5. Conclusion
In this paper, we have proposed a new scheme for tracking an object in the MPEG compressed domain. In the scheme, the motion flow for each macro block is obtained from the motion vectors included in the MPEG video stream, and simple camera operation is robustly estimated using the generalized Hough transform. The global camera operation is then used to compensate the motion flow to determine object motions. The proposed scheme uses the mean-shift algorithm for tracking based only on the residual object motion. The proposed algorithm can successfully trace a moving object even when camera operation exists, because the camera operation is compensated. In general, the mean-shift algorithm is a greedy search method; since object displacements in successive video frames are not large, it is efficient for the tracking purpose.

References
[1] V. Kobla and D. Doermann, "Compressed domain video indexing techniques using DCT and motion vector information in MPEG video," SPIE vol. 3022, pp. 200-211, Jan. 1997.
[2] R. Milanese and A. Jacot-Descombes, "Efficient Segmentation and Camera Motion Indexing of Compressed Video," Real-Time Imaging, vol. 5, no. 4, pp. 231-241, Aug. 1999.
[3] W. Y. Yoo and J. Lee, "Analysis of Camera Operations in Compressed Domain Based on Generalized Hough Transform," PCM 2001, pp. 1102-1106, Oct. 2001.
[4] Y. Cheng, "Mean Shift, Mode Seeking, and Clustering," IEEE Trans. on PAMI, vol. 17, no. 8, Aug. 1995.
[5] D. Comaniciu, V. Ramesh and P. Meer, "Real-Time Tracking of Non-Rigid Objects using Mean Shift," Proc. IEEE CVPR, 2000.
[6] L. Favalli, A. Mecocci and F. Moschetti, "Object tracking for retrieval applications in MPEG-2," IEEE Trans. on Circuits and Systems for Video Technology, vol. 10, pp. 427-432, Apr. 2000.