AUTOMATIC MOVING OBJECT EXTRACTION IN MPEG VIDEO

Wei Zeng1, Wen Gao1,2,3, Debin Zhao1,3

1 (Department of Computer Science and Technology, Harbin Institute of Technology, China)
2 (Institute of Computing Technology, Chinese Academy of Sciences, China)
3 (Graduate School of Chinese Academy of Sciences, China)
Email: {wzeng, wgao, dbzhao}@jdl.ac.cn

ABSTRACT

In this paper, we propose a moving object extraction technique that operates directly on MPEG coded data. It is a change-based moving object extraction approach, which discriminates background from moving objects by means of higher-order statistics (HOS) computed on the inter-frame differences of DC images. The DC image is a partly decoded picture obtained from the compressed video, which allows rapid reconstruction of the image data. To obtain an optimal threshold for the moving object detection stage, the background is detected in each frame by the moment-preserving thresholding technique. Based on the background statistics, a proportion of the background variance is used as the threshold, and the final object mask is extracted by comparing the fourth-order moment measure with it. Experimental results demonstrate that the proposed approach works efficiently and gives robust object extraction in compressed video.

1. INTRODUCTION

The new multimedia description scheme envisaged by the MPEG-7 standard will offer content-based functionalities that demand a sophisticated representation of visual information in terms of regions and objects. Such a separate representation enriches the means of accessing multimedia content and allows flexible manipulation. Since segmentation is a fundamental tool for building region-based or object-based representations, efficient and robust segmentation techniques are urgently needed for multimedia processing. Usually, video object segmentation is performed on uncompressed video in pursuit of high segmentation accuracy. However, given the proliferation of compressed video, such as MPEG-1 and MPEG-2 streams, video analysis in the compressed domain is becoming increasingly attractive.

0-7803-7762-1/03/$17.00 ©2003 IEEE

Recently, the direct extraction of objects from MPEG data has been discussed by several researchers. Spatio-temporal coherency is used to segment objects with homogeneous motion or other image features. Y. Nakajima et al. [1] compute the angle between two adjacent motion vectors to evaluate the spatio-temporal correlation of P and B frames. They also adopt twenty block DCT coefficients as a confidence measure for the motion vectors. Finally, the object is extracted as the regions with uniform motion vectors and high confidence. Similarly, H. Zen et al. [2] segment objects according to the magnitude and angle difference of the motion vectors against a static background. R. Wang et al. [3] set up a moving object segmentation system that considers spatial, temporal and texture confidence measures for each macroblock in the compressed video. On the other hand, A. Benzougar et al. [4] propose an MRF-based moving object extraction paradigm, which exploits the motion-compensated differences of macroblocks under a Markovian labeling framework.

In this paper, we propose a novel change-based moving object extraction technique for MPEG coded video. The basic idea of our approach is to detect moving objects from the inter-frame difference of the reduced DC images. The DC image is constructed by partly decoding the video frame. After the DC image reconstruction, the background is first found in the DC inter-frame difference by the moment-preserving thresholding method. Then, a fourth-order moment detector is evaluated block by block on a moving window of the inter-frame difference. The blocks whose moment is above a threshold computed from the background statistics are considered as the moving part. Finally, the noise blocks are removed by a filtering process.

The remainder of the paper is organized as follows. In Section 2, we describe the framework of the proposed method and give a detailed description of the algorithm. Experiments are reported in Section 3. Concluding remarks are given in Section 4 and the acknowledgement is in Section 5.


forward and backward prediction frame. The detailed discussion can be found in [5].

2. FRAMEWORK AND COMPONENT DESIGN

2.1 Overall Flow

Our algorithm is composed of DC image production, background detection and inter-frame change detection. First, the DC image is decoded from the incoming compressed video stream and buffered in memory, and the inter-frame difference is calculated between the current frame and the buffered frame. Secondly, the background is detected by the moment-preserving thresholding technique [8]. Finally, each block is classified by the local fourth-order moment measure computed in a small moving window. The framework of our approach is illustrated in Figure 1.

2.3 Background Detection

Background detection is a core step in our change-based moving object extraction paradigm. Because of camera noise and variations in environmental light, the background signal in the inter-frame difference is noisy. This background noise, which can be modeled by a Gaussian distribution, strongly influences the extraction of moving objects. Therefore, reliable background detection is important for moving object extraction. A. Neri et al. [6] compute the noise variance of the background data from a chosen video set. Their method adopts global statistics of the background signal without considering the local variance of the current frame. For this reason, we calculate the background noise statistics in each frame after the background detection. To determine the background in the inter-frame difference, an optimal threshold should be chosen. Tsai's moment-preserving method is a good thresholding technique, which provides uniformity and good object shape in image segmentation [7]. The thresholding approach is suitable for background segmentation since the histogram of the inter-frame difference always has one peak. Figure 2 shows two histogram examples.

Fig. 1. The framework of moving object extraction

2.2 DC Image Production

The DC image is a partially decoded picture of the video. It is constructed from the DC coefficient of each block in the MPEG data. Each DC image pixel is the average intensity of an 8 by 8 block, so the DC image is 1/64 the size of the fully decoded picture. In most cases, MPEG video is coded with three types of frame: I frames, P frames and B frames. The I frame is an intra-coded frame, coded in a manner similar to a JPEG image. Therefore, its DC image pixels can be obtained directly from the block DC coefficients. The P and B frames, however, are motion-compensated frames in which each block is coded with one or two motion vectors, so the construction of the DC image from a P or B frame must compensate the motion first. The DC image of a P frame is produced as follows:

$$DCT(P)_{00} = \sum_{j=1}^{4} \frac{h_j w_j}{64} \, DCT(P_j)_{00}. \qquad (1)$$
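The weighting in Eq. (1) can be illustrated with a short Python sketch. It assumes that h_j and w_j are the height and width, in pixels, of the overlap between the motion-compensated 8 by 8 block and the j-th reference block, so that the four weights sum to 64. The function names `approx_dc` and `overlaps_from_offset` are hypothetical and are not taken from the paper or from [5].

```python
def overlaps_from_offset(dy, dx):
    """Return (h_j, w_j) for the four reference blocks touched by an 8x8 block.

    (dy, dx) is the displacement in pixels (0 <= dy, dx < 8) of the block's
    top-left corner inside the top-left reference block. The order is
    top-left, top-right, bottom-left, bottom-right (an assumed convention).
    """
    if not (0 <= dy < 8 and 0 <= dx < 8):
        raise ValueError("offsets must lie in [0, 8)")
    return [(8 - dy, 8 - dx), (8 - dy, dx), (dy, 8 - dx), (dy, dx)]


def approx_dc(ref_dc, overlaps):
    """Approximate the DC coefficient of a motion-compensated block, Eq. (1).

    ref_dc   -- DC coefficients of the overlapped reference blocks
    overlaps -- matching list of (h_j, w_j) overlap sizes in pixels
    """
    if len(ref_dc) != len(overlaps):
        raise ValueError("one overlap size is needed per reference block")
    # Each reference DC value is weighted by its share of the 64 pixels.
    return sum(h * w / 64.0 * dc for dc, (h, w) in zip(ref_dc, overlaps))


if __name__ == "__main__":
    print(approx_dc([100, 120, 80, 60], overlaps_from_offset(2, 3)))
```

For a block displaced by 2 rows and 3 columns over reference DC values 100, 120, 80 and 60, the weights are 30, 18, 10 and 6, and the sketch gives (3000 + 2160 + 800 + 360) / 64 = 98.75.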

In Eq. (1), $h_j$ and $w_j$ are the height and width of the part of the reconstructed block that falls inside the $j$-th reference block $P_j$. Because the reconstructed block can straddle up to four neighboring blocks in the reference frame, the new DC coefficient is approximated by a linear sum of the reference block DC coefficients, weighted by the overlap size. The DC image construction for a B frame is similar to that for a P frame. If a block of a B frame has only one motion vector, the DC image reconstruction is the same as for a P frame. Otherwise, the block is compensated twice, by the forward and the backward prediction frames. A detailed discussion can be found in [5].

Fig. 2. Histograms of inter-frame difference: panels (a) and (b)

In the background detection step, the threshold of the moment-preserving method is obtained from the histogram by choosing the threshold $t$ as the $p$-tile [8], where $p$ is given by

$$p = \frac{z - m_1}{(c_1^2 - 4c_0)^{1/2}}, \qquad (2)$$

where

$$c_0 = \frac{m_1 m_3 - m_2^2}{m_2 - m_1^2}, \qquad c_1 = \frac{m_1 m_2 - m_3}{m_2 - m_1^2},$$

$$z = \frac{1}{2}\left[(c_1^2 - 4c_0)^{1/2} - c_1\right],$$

and

$$m_i = \frac{1}{n}\sum_{g=0}^{l-1} g^i h(g), \qquad i = 1, 2, 3.$$

Here $g$ is the grey level, $h(g)$ is the frequency of occurrence of grey level $g$, and $m_i$ is the $i$-th order moment.
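A minimal Python sketch of Eq. (2) and the p-tile selection follows. It assumes that `hist[g]` holds the number of sites whose absolute inter-frame difference equals grey level g, that n is the total number of sites and that l is the number of grey levels. The function name is hypothetical, and the small tolerance in the p-tile comparison is an implementation detail added here to absorb floating-point rounding.

```python
import math


def moment_preserving_threshold(hist):
    """Return (t, p): the p-tile threshold t and the fraction p of Eq. (2).

    hist[g] is assumed to be the number of sites with grey level g.
    """
    n = sum(hist)
    if n == 0:
        raise ValueError("empty histogram")
    # First three moments m_1, m_2, m_3 of the grey-level distribution.
    m1, m2, m3 = [sum((g ** i) * h for g, h in enumerate(hist)) / n
                  for i in (1, 2, 3)]
    cd = m2 - m1 * m1
    if cd <= 0:
        raise ValueError("histogram has a single grey level")
    c0 = (m1 * m3 - m2 * m2) / cd
    c1 = (m1 * m2 - m3) / cd
    disc = c1 * c1 - 4.0 * c0
    if disc <= 0:
        raise ValueError("degenerate moments")
    root = math.sqrt(disc)
    z = 0.5 * (root - c1)
    p = (z - m1) / root
    # p-tile: the lowest grey level whose cumulative fraction reaches p.
    cumulative = 0
    for t, h in enumerate(hist):
        cumulative += h
        if cumulative / n >= p - 1e-9:
            return t, p
    return len(hist) - 1, p


if __name__ == "__main__":
    hist = [0] * 16
    hist[2], hist[10] = 70, 30
    print(moment_preserving_threshold(hist))
```

For a histogram with 70 sites at level 2 and 30 sites at level 10, the moments give c_0 = 20, c_1 = -12, z = 10 and p = 0.7, so the threshold falls at level 2.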

The background is then taken to be the set of sites whose inter-frame difference is below the threshold defined by Eq. (2).

2.5 Post Processing

Although the object mask is generated by the object detection step, the initially extracted mask contains some noise regions. A post-processing step is needed to eliminate them. A minimum size constraint or morphological operations are efficient techniques for this noise removal.
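The minimum size constraint mentioned above can be sketched as a connected-component filter on the block-based mask. The 4-connectivity, the function name `remove_small_regions` and the `min_size` parameter are assumptions made for illustration; the paper does not specify them.

```python
from collections import deque


def remove_small_regions(mask, min_size):
    """Keep only 4-connected regions of 1s that contain at least min_size blocks.

    mask is a list of rows of 0/1 values (the block-based object mask).
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    out = [[0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected region.
            region = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Regions below the size limit are treated as noise and dropped.
            if len(region) >= min_size:
                for y, x in region:
                    out[y][x] = 1
    return out


if __name__ == "__main__":
    noisy = [[1, 1, 0, 0],
             [1, 0, 0, 1],
             [0, 0, 0, 0]]
    print(remove_small_regions(noisy, min_size=2))
```

In the usage example the isolated block in the second row is removed, while the three-block region in the top-left corner is kept.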

2.4 Inter-frame Change Detection

The inter-frame difference is obtained by subtracting two frames at each site of the image grid. Moving objects appear in the inter-frame difference as high values, while the background appears as low values. In theory, object and background induce different signal statistics: the background signal follows a Gaussian distribution, while the moving objects in the inter-frame difference deviate strongly from Gaussianity and can be considered a non-Gaussian signal. Change-based motion detection is a traditional and very efficient technique that exploits this statistical difference between moving object and background. The background signal can be detected by a second-order statistic detector, and the moving object can be detected by the fourth-order moment measure. The fourth-order moment detector for the non-Gaussian signal computes the fourth-order moment at each site $(x_0, y_0)$ within a moving window of size $N$:

$$m_4(x_0, y_0) = \frac{1}{N} \sum_{(x,y) \in W(x_0,y_0)} \left( \mathit{diff}(x,y) - \overline{\mathit{diff}}(x_0,y_0) \right)^4, \qquad (3)$$

where $\mathit{diff}(x,y)$ is the inter-frame difference

$$\mathit{diff}(x,y) = I_t(x,y) - I_{t-1}(x,y), \qquad (4)$$

and $\overline{\mathit{diff}}(x_0,y_0)$ is the sample mean in the moving window $W(x_0,y_0)$ of size $N$:

$$\overline{\mathit{diff}}(x_0,y_0) = \frac{1}{N} \sum_{(x,y) \in W(x_0,y_0)} \mathit{diff}(x,y). \qquad (5)$$

Finally, moving object extraction reduces to comparing, block by block, the fourth-order moment with a given threshold. The threshold is the background signal variance multiplied by a scalar. The moving block is detected by

$$\mathit{block}(x_0,y_0,t) = \begin{cases} \text{object} & m_4(x_0,y_0) > c\sigma_t^2, \\ \text{background} & \text{otherwise}, \end{cases} \qquad (6)$$

where $\sigma_t^2$ is the background signal variance. The constant $c$ is an ad hoc value derived from the statistics of the videos, and it represents the activity of the video concerned. In our experiments, $c = 81$ gave the best performance. If the block moment is above the threshold $c\sigma_t^2$, the block is marked as a moving block.
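Eqs. (3) to (6) can be sketched in Python as follows. The sketch assumes a square window of (2*half + 1)^2 sites clipped at the image border, and a background variance estimated from the sites whose absolute difference does not exceed the threshold t of Section 2.3. The function names and the `half` parameter are hypothetical.

```python
def inter_frame_difference(cur, prev):
    """Eq. (4): site-wise difference of two DC images given as lists of rows."""
    return [[a - b for a, b in zip(row_c, row_p)]
            for row_c, row_p in zip(cur, prev)]


def background_variance(diff, t):
    """Variance of the sites treated as background, i.e. |diff| <= t (assumed rule)."""
    values = [v for row in diff for v in row if abs(v) <= t]
    if not values:
        raise ValueError("no background sites below the threshold")
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def fourth_moment(diff, y0, x0, half=1):
    """Eqs. (3) and (5): fourth central moment in the window around (y0, x0)."""
    rows, cols = len(diff), len(diff[0])
    window = [diff[y][x]
              for y in range(max(0, y0 - half), min(rows, y0 + half + 1))
              for x in range(max(0, x0 - half), min(cols, x0 + half + 1))]
    n = len(window)
    mean = sum(window) / n
    return sum((v - mean) ** 4 for v in window) / n


def detect_moving_blocks(diff, sigma2, c=81.0, half=1):
    """Eq. (6): mark a site as object (1) when its moment exceeds c * sigma2."""
    threshold = c * sigma2
    return [[1 if fourth_moment(diff, y, x, half) > threshold else 0
             for x in range(len(diff[0]))]
            for y in range(len(diff))]


if __name__ == "__main__":
    prev = [[0] * 5 for _ in range(5)]
    cur = [[0] * 5 for _ in range(5)]
    cur[2][2] = 30  # a single changed site in the DC image
    diff = inter_frame_difference(cur, prev)
    for row in detect_moving_blocks(diff, sigma2=1.0):
        print(row)
```

In the usage example every window that contains the changed site exceeds the threshold, so the 3 by 3 neighborhood of the center is marked as object and the border stays background. This also shows why the extracted mask tends to be larger than the real object.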

3. EXPERIMENTAL RESULTS

Several real examples are used to evaluate the proposed approach. Typical video sequences from the MPEG-7 content set have been used. The original video sequences are compressed by an MPEG-1 encoder with the frame pattern IBBPBBPBBPBB. Three surveillance sequences are chosen for the experiments. Figure 3 shows one frame of each tested video. The first two videos are outdoor sequences (ETRI_A and ETRI_B) with different camera distances. The third is the SPEEDWAY sequence, a surveillance video of a speedway. Figure 4 shows segmented object samples from the sequences ETRI_B and SPEEDWAY. The left picture is the original frame and the right picture is the extracted object. The small binary image is the block-based object mask. Figure 5 shows a series of object extraction results. It is clearly seen that the moving object is correctly extracted from the sequence ETRI_A. Because of occlusion around the object boundary, the extracted object mask is bigger than the real object region. Although the object contour cannot be obtained from block-based segmentation, accurate contours can be recovered by a pixel-based process after decoding the extracted object. The proposed approach can successfully extract the moving regions, which can be referred to as moving objects or motion blobs. It is very useful for motion analysis that does not consider the object shape; moreover, this pre-extraction method offers a good initialization for subsequent accurate object segmentation.

4. CONCLUSION

In this paper, we propose a change-based moving object extraction approach performed on the DC sequence of compressed video. The moving object extraction is accomplished by the fourth-order moment detector after automatic background detection. As our method only exploits the MPEG DC image partly decoded from the coded data, the developed algorithm works in real time. The experimental results demonstrate that our technique can extract moving objects efficiently and successfully.


Compared with other object extraction techniques in the compressed domain, the proposed method can be applied to videos with very noisy motion vectors. In such situations, the estimated motion vectors are not reliable and there are many intra-coded blocks in the MPEG video.

5. ACKNOWLEDGEMENT

This work has been supported by the National Hi-Tech Development Programs of China under grant No. 2001AA142140.

6. REFERENCES

[1] Y. Nakajima, A. Yoneyama, H. Yanagihara, and M. Sungano, “Moving Object Detection from MPEG Coded Data,” in SPIE Visual Communications and Image Processing, San Jose, Vol. 3309, pp. 988-996, Jan. 1998.
[2] H. Zen, T. Hasegawa, and S. Ozawa, “Moving Object detection from MPEG coded picture,” in Proc. IEEE International Conference on Image Processing, Vol. IV, pp. 25-29, Oct. 1999.
[3] R. Wang, H. J. Zhang, and Y. Q. Zhang, “Compressed domain moving object extraction,” in Proc. International Symposium on Circuits and Systems, Geneva, Switzerland, May 2000.
[4] A. Benzougar, P. Bouthemy, and R. Fablet, “MRF-based moving object detection from MPEG coded video,” in IEEE Int. Conf. on Image Processing (ICIP 2001), Thessaloniki, Greece, pp. 402-405, Oct. 2001.
[5] B. L. Yeo and B. Liu, “On the extraction of DC sequences from MPEG compressed video,” in Proc. IEEE Int. Conf. on Image Processing, Vol. II, pp. 260-263, 1995.
[6] A. Neri, S. Colonnese, G. Russo, and P. Talone, “Automatic moving object and background separation,” Signal Processing, Vol. 66, pp. 219-232, 1998.
[7] P. K. Sahoo, S. Soltani, and A. K. C. Wong, “A survey of thresholding techniques,” Comput. Vision, Graphics, and Image Process., Vol. 41, pp. 233-260, 1988.
[8] W. Tsai, “Moment-preserving thresholding: A new approach,” Comput. Vision, Graphics, and Image Process., Vol. 29, pp. 377-393, 1985.

Fig. 4. Extracted object samples from the sequences: frame No. 122 of ETRI_B and frame No. 156 of SPEEDWAY, each shown with its extracted object and mask.

Fig. 5. Series results of object extraction for ETRI_A.mpg: frames No. 122, 161, 173 and 227, each shown with its extracted object.

Fig. 3. Original images corresponding to the compressed videos used in the experiments: (a) key frame of ETRI_A.mpg, (b) key frame of ETRI_B.mpg, (c) key frame of Speedway1.mpg.