FAST VIDEO OBJECT SEGMENTATION USING AFFINE MOTION AND GRADIENT-BASED COLOR CLUSTERING

Ju Guo, Jongwon Kim and C.-C. Jay Kuo

Integrated Media Systems Center
Department of Electrical Engineering-Systems
University of Southern California, Los Angeles, CA 90089-2564
Email: juguo, jongwon, [email protected]

Abstract - Video object segmentation is an important component for object-based video coding schemes such as MPEG-4. A fast and robust video segmentation technique, which aims at efficient foreground and background separation via an effective combination of motion and color segmentation modules, is proposed in this work. First, a non-parametric gradient-based iterative color clustering algorithm, called the mean shift algorithm, is employed to provide robust initial dominant color regions according to color similarity. With the dominant color information obtained from previous frames used as an initial seed for the next frame, we can reduce the computational time by 50%. Next, moving regions are identified by a motion detection method based on the frame intensity difference, which helps to circumvent the complexity of motion estimation over the whole frame. Only moving regions are further merged or split according to the region-based affine motion model. Furthermore, the size, color, and motion information of homogeneous regions are tracked to increase the temporal and spatial consistency of extracted objects. The proposed system is evaluated on several typical MPEG-4 test sequences, and it provides very consistent and accurate object boundaries throughout the entire test sequences.

INTRODUCTION

Video segmentation, which aims at the exact separation of moving objects from the background, is the foundation of content-based video coding and indexing, among many other interesting applications. Even though image and video segmentation has been studied for more than 30 years, it is still considered one of the most challenging image processing problems, and it demands creative solutions for a major breakthrough. Three algorithms for automatic video segmentation were proposed in MPEG-4 [1], each utilizing temporal and spatial information in a certain way. Temporal segmentation can identify moving objects, since most moving objects have motion patterns distinct from the background. Spatial segmentation can identify object boundaries accurately if objects have a different visual appearance (such as color or gray-level intensity) from the background. Generally speaking, it is desirable to develop an automatic segmentation algorithm that admits fast implementation without user assistance. These requirements are particularly important for real-time applications.

In the framework of automatic video segmentation, the luminance-based morphological operation and the watershed algorithm can be used to segment objects within images in the spatial domain [1, 2]. Since the human visual system (HVS) is very sensitive to edge and contour information, the exact extraction of object boundaries is crucial for the visual quality of segmented results. More visual information should be used to make spatial segmentation robust and consistent. Among many visual cues, color has not yet been fully exploited in video segmentation, since it is often perceived that human eyes are not very sensitive to the chrominance components, e.g. the UV data in YUV-format video, and consequently the contribution of color is treated as a second-order effect. Furthermore, additional computational complexity is required for color processing. We feel that color does play an important role in object identification and recognition in the human visual system, and it is worthwhile to include this information in the computation. Zhong and Chang [3] applied color segmentation to separate images into homogeneous regions, and tracked them over time for content-based video query. A simple uniform quantization in the L*u*v* color space was used in [3]. In this work, we focus on improving the automatic video segmentation result by using a fast yet robust color segmentation algorithm based on the mean shift algorithm.
The mean shift algorithm was generalized by Cheng [4] for clustering data, and used by Comaniciu and Meer for color segmentation [5]. For the k-means clustering method, it is difficult to choose the initial number of classes. By using the mean shift algorithm, the number of dominant colors can be determined automatically. Here, we develop a non-parametric gradient-based algorithm that provides a simple iterative method to determine the local density maximum. The number of color classes in the current frame can be used as the initial number of color classes for the next frame, which helps reduce the computational complexity of color segmentation. After separating an image frame into homogeneous regions, we determine whether each region belongs to the background or the foreground by motion detection based on higher-order statistics. Only moving regions are further merged or split using the region-based affine motion model. The six parameters of the affine motion model are estimated for each region. Regions with similar motion parameters are merged together, while regions that do not fit the affine motion model well are split. The size, color, and motion information of each region is tracked to increase the consistency of extracted objects. The system is applied to segment several MPEG-4 test videos. We have observed accurate object boundaries as well as temporal and spatial consistency in the experimental results.

The paper is organized as follows. A general description of the overall system is given in Section 2. Video segmentation results for MPEG-4 test videos are presented in Section 3. Concluding remarks are given in Section 4.

VIDEO SEGMENTATION SYSTEM

System Overview

The proposed automatic video segmentation system is shown in Fig. 1. At the first stage, the global motion of the image sequence is estimated using the six-parameter affine model. With this information, images can be aligned accordingly. The mean shift color segmentation algorithm is then used to divide each image into homogeneous regions. For the initial segmentation, a statistical motion detection method is used to identify whether each homogeneous region is moving or not. Only for moving regions do we apply the affine motion model for motion parameter estimation. At the last stage, morphological open and close filters are used to smooth object boundaries and eliminate small regions.

Figure 1: Block diagram of the automatic video segmentation system.
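As a sketch of the six-parameter affine estimation step, the model can be fit by linear least squares from matched point coordinates between two frames. The function and parameter names below are our own illustration, not from the paper:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of the six-parameter affine motion model
        x' = a1 + a2*x + a3*y
        y' = a4 + a5*x + a6*y
    from matched points src -> dst, each an (N, 2) array of (x, y)."""
    n = src.shape[0]
    A = np.zeros((2 * n, 6))
    A[0::2, 0] = 1.0          # rows for the x' equations
    A[0::2, 1] = src[:, 0]
    A[0::2, 2] = src[:, 1]
    A[1::2, 3] = 1.0          # rows for the y' equations
    A[1::2, 4] = src[:, 0]
    A[1::2, 5] = src[:, 1]
    b = dst.reshape(-1)       # interleaved [x'_0, y'_0, x'_1, y'_1, ...]
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params             # (a1, a2, a3, a4, a5, a6)
```

The same fit can be evaluated per region: a large residual over a region suggests its motion is not well described by a single affine model.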

Color Segmentation

The intensity distribution of each color component can be viewed as a probability density function. The mean shift vector is the difference between the mean of the probability density over a local region and the center of that region. Mathematically, the mean shift vector associated with a region $S_{\vec{x}}$ centered at $\vec{x}$ can be written as
$$\vec{V}(\vec{x}) = \frac{\int_{\vec{y} \in S_{\vec{x}}} \vec{y}\, p(\vec{y})\, d\vec{y}}{\int_{\vec{y} \in S_{\vec{x}}} p(\vec{y})\, d\vec{y}} - \vec{x},$$
where $p(\cdot)$ is the probability density function. The mean shift property states that the mean shift vector is proportional to the gradient of the probability density $\nabla p(\vec{x})$ and reciprocal to the probability density $p(\vec{x})$, i.e.
$$\vec{V}(\vec{x}) = c\, \frac{\nabla p(\vec{x})}{p(\vec{x})},$$
where $c$ is a constant. Since the mean shift vector points along the direction of the probability density maximum, we can exploit this property to find the actual location of the density maximum. In implementing the mean shift algorithm, the size of the search window can be made adaptive to an image by setting the radius proportional to the trace of the global covariance matrix of the given image. By moving the search window in the color space iteratively along the mean shift vector, one dominant color can be located. After removing all colors inside the converged search window, one can run the mean shift algorithm again to locate the second dominant color. This process can be repeated several times to identify the few major dominant colors. The uniform color space L*u*v* was used by Comaniciu et al. [5] for color segmentation due to its perceptual homogeneity. To reduce the computational complexity, we use the YUV space for color segmentation since the original video data are in the YUV format. The obtained results are comparable with those based on the L*u*v* space. The dominant colors of the current frame are used as the initial guess of the dominant colors in the next frame. Due to the similarity of adjacent frames, the mean shift algorithm often converges in one or two iterations, reducing the computational time significantly.
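A minimal sketch of this iteration, assuming a flat (uniform) kernel over a spherical search window in color space; the function names, seeding strategy, and stopping tolerances are illustrative assumptions rather than the paper's exact implementation:

```python
import numpy as np

def mean_shift_mode(pixels, seed, radius, max_iter=100, tol=1e-3):
    """Iterate the mean shift vector from `seed` to a local density
    maximum (a dominant color). `pixels` is an (N, 3) array of color
    samples; the search window is a sphere of the given radius."""
    center = np.asarray(seed, dtype=float)
    for _ in range(max_iter):
        dist = np.linalg.norm(pixels - center, axis=1)
        window = pixels[dist <= radius]
        if window.size == 0:
            break
        new_center = window.mean(axis=0)             # sample mean in window
        shift = np.linalg.norm(new_center - center)  # mean shift magnitude
        center = new_center
        if shift < tol:                              # converged at a mode
            break
    return center

def dominant_colors(pixels, radius, max_colors=8):
    """Repeatedly locate a mode, then remove the colors inside the
    converged search window, as described in the text."""
    remaining = np.asarray(pixels, dtype=float)
    modes = []
    while remaining.shape[0] > 0 and len(modes) < max_colors:
        mode = mean_shift_mode(remaining, remaining[0], radius)
        modes.append(mode)
        keep = np.linalg.norm(remaining - mode, axis=1) > radius
        remaining = remaining[keep]
    return np.array(modes)
```

Seeding `mean_shift_mode` with the previous frame's dominant colors, as the text suggests, typically leaves the loop only one or two iterations from convergence.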

Motion Detection and Estimation

A robust motion detection method based on the frame difference is used to identify whether homogeneous regions are moving or not. For each homogeneous region, if 85% of its pixels are identified as moving pixels, the region is classified as moving. Only for moving regions is the motion vector field estimated, by hierarchical block matching inside the regions. The obtained parameters are tested over the whole region. When the error is above a certain threshold, the region is split according to the motion information. Regions with similar motion and color are merged together. Moving objects are projected onto the next frame according to their affine motion models.
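The per-region moving test can be sketched as follows; the pixel-level difference threshold and the function names are illustrative assumptions (the paper specifies only the 85% region-level ratio):

```python
import numpy as np

def region_is_moving(prev_frame, curr_frame, region_mask,
                     diff_thresh=10, moving_ratio=0.85):
    """Classify a homogeneous region as moving when at least
    `moving_ratio` (85% in the text) of its pixels show an absolute
    frame difference above `diff_thresh`.

    prev_frame, curr_frame: 2-D intensity arrays of the same shape.
    region_mask: boolean array selecting the region's pixels."""
    diff = np.abs(curr_frame.astype(int) - prev_frame.astype(int))
    region_diffs = diff[region_mask]
    return bool(np.mean(region_diffs > diff_thresh) >= moving_ratio)
```

Only regions that pass this test proceed to the more expensive affine motion parameter estimation.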

EXPERIMENTAL RESULTS

Two MPEG-4 QCIF sequences, "Akiyo" and "Mother and daughter", are used to test the proposed algorithm. For the "Akiyo" sequence, there is only a small amount of motion activity in the head and shoulder regions. The original 20th frame is shown in Fig. 2(a), and the result of color segmentation is given in Fig. 2(b). We can clearly see that the image is segmented into a few regions; for example, Akiyo is segmented into the hair region, the facial region, and the shoulder region. Each region has a well-aligned boundary corresponding to the real object. The motion detection algorithm identifies the moving region, as shown in Fig. 2(c). With the motion information alone, the boundary is not well detected compared with the real object boundary. By incorporating the spatial color segmentation results, the final segmentation is much improved, as shown in Fig. 2(d). For the "Mother and daughter" sequence, there is more head and hand motion activity than in "Akiyo". The result of color segmentation for the 250th frame is shown in Fig. 3(b). Moving objects, identified by motion detection and delineated by color regions, are accurately segmented from the background, as given in Fig. 3(d). We can see from the above results that temporal segmentation identifies moving regions while spatial segmentation provides important information about object boundaries. Our system exploits the spatial information of color similarity and obtains accurate region boundaries automatically. Since the human visual system is very sensitive to edge information, our segmentation results provide good visual quality.

CONCLUSION

A new video segmentation scheme was proposed in this work. A non-parametric gradient-based color segmentation scheme was adopted to accurately locate region boundaries. Since the color segmentation algorithm uses an iterative approach to find dominant colors, results from the current frame can be used in the next frame to speed up the convergence process. This makes the color segmentation step very fast for video segmentation. Color segmentation combined with region-based motion detection gives an efficient and accurate scheme for video segmentation. Good segmentation results were demonstrated on two MPEG-4 test sequences.

References

[1] J. Ohm, Ed., "Core Experiments on Multifunctional and Advanced Layered Coding Aspects of MPEG-4 Video," Doc. ISO/IEC JTC1/SC29/WG11 N2176, May 1998.

[2] C. Gu and M.-C. Lee, "Semantic Video Object Segmentation and Tracking Using Mathematical Morphology and Perspective Motion Model," in Proc. IEEE International Conference on Image Processing, Santa Barbara, CA, Oct. 1997.

[3] D. Zhong and S.-F. Chang, "Video Object Model and Segmentation for Content-Based Video Indexing," in Proc. IEEE International Symposium on Circuits and Systems, Hong Kong, June 1997.

[4] Y. Cheng, "Mean shift, mode seeking, and clustering," IEEE Trans. Pattern Anal. Machine Intell., vol. 17, pp. 790-799, 1995.

[5] D. Comaniciu and P. Meer, "Robust Analysis of Feature Space: Color Image Segmentation," in Proc. IEEE Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, June 1997.

Figure 2: Segmentation results for the 20th frame of the "Akiyo" QCIF sequence: (a) the original image, (b) the color segmentation result, (c) the motion detection result, and (d) the final result.

Figure 3: Segmentation results for the 250th frame of the "Mother and daughter" QCIF sequence: (a) the original image, (b) the color segmentation result, (c) the motion detection result, and (d) the final result.