SEGMENTATION AND APPEARANCE MODEL BUILDING FROM AN IMAGE SEQUENCE

Liang Zhao and Larry S. Davis
UMIACS
University of Maryland
College Park, MD 20742

ABSTRACT

In this paper we explore the problem of accurately segmenting a person from a video given only the approximate location of that person. Unlike previous work, which assumes that the appearance model is known in advance, we developed an iterative expectation-sampling (ES) algorithm for solving segmentation and appearance modeling simultaneously. The appearance model is encoded with a kernel-based PDF defined in a joint color/path-length space. This appearance model remains unchanged during a short time period, although the object can articulate. Thus, we can perform the ES iteration not only for a single frame but also for an image sequence. The algorithm is iterative, but simple and efficient, and gives visually good results.

1. INTRODUCTION

In this paper we explore the problem of accurately segmenting a person from a video given only the approximate location of that person (see Fig. 1 for an example). The input of our algorithm is the bounding boxes of the detected person (in our case obtained using our pedestrian detection algorithm and some simple tracking [4]), and the output is the segmentation of the person from the background at the pixel level.

We present a novel algorithm for solving the above segmentation problem. Unlike the traditional EM algorithm [3], which assumes Gaussian mixture models, we employ the non-parametric kernel-based PDF estimator [7]. A set of samples is generated from the segmented region for kernel-based PDF estimation. Thus the maximization step in the EM algorithm is replaced with a sampling step. This results in an iterative expectation-sampling (ES) algorithm.

It is desirable to improve the segmentation over an image sequence if the object articulates. Previous work ([1], [2], [8]) has difficulty extending its algorithms to an image sequence of articulated objects because the PDFs are defined in feature spaces that are not invariant to articulated motion. To address this problem, we built a constant appearance model of an articulated object [10] so that both the model and the segmentation can be improved over time.
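
The alternation between the expectation step and the sampling step can be illustrated on a toy one-dimensional problem. The sketch below is not the authors' implementation: the scalar per-pixel feature, the Gaussian kernel, the bandwidth `h`, the sample size, the equal class priors, the fixed iteration count, and the names `kde` and `es_segment` are all assumptions made for illustration. Pixels outside the bounding box are held at zero foreground probability, and pixels inside it start undecided at 0.5.

```python
import numpy as np


def kde(x, samples, h):
    """Gaussian kernel density estimate at points x from 1-D samples."""
    z = (x[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))


def es_segment(features, in_box, n_samples=200, h=0.05, n_iter=10, seed=0):
    """Toy expectation-sampling loop (illustrative assumption only).

    features : (P,) scalar feature per pixel
    in_box   : (P,) bool, True inside the detector's bounding box
    Returns the per-pixel foreground probability after n_iter iterations.
    """
    rng = np.random.default_rng(seed)
    # Initial soft assignment: undecided inside the box, background outside.
    p_fg = np.where(in_box, 0.5, 0.0)
    for _ in range(n_iter):
        # Sampling step: draw pixels in proportion to their current soft
        # assignment instead of fitting a parametric mixture model.
        fg = rng.choice(features, size=n_samples, p=p_fg / p_fg.sum())
        bg_w = 1.0 - p_fg
        bg = rng.choice(features, size=n_samples, p=bg_w / bg_w.sum())
        # Expectation step: posterior foreground probability, equal priors.
        lf = kde(features, fg, h)
        lb = kde(features, bg, h)
        post = lf / (lf + lb + 1e-300)
        # Pixels outside the bounding box stay background.
        p_fg = np.where(in_box, post, 0.0)
    return p_fg
```

In this toy setting the background pixels outside the box anchor the background density, which is what lets the undecided pixels inside the box separate over the iterations.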

Fig. 1. An example of figure-ground segmentation: the first row shows the bounding boxes of the detected person, the second row shows the results from our ES algorithm, and the third row shows the results from background subtraction.

The remainder of the paper is organized as follows: in Section 2, we present the constant appearance model and the kernel-based PDF estimation for the model; then, in Section 3, we present the ES algorithm in detail. Section 4 presents the experimental results on real image sequences. Finally, in Section 5, we present our concluding remarks.

2. CONSTANT APPEARANCE MODEL

For modeling the appearance of an articulated object, it is desirable that the intrinsic object geometry remains unchanged over short time periods. One such geometric feature is the path-length, which is defined as the length of the shortest path from a reference point to the pixel inside the object region. The reference point we choose is the top point of the head, because it is shared by both the foreground and the background, so that the path-length in both regions can be defined using the same reference point. Fig. 2 illustrates the idea of path-length, which does not change with the motion of limbs. The path-length is calculated using the distance transform [6] within each segmented region. To make the path-length invariant to changes in object size, we normalize it by dividing it by the maximum path-length within the object region.

Fig. 2. Examples of path-length, defined as the length of the shortest path from the top of the head to the pixel inside the figure.

Fig. 3. Initial segmentation: (a) input image (b) edge image (c) distance transform (d) the projected model aligned with the edges.

In addition to path-length, we represent the color of each pixel as a three-dimensional vector $(r, g, s)$, where $r$ and $g$ are two chromaticity variables and $s$ is a brightness variable. In summary, in a joint color/path-length space, each pixel $x$ is encoded with a four-dimensional vector $(r, g, s, l)$, where $l = \mathrm{length}(x) / \max_{y} \mathrm{length}(y)$ is the normalized path-length.

We approximate the PDF using the kernel density estimator as follows. Given a sample of pixels $\{x_i\}_{i=1}^{N}$ from the foreground region, the probability of a pixel $x$ belonging to the foreground $C$ is calculated using the following kernel density function:

$$p(x \mid C) = \frac{1}{N} \sum_{i=1}^{N} K(x - x_i)$$

The following subsections address the following questions. First, how to start the algorithm automatically. Second, how to estimate the PDF of a region if the assignments of pixels to this region are continuous likelihoods rather than binary values. Third, how to adjust the assignments using natural color segmentation. Fourth, how to determine whether the iterative ES algorithm has converged.
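
As a concrete illustration of the appearance model of Section 2, the sketch below computes a normalized path-length map, stacks it with color into four-dimensional features, and evaluates a kernel density estimate. It is a minimal sketch under stated assumptions, not the authors' code: breadth-first search over 4-connected neighbors stands in for the distance-transform computation, the chromaticity and brightness formulas (each channel divided by R+G+B, and the channel mean) are a commonly used choice rather than one confirmed by the text, the product Gaussian kernel and its bandwidths are assumed, and the function names are hypothetical.

```python
from collections import deque

import numpy as np


def normalized_path_length(mask, start):
    """Shortest 4-connected path length from `start` to each pixel of `mask`,
    divided by the maximum path length in the region (NaN outside the region).
    `start` is a (row, col) tuple assumed to lie inside the mask."""
    dist = np.full(mask.shape, np.nan)
    dist[start] = 0.0
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
            if inside and mask[nr, nc] and np.isnan(dist[nr, nc]):
                dist[nr, nc] = dist[r, c] + 1.0
                queue.append((nr, nc))
    longest = np.nanmax(dist)
    return dist / longest if longest > 0 else dist


def joint_features(rgb, path_len):
    """Stack (r, g, s, l) per pixel: two chromaticities, brightness, path-length.
    The color formulas are an assumed, commonly used choice."""
    rgb = np.asarray(rgb, dtype=float)
    total = rgb.sum(axis=-1)
    safe = np.where(total > 0, total, 1.0)  # avoid dividing by zero on black pixels
    r = rgb[..., 0] / safe
    g = rgb[..., 1] / safe
    s = total / 3.0
    return np.stack([r, g, s, path_len], axis=-1)


def kde_likelihood(x, samples, bandwidths):
    """p(x | C) = (1/N) * sum_i K(x - x_i) with a product Gaussian kernel.
    x: (M, D) query features, samples: (N, D), bandwidths: (D,)."""
    x = np.atleast_2d(x)
    bw = np.asarray(bandwidths, dtype=float)
    z = (x[:, None, :] - samples[None, :, :]) / bw
    norm = (2.0 * np.pi) ** (x.shape[1] / 2.0) * bw.prod()
    return np.exp(-0.5 * (z**2).sum(axis=2)).mean(axis=1) / norm
```

On a U-shaped mask the path-length follows the shape of the region rather than the straight-line distance, which is the property that keeps the feature stable when limbs move. Selecting foreground features with `joint_features(rgb, pl)[mask]` gives the sample array expected by `kde_likelihood`.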
