Modeling Complex Scenes for Accurate Moving Objects Segmentation

Jianwei Ding, Min Li, Kaiqi Huang, and Tieniu Tan

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, P.R. China
{jwding, mli, kqhuang, tnt}@nlpr.ia.ac.cn

Abstract. In video surveillance, it is still a difficult task to segment moving objects accurately in complex scenes, since most widely used algorithms are based on background subtraction. We propose an online and unsupervised technique that finds the optimal segmentation in a Markov Random Field (MRF) framework. To improve accuracy, color, locality, temporal coherence and spatial consistency are fused together in the framework. The models of color, locality and temporal coherence are learned online from the complex scene. A novel mixture of a nonparametric regional model and a parametric pixel-wise model is proposed to approximate the background color distribution. The foreground color distribution for every pixel is learned from neighboring pixels of the previous frame. The locality distributions of background and foreground are approximated with a nonparametric model. The temporal coherence is modeled with a Markov chain. Experiments on challenging videos demonstrate the effectiveness of our algorithm.

1 Introduction

Moving object segmentation is fundamental and important for intelligent video surveillance. The accuracy of segmentation is vital to object tracking, recognition, activity analysis, etc. Traditional video surveillance systems use stationary cameras to monitor scenes of interest, and the usual approach for this kind of system is background subtraction. In many video surveillance applications, however, the background of the scene is not stationary. It may contain waving trees, fountains, ocean ripples, illumination changes, etc. It is a difficult task to design a robust background subtraction algorithm that can extract moving objects accurately from such complex scenes.

Many background subtraction methods have been proposed to address this problem. Stauffer and Grimson [1] used a Gaussian mixture model (GMM) to account for the multimodality of the color distribution of every pixel. Because of its high efficiency and good accuracy in most conditions, GMM has been used widely, and many researchers have proposed improvements and extensions of the basic GMM method, such as [2–4]. An implicit assumption of GMM-based methods, however, is that the temporal variation of a pixel can be modeled by one or multiple Gaussian distributions. Elgammal et al. proposed kernel density estimation (KDE) [5] to model the color distribution, which does not assume any underlying distribution, but the kernel bandwidth is difficult to choose and the computational cost is high. Sheikh and Shah [6] incorporated spatial cues with temporal information into a single model and obtained accurate segmentation results in a maximum a posteriori-Markov Random Field (MAP-MRF) decision framework. Other sophisticated techniques can be seen in [7–11].

Fig. 1: Moving object segmentation in complex scenes using background subtraction. The first column shows the original images, the second column gives the corresponding segmentation using background subtraction.

However, background subtraction has a drawback: thresholding is often used to separate foreground from background. Fig. 1 gives an example of moving object segmentation in complex scenes using background subtraction. Finding an optimal segmentation can instead be cast as an energy minimization problem in a MAP-MRF framework. Graph cuts [12] is one of the most popular energy minimization algorithms. Boykov et al. [13] proposed an interactive image and video segmentation technique using graph cuts and obtained good results. This technique was then extended to solve other foreground/background segmentation problems. In [14], Layered Graph Cuts (LGC) were proposed to extract foreground from background layers in stereo video sequences. In [15], Liu and Gleicher proposed a method that fuses color, locality and motion information; it can segment moving objects with limited motion, but works off-line. Sun et al. [16] proposed background cut, which combines background subtraction, color and contrast cues and can work in real time, but a static background is assumed in the initialization phase. In [17], multiple cues were probabilistically fused for bilayer segmentation of live video. In [18], the complex background was learned by a multi-scale discriminative model. However, training samples are needed in [17, 18].

We propose an online and unsupervised technique which learns complex scenes to find the optimal foreground/background segmentation in an MRF framework. Our method absorbs the advantages of foreground/background modeling and the graph cuts algorithm, but works online and does not need prior information or training data. To improve the segmentation accuracy, color, locality, temporal coherence and spatial consistency are fused together in the framework. All these features are easy to obtain and calculate. We do not choose optical flow as in [15, 17] because reliable computation of optical flow is expensive and it suffers from the aperture problem. The models of color, locality and temporal coherence are learned online from the complex scene. The color feature is the dominant cue for discriminating foreground from background in our algorithm. To model the color distribution of the background image precisely and adaptively, a novel mixture of a nonparametric regional model and a parametric pixel-wise model is presented to improve the robustness of the algorithm under different background changes. The foreground color model for every pixel in the current frame is learned from the neighboring foreground pixels in the previous frame. The locality distributions of background and foreground are also considered to reduce errors; each distribution is approximated with a nonparametric model, and an additional spatially uniform distribution is included in the model. The label of each pixel at the current moment is not independent of that at the previous time; this temporal coherence of labels is modeled with a first-order Markov chain to increase accuracy, and a simple strategy to calculate the transition probabilities is provided.

The remainder of this paper is organized as follows. Section 2 describes the general model for moving object segmentation. Section 3 gives the modeling of complex scenes. Section 4 presents the experiments and results. Section 5 concludes our work.

2 General model for moving object segmentation

In this section, we will describe the general model for moving object segmentation in a MAP-MRF framework.

Fig. 2: A simple flowchart of the proposed algorithm.

Moving object segmentation can be considered as a binary labeling problem. Let I be a new input image and P be the set of all pixels in I. The labeling problem is to assign a unique label l_p ∈ {foreground(= 1), background(= 0)} to each pixel p ∈ P. The maximum a posteriori estimation of L = {l_1, l_2, ..., l_{|P|}} for I can be obtained by minimizing the Gibbs energy E(L) [13]:

L^* = \arg\min_L E(L)    (1)

The basic energy model E(L) used in [13] only considered color and spatial continuity cues, which are not sufficient [14]. The locality cue of a pixel is an important spatial constraint that can help to judge the label value. Besides, the label of a pixel at the current time is not independent of that at the previous time, so the cue of temporal coherence is also needed to increase the accuracy of the label decision. We extend the basic energy model and define E(L) as:

E(L) = \lambda_c \sum_{p \in P} E_c(l_p) + \lambda_s \sum_{p \in P} E_s(l_p) + \lambda_t \sum_{p \in P} E_t(l_p) + \sum_{p \in P} \sum_{q \in N_p} E_N(l_p, l_q)    (2)

where λ_c, λ_s and λ_t are factors that measure the weights of the color term, locality term and temporal coherence term respectively. N_p is the 4-connected or 8-connected neighborhood of pixel p. E_c(l_p), E_s(l_p), E_t(l_p) and E_N(l_p, l_q) are described below.

Color term. E_c(l_p) is the color term of pixel p, denoting the cost of color vector c_p when the label is l_p. It is defined as the negative logarithm of the color likelihood:

E_c(l_p) = -\log Pr(c_p \mid l_p)    (3)

Locality term. E_s(l_p) is the locality term of pixel p, denoting the cost of locality vector s_p when the label is l_p. It is defined as the negative logarithm of the locality likelihood:

E_s(l_p) = -\log Pr(s_p \mid l_p)    (4)

Temporal coherence term. E_t(l_p) is the temporal coherence term of pixel p, denoting the cost of label l_p at the current time T when the label was l_p^{T-1} at the previous time T-1. We use a first-order Markov chain to model the temporal coherence and define E_t(l_p) as:

E_t(l_p) = -\log Pr(l_p^T \mid l_p^{T-1})    (5)

Spatial consistence term. There is spatial consistence between neighboring pixels: they are more likely to have similar colors and the same labels. E_N(l_p, l_q) conveys the spatial consistence between neighboring pixels p and q when the labels are l_p and l_q respectively. Similar to [13, 17], E_N(l_p, l_q) is expressed as an Ising term and defined as:

E_N(l_p, l_q) = \delta_{l_p \neq l_q} \cdot \frac{\epsilon + \exp(-\beta d_{pq}^2)}{1 + \epsilon}    (6)

where \delta_{l_p \neq l_q} is a Kronecker delta function, \epsilon is a positive constant, and \beta is a parameter set to \beta = (2 \langle \| c_p - c_q \|^2 \rangle)^{-1}, where \langle \cdot \rangle is the expectation over color samples of neighboring pixels in an image and d_{pq} denotes the color difference between pixels p and q. To reduce the computational cost, \epsilon can be set to zero and \beta taken as a constant, as in [13]. A simpler version of the Ising term, which ignores the color difference, can be seen in [6].

Among the four energy terms, only the spatial consistence term can be calculated directly from the new input image. To obtain the other three terms, the color likelihood, the locality likelihood and the probability transition matrix must be known first. They can be calculated by constructing a color model, a locality model and a temporal coherence model. The three models are learned from the complex scene and updated online.
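As the only term computable directly from the input image, the spatial consistence weight of Eq. (6) can already be sketched here. The code below is our own illustration, not the paper's: it estimates β from the mean squared color difference of 4-connected neighbors and evaluates the contrast weight for one pixel pair; the function names and the default value of ε are assumptions.

```python
import numpy as np

def estimate_beta(img):
    """beta = (2 * <||c_p - c_q||^2>)^{-1}, with the expectation taken over
    horizontally and vertically adjacent pixel pairs of the image."""
    img = img.astype(np.float64)
    dx = np.sum((img[:, 1:] - img[:, :-1]) ** 2, axis=2)  # squared diffs, horizontal neighbors
    dy = np.sum((img[1:, :] - img[:-1, :]) ** 2, axis=2)  # squared diffs, vertical neighbors
    mean_sq = (dx.sum() + dy.sum()) / (dx.size + dy.size)
    return 1.0 / (2.0 * mean_sq + 1e-12)

def contrast_weight(c_p, c_q, beta, eps=0.0):
    """Weight of Eq. (6) applied when l_p != l_q; the cost is zero when the labels agree."""
    d2 = float(np.sum((np.asarray(c_p, float) - np.asarray(c_q, float)) ** 2))
    return (eps + np.exp(-beta * d2)) / (1.0 + eps)
```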


At the beginning of the algorithm, an initial background model is constructed using the color information of the input images. The background model is updated quickly with high learning parameters until it becomes stable. After initialization, the algorithm follows the flowchart shown in Fig. 2. At each time T, the first step is to model the scene and calculate the energy E(L). The second step is to minimize the energy E(L) and output the segmentation; to minimize E(L), we use the implementation of [19]. The third step is to update the scene model with slow learning parameters.
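To make the per-frame procedure concrete, here is a minimal sketch of one iteration, assuming the weighted unary costs of Eq. (2) have already been computed as per-pixel maps for both labels. It uses the PyMaxflow library as a stand-in for the min-cut/max-flow implementation of [19]; the function name, the argument layout and the source/sink label convention are our assumptions, not part of the paper.

```python
import numpy as np
import maxflow  # PyMaxflow; stands in for the min-cut/max-flow code of [19]

def segment_frame(cost_bg, cost_fg, pairwise_weights):
    """cost_bg, cost_fg: HxW maps of lambda_c*Ec + lambda_s*Es + lambda_t*Et for l_p=0 and l_p=1.
    pairwise_weights: scalar or HxW map of contrast weights for 4-connected n-links (Eq. 6)."""
    g = maxflow.Graph[float]()
    node_ids = g.add_grid_nodes(cost_bg.shape)
    # n-links encode the spatial consistence term between neighboring pixels
    g.add_grid_edges(node_ids, weights=pairwise_weights, symmetric=True)
    # t-links encode the unary costs; which terminal corresponds to foreground depends on
    # the solver's convention and should be checked against the library documentation
    g.add_grid_tedges(node_ids, cost_fg, cost_bg)
    g.maxflow()
    return g.get_grid_segments(node_ids).astype(np.uint8)  # binary label map
```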

3 Modeling the complex scenes

The first and most important step is modeling the scene. Since a complex scene may contain dynamic textures or various illumination changes, color, locality, temporal coherence and spatial consistence are fused together to increase the accuracy of segmentation. The spatial consistence model defined by Eq. (6) can be calculated directly on the input image, so we only have to construct the color model, the locality model and the temporal coherence model. They are described below. For simplicity, the update of the scene model is also included in this section.

3.1 Color model

The color term is the dominant one among the four terms in Eq. (2). Different color features can be used in our algorithm, such as RGB, HSV or Lab. To discriminate foreground from background effectively, we choose RGB color features without color transformation. Let c_p = (R_p, G_p, B_p)^T be the color vector for each pixel p ∈ P. The modeling of the color distribution includes three parts: modeling the background color distribution, modeling the foreground color distribution, and updating the color model.

Modeling the background color distribution. In [6], background modeling methods are classified into two categories: methods that use a pixel-wise model and methods that use a regional model. The pixel-wise model is more precise than the regional model for measuring the probability of pixel p belonging to the background, but it is sensitive to spatial noise, illumination changes and small movements of the background. To model the color likelihood of each pixel p belonging to the background, we define it as a mixture of a regional model and a pixel-wise model:

Pr(c_p \mid l_p = 0) = \alpha_c \, Pr_{region}(c_p \mid l_p = 0) + (1 - \alpha_c) \, Pr_{pixel}(c_p \mid l_p = 0)    (7)

where α_c ∈ [0, 1] is a mixing factor, Pr_{region}(c_p | l_p = 0) is the probability of observing c_p in the neighboring region of pixel p, and Pr_{pixel}(c_p | l_p = 0) is the probability of observing c_p at pixel p. The local spatial and temporal color information is used coherently in Eq. (7). Fig. 3 gives a comparison of the three models. Our model can approximate the background color distribution precisely and is also robust to various spatial noise.
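The blending in Eq. (7) itself is a one-liner; a minimal sketch, with hypothetical callables standing in for the two component models described below, is:

```python
def background_color_likelihood(c_p, p_region, p_pixel, alpha_c=0.5):
    """Eq. (7): mix the regional and the pixel-wise background color likelihoods.
    p_region, p_pixel are callables returning Pr_region(c_p | l_p=0) and Pr_pixel(c_p | l_p=0);
    the value of alpha_c here is illustrative only."""
    return alpha_c * p_region(c_p) + (1.0 - alpha_c) * p_pixel(c_p)
```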


Fig. 3: Comparison of the three models. The three columns are the color distributions of the pixel marked in red in the original image using the pixel-wise model, the regional model and the proposed mixture model, respectively.

The regional background color model can be learned from the local spatio-temporal volume V_p centered at pixel p with a nonparametric method:

Pr_{region}(c_p \mid l_p = 0) = \frac{1}{|V_p|} \sum_{c_m \in V_p} K_H(c_p - c_m)    (8)

where K_H(x) = |H|^{-1/2} K(H^{-1/2} x), K(x) is a d-variate kernel function, and H is the bandwidth matrix. We select the commonly used Gaussian kernel, so

K_H(x) = |H|^{-1/2} (2\pi)^{-d/2} \exp\left(-\tfrac{1}{2} x^T H^{-1} x\right)    (9)

To reduce the computational cost, the bandwidth matrix H can be assumed to take the form H = h^2 I. The volume V_p is defined as V_p = {c_t(u, v) : |u − i| ≤ l, |v − j| ≤ l, T − T_d < t < T}, where c_t(u, v) is the color vector of the pixel at the u-th row and v-th column of the image taken at time t, and T_d is the period. Pixel p is located at the i-th row and j-th column of the current input image at time T, and l is a constant that measures the size of the local spatial region.

Calculating Eq. (8) directly is very expensive because of the large sample size. Besides, it is hard to choose a proper kernel bandwidth H in high-dimensional spaces even after it has been simplified. We therefore use the binned kernel density estimator to approximate Eq. (8). The accuracy and effectiveness of the binned kernel density estimator have been proved in [20], and a similar strategy has also been used in [6, 7].

We use a Gaussian mixture model as the pixel-wise model for every pixel p:

Pr_{pixel}(c_p \mid l_p = 0) = \sum_{k=1}^{K} \omega_k \, N(c_p \mid \mu_k, \Sigma_k)    (10)

where ω_k, µ_k and Σ_k are the weight, the mean color vector and the covariance matrix of the k-th component respectively, and K is the number of components. To simplify computation, the covariance matrix is assumed to take the form Σ_k = σ_k^2 I.
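For illustration, the following sketch evaluates both component likelihoods for a single pixel. It is our own simplification: the regional term evaluates the Gaussian kernel sum of Eq. (8) directly rather than through the binned estimator of [20], the normalization constant is the full trivariate Gaussian one, and the bandwidth and sample handling are illustrative.

```python
import numpy as np

def regional_likelihood(c_p, samples, h=15.0):
    """Approximates Eq. (8): Gaussian kernel density over colors in the volume V_p.
    samples: (n, 3) array of background colors from the spatio-temporal neighborhood;
    H = h^2 * I is assumed, as in the text."""
    d2 = np.sum((np.asarray(samples, float) - np.asarray(c_p, float)) ** 2, axis=1)
    norm = (2.0 * np.pi * h * h) ** (-1.5)  # |H|^(-1/2) (2*pi)^(-3/2) for 3-channel colors
    return float(np.mean(norm * np.exp(-0.5 * d2 / (h * h))))

def pixelwise_likelihood(c_p, weights, means, sigmas):
    """Eq. (10): K-component GMM with isotropic covariances Sigma_k = sigma_k^2 * I."""
    sigmas = np.asarray(sigmas, float)
    d2 = np.sum((np.asarray(means, float) - np.asarray(c_p, float)) ** 2, axis=1)
    norm = (2.0 * np.pi * sigmas ** 2) ** (-1.5)
    return float(np.sum(np.asarray(weights, float) * norm * np.exp(-0.5 * d2 / sigmas ** 2)))
```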

Modeling the foreground color distribution. The color likelihood Pr(c_p | l_p = 1) of each pixel p belonging to the foreground at time T can be learned adaptively from the neighboring foreground pixels in the previous frame at time T-1. Here there is an implicit assumption that a foreground pixel will remain in the neighboring area. Besides, at any time the probability of observing a foreground pixel of any color at pixel p is uniform. Thus, the foreground color likelihood can be expressed as a mixture of a uniform function and the kernel density function:

Pr(c_p \mid l_p = 1) = \beta_c \, \eta_c + (1 - \beta_c) \frac{1}{|U_p|} \sum_{c_m \in U_p} K_H(c_p - c_m)    (11)

where β_c ∈ [0, 1] is the mixing factor, and η_c is a uniform probability with η_c(c_p) = 1/256^3. The set of neighboring foreground pixels in the previous frame is defined as U_p = {c_t(u, v) : |u − i| ≤ w, |v − j| ≤ w, t = T − 1}, where w is a constant that measures the size of the neighboring region at pixel p, and pixel p is at the i-th row and j-th column of the current image. The kernel density function is the same as in Eq. (9).

Updating the color model. The background color model is updated adaptively and online with the new input image after segmentation. For the nonparametric regional background color model, a sliding window of length l_b is maintained. For the parametric pixel-wise model, a maintenance strategy similar to [1] is used. If the new observation c_p belongs to the k-th distribution, the parameters of the k-th distribution at time T are updated as follows:

\mu_{k,T} = (1 - \rho)\mu_{k,T-1} + \rho c_p    (12)

\sigma_{k,T}^2 = (1 - \rho)\sigma_{k,T-1}^2 + \rho (c_p - \mu_{k,T})^T (c_p - \mu_{k,T})    (13)

\omega_{k,T} = (1 - \rho)\omega_{k,T-1} + \rho M_{k,T}    (14)

where ρ is the learning rate, and M_{k,T} is 1 when c_p matches the k-th distribution and 0 otherwise.
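A possible per-pixel implementation of this update is sketched below. The match test (a fixed multiple of the component's standard deviation), the choice of which component to update and the weight renormalization are common practice from [1] and are assumptions here, as are the parameter values.

```python
import numpy as np

def update_pixel_gmm(c_p, weights, means, sigmas, rho=0.005, match_k=2.5):
    """Online update of one pixel's GMM following Eqs. (12)-(14)."""
    c_p = np.asarray(c_p, float)
    dist = np.sqrt(np.sum((means - c_p) ** 2, axis=1))
    matched = dist < match_k * sigmas                        # assumed matching rule
    M = np.zeros(len(weights))
    if matched.any():
        k = int(np.argmin(np.where(matched, dist, np.inf))) # closest matching component
        M[k] = 1.0
    weights[:] = (1.0 - rho) * weights + rho * M             # Eq. (14)
    weights /= weights.sum()
    if M.any():
        means[k] = (1.0 - rho) * means[k] + rho * c_p        # Eq. (12)
        diff = c_p - means[k]                                # uses the updated mean, as in Eq. (13)
        sigmas[k] = np.sqrt((1.0 - rho) * sigmas[k] ** 2 + rho * float(diff @ diff))
    return weights, means, sigmas
```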

3.2 Locality model

It is not sufficient to use color alone if the color of the foreground is similar to that of the background. Taking the location of moving objects into account can reduce errors. The spatial location of pixel p is denoted as s_p = (x_p, y_p)^T. As the shape of objects can be arbitrary, we use a nonparametric model similar to [15] to approximate the spatial distributions of background and foreground. The difference is that we also take into account a spatially uniform distribution for any pixel at any location. The locality distributions of background and foreground are defined as:

Pr(s_p \mid l_p = 0) = \alpha_s \, \eta_s + (1 - \alpha_s) \frac{1}{2\pi\sigma^2} \max_{m \in V_s} \exp\left(-\frac{(s_p - s_m)^T (s_p - s_m)}{2\sigma^2}\right)    (15)

Pr(s_p \mid l_p = 1) = \alpha_s \, \eta_s + (1 - \alpha_s) \frac{1}{2\pi\sigma^2} \max_{m \in U_s} \exp\left(-\frac{(s_p - s_m)^T (s_p - s_m)}{2\sigma^2}\right)    (16)

where α_s ∈ [0, 1] is the mixing factor, and η_s is a uniform probability with η_s(s_p) = 1/(M × N), where M × N is the image size. V_s and U_s are the sets of all background pixels and all foreground pixels in the previous frame respectively. The parameter σ is the kernel bandwidth.
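A direct, unoptimized sketch of Eqs. (15) and (16) for one pixel is given below; the parameter values and the handling of an empty pixel set are our own choices.

```python
import numpy as np

def locality_likelihood(s_p, prev_positions, img_shape, sigma=5.0, alpha_s=0.1):
    """Eqs. (15)/(16): uniform term plus the best-matching Gaussian over the positions of
    previous-frame pixels of the same class (V_s for background, U_s for foreground)."""
    uniform = 1.0 / (img_shape[0] * img_shape[1])   # eta_s = 1 / (M x N)
    if len(prev_positions) == 0:
        return alpha_s * uniform                    # assumed fallback when the set is empty
    d2 = np.sum((np.asarray(prev_positions, float) - np.asarray(s_p, float)) ** 2, axis=1)
    kernel = np.exp(-0.5 * d2.min() / sigma ** 2) / (2.0 * np.pi * sigma ** 2)
    return alpha_s * uniform + (1.0 - alpha_s) * kernel
```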

Fig. 4: Locality distributions of background and foreground.

An example of the locality distributions of background and foreground is shown in Fig. 4. From this figure we can see that pixels near moving objects have a higher probability of belonging to the foreground, and pixels far away from moving objects have a higher probability of belonging to the background.

Fig. 5: A simple example showing the temporal coherence of labels. The first image depicts the scene, the second image shows the foreground mask at time T-1, and the last image shows the foreground mask at time T. The small portion of pixels whose labels have changed is marked in red in the last image.

3.3 Temporal coherence model

For any pixel p ∈ P, if the label is foreground at time T-1, it is more likely for p to remain foreground than to become background at time T. So the label of pixel p at time T is not independent of its labels at previous times. This is called label temporal coherency; Fig. 5 shows an example. It can be modeled using a Markov chain, and here a first-order Markov chain is adopted to convey the nature of temporal coherence. There are four (2^2) possible transitions for the first-order Markov chain model since there are only two candidate labels for pixel p, but only two degrees of freedom remain because Pr(l_p^T = 1 | l_p^{T-1}) = 1 − Pr(l_p^T = 0 | l_p^{T-1}). We denote the two free quantities as Pr_{FF} and Pr_{BB}, where Pr_{FF} is the probability of remaining foreground and Pr_{BB} is the probability of remaining background.

A simple strategy to calculate Pr_{FF} and Pr_{BB} is as follows. Both are initialized to 0.5. At time T-2, the numbers of foreground and background pixels are denoted as n_F^{T-2} and n_B^{T-2} respectively; when moving objects are detected, n_F^{T-2} > 0 and n_B^{T-2} > 0. At time T-1, the number of pixels which remain foreground is n_F^{T-1} and the number of pixels which remain background is n_B^{T-1}. We approximate Pr_{FF} and Pr_{BB} at time T by:

Pr_{FF}^T = \frac{1}{2}\alpha_T + (1 - \alpha_T)\frac{n_F^{T-1}}{n_F^{T-2}}    (17)

Pr_{BB}^T = \frac{1}{2}\beta_T + (1 - \beta_T)\frac{n_B^{T-1}}{n_B^{T-2}}    (18)

where α_T ∈ [0, 1] and β_T ∈ [0, 1] are mixing factors. If n_F^{T-2} = 0 or n_B^{T-2} = 0, Pr_{FF} and Pr_{BB} keep their previous values.
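In code, the update of the two stay-probabilities is a few lines; the variable names below are ours and the mixing factors are illustrative.

```python
def update_transition_probs(pr_ff, pr_bb, n_f_t2, n_b_t2, n_ff_t1, n_bb_t1,
                            alpha_t=0.5, beta_t=0.5):
    """Eqs. (17)-(18): blend 1/2 with the observed fractions of pixels keeping their label.
    n_f_t2, n_b_t2: foreground/background pixel counts at time T-2;
    n_ff_t1, n_bb_t1: pixels that kept the foreground/background label at time T-1."""
    if n_f_t2 > 0:
        pr_ff = 0.5 * alpha_t + (1.0 - alpha_t) * n_ff_t1 / n_f_t2
    if n_b_t2 > 0:
        pr_bb = 0.5 * beta_t + (1.0 - beta_t) * n_bb_t1 / n_b_t2
    return pr_ff, pr_bb  # unchanged when the corresponding count at T-2 is zero
```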

Fig. 6: Segmentation results in the presence of fountain. The first column shows the original images, the second column gives the manual segmentation, the third column presents the result of our method, the last two columns show the results of GMM and KDE respectively.

4 Experiments and results

We tested our algorithm on a variety of videos, which include fountains, swaying trees, camera motion, etc. The algorithm was implemented in C++ on a computer with a 2.83 GHz Intel Core2 CPU and 2 GB RAM, and achieved a processing speed of about 6 fps at a resolution of 240 × 360. Since GMM [1] and KDE [5] are two widely used background subtraction algorithms, we compare the performance of our method with them. To evaluate the effectiveness of the three algorithms, no morphological operations such as erosion and dilation are used in the experiments. Both qualitative and quantitative analyses are used to demonstrate the effectiveness of our approach.

Fig. 6 shows a qualitative comparison on outdoor scenes with a fountain. The first column shows frames 470, 580, 680 and 850 of the video; there are moving objects in the first, third and fourth images. The second column is the manual segmentation result, the third column shows the result of our method, and the fourth and fifth columns are the results of GMM and KDE respectively. The difficulty of extracting moving objects in this video is that the background changes continuously and violently, and it is difficult for GMM and KDE to adapt the background model to the dynamic texture. Our method works well in the presence of the fountain and gives accurate segmentation.

Fig. 7: Segmentation results in the presence of waving trees. The first column shows the original images, the second column gives the manual segmentation, the third column presents the result of our method, the last two columns show the results of GMM and KDE respectively.

Fig. 7 gives a qualitative comparison on outdoor scenes with waving trees; the video comes from [8]. Our method can detect most of the foreground pixels accurately and produces few false detections. Fig. 8 gives segmentations of another two challenging videos by our method; the videos come from [10] and contain ocean ripples and waving trees respectively. To better demonstrate the performance of our algorithm, a quantitative analysis is also provided. The test video and corresponding ground-truth data come from [6]; this video contains an average nominal motion of about 14 pixels.


Fig. 8: Segmentation results in complex scenes. The first column shows the original images which contain ocean ripples, the second column presents the corresponding result of our method. The third column shows the original images which contain waving trees, the fourth column presents the corresponding result of our method.

Fig. 9: Segmentation results in the presence of camera motion. The first column shows the original images, the second column gives the manual segmentation, the third column presents the result of our method, the last two columns show the results of GMM and KDE respectively.


Fig. 9 shows a qualitative illustration of the results compared to the ground truth, GMM and KDE. From this figure, we can see that our method segments moving objects accurately and has almost no false detections in the presence of camera motion. Fig. 10 shows the corresponding quantitative comparison in terms of precision and recall, which are defined as:

Precision = (Number of true positives detected) / (Total number of positives detected)

Recall = (Number of true positives detected) / (Total number of true positives)
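Computed per frame from binary masks, these measures reduce to a few lines; the following sketch assumes boolean foreground masks and is our own helper, not part of the paper's code.

```python
import numpy as np

def precision_recall(detected, ground_truth):
    """detected, ground_truth: boolean foreground masks of the same shape."""
    tp = np.logical_and(detected, ground_truth).sum()
    n_detected = detected.sum()
    n_true = ground_truth.sum()
    precision = tp / n_detected if n_detected > 0 else 0.0
    recall = tp / n_true if n_true > 0 else 0.0
    return precision, recall
```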

Fig. 10: Precision and recall comparisons of the three algorithms over frames 250-500. GMM is tested with different learning rates: 0.005, 0.05 and 0.1.

From Fig. 10 we can see that the detection accuracy of our method, in terms of both precision and recall, is consistently higher than that of GMM and KDE. Several learning rates are tested for GMM. Although the precision and recall of our method are both very high, there are still some false positives and false negatives detected by our method. These pixels are mainly at the edges or contours of the true objects, because spatial continuity information is used in the proposed method.

5 Conclusions

In this paper, we have proposed an online and unsupervised technique which learns multiple cues to find the optimal foreground/background segmentation in an MRF framework. Our method adopts the advantages of foreground/background modeling and the graph cuts algorithm. To increase the accuracy of foreground segmentation, color, locality, temporal coherency and spatial continuity are fused together in the framework. The distributions of the different features for background and foreground are modeled and learned separately. Our method can segment moving objects accurately in complex scenes, as demonstrated by experiments on several challenging videos.

Acknowledgement. This work is supported by the National Natural Science Foundation of China (Grant No. 60736018, 60723005), the National Hi-Tech Research and Development Program of China (2009AA01Z318), and the National Science Foundation (60605014, 60875021).


References

1. Stauffer, C., Grimson, W.: Learning patterns of activity using real-time tracking. IEEE Trans. Pattern Analysis and Machine Intelligence 22 (2000) 747–757
2. KaewTraKulPong, P., Bowden, R.: An improved adaptive background mixture model for real-time tracking with shadow detection. Proc. European Workshop on Advanced Video Based Surveillance Systems (2001)
3. Zivkovic, Z.: Improved adaptive Gaussian mixture model for background subtraction. Proc. Int. Conf. Pattern Recognition (2004) 28–31
4. Lee, D.S.: Effective Gaussian mixture learning for video background subtraction. IEEE Trans. Pattern Analysis and Machine Intelligence 27 (2005) 827–832
5. Elgammal, A., Harwood, D., Davis, L.: Non-parametric model for background subtraction. Proc. ECCV (2000) 751–767
6. Sheikh, Y., Shah, M.: Bayesian modeling of dynamic scenes for object detection. IEEE Trans. Pattern Analysis and Machine Intelligence 27 (2005) 1778–1792
7. Ko, T., Soatto, S., Estrin, D.: Background subtraction on distributions. Proc. ECCV (2008) 276–289
8. Toyama, K., Krumm, J., Brumitt, B., Meyers, B.: Wallflower: Principles and practice of background maintenance. Proc. ICCV (1999) 255–261
9. Monnet, A., Mittal, A., Paragios, N., Ramesh, V.: Background modeling and subtraction of dynamic scenes. Proc. ICCV (2003) 1305–1312
10. Li, L., Huang, W., Gu, I.Y.H., Tian, Q.: Statistical modeling of complex backgrounds for foreground object detection. IEEE Transactions on Image Processing 13 (2004) 1459–1472
11. Pless, R., Larson, J., Siebers, S., Westover, B.: Evaluation of local models of dynamic backgrounds. Proc. CVPR (2003) 73–78
12. Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Agarwala, A., Rother, C.: A comparative study of energy minimization methods for Markov random fields. Proc. ECCV (2006) 16–29
13. Boykov, Y., Jolly, M.: Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images. Proc. ICCV (2001) 105–112
14. Kolmogorov, V., Criminisi, A., Blake, A., Cross, G., Rother, C.: Bi-layer segmentation of binocular stereo video. Proc. CVPR (2005) 1186–1193
15. Liu, F., Gleicher, M.: Learning color and locality cues for moving object detection and segmentation. Proc. CVPR (2009) 320–327
16. Sun, J., Zhang, W., Tang, X., Shum, H.: Background cut. Proc. ECCV (2006) 628–641
17. Criminisi, A., Cross, G., Blake, A., Kolmogorov, V.: Bilayer segmentation of live video. Proc. CVPR 1 (2006) 53–60
18. Zha, Y., Bi, D., Yang, Y.: Learning complex background by multi-scale discriminative model. Pattern Recognition Letters 30 (2009) 1003–1014
19. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Analysis and Machine Intelligence 26 (2004) 1124–1137
20. Hall, P., Wand, M.: On the accuracy of binned kernel estimators. J. Multivariate Analysis (1995)
