Machine Vision and Applications manuscript No. (will be inserted by the editor)
Change Detection by Probabilistic Segmentation from Monocular View
Francisco J. Hernandez–Lopez · Mariano Rivera
Received: date / Accepted: date
Abstract We present a method for foreground/background video segmentation (change detection) in real-time that can be used in applications such as background subtraction or analysis of surveillance cameras. Our approach implements a probabilistic segmentation based on the Quadratic Markov Measure Field models. This framework regularizes the likelihood of each pixel belonging to each one of the classes (background or foreground). We propose a new likelihood that takes into account two cases: the first one is when the background is static and the foreground might be static or moving (Static Background Subtraction), the second one is when the background is unstable and the foreground is moving (Unstable Background Subtraction). Moreover, our likelihood is robust to illumination changes, cast shadows and camouflage situations. We implement a parallel version of our algorithm in CUDA using an NVIDIA Graphics Processing Unit in order to fulfill real-time execution requirements.

Keywords Change detection · QMMF segmentation · Graph Cut segmentation · Background likelihood · Illumination changes · Shadow detection · Camouflage · Background maintenance · GPU Programming · CUDA

Centro de Investigacion en Matematicas A.C., Guanajuato Gto., 36240, Mexico
Tel.: +52-473-7327155
Fax: +52-473-7325749
E-mail: [email protected]

1 Introduction

Images are one of the richest information sources for humans. Therefore, image processing methods have been extensively developed to improve the image quality and to obtain more information from this perceptive source. Image processing methods have also motivated the development of new applications; for example, video segmentation. This consists of partitioning video into spatial, temporal or spatial–temporal homogeneous regions according to a given feature [54]. Video segmentation requires an appropriate selection of characteristics (intensity, color, texture or movement) and a distance measure for comparing such characteristics.

Our research is presented in the context of real-time background subtraction in a video sequence, see Fig. 1. We use an approach based on a binary segmentation implemented in the quadratic Markov measure field (QMMF) framework [41, 43]. We have chosen such a framework because of its flexibility to be adapted for defining (and implementing) segmentation algorithms in a wide variety of tasks [43]. Moreover, the optimization algorithms might be implemented in efficient parallel architectures [41]. In the QMMF framework, a segmentation algorithm is defined as a quadratic programming (QP) problem whose data term depends on the specific task by means of a likelihood. The likelihood can be seen as the preference of the data for belonging to each one of the classes: background (BG) or foreground (FG) in our case.

Our method requires a confident estimation of the BG model. We use only the first frame of the sequence, assuming that the FG objects are not yet present (Fig. 1a). Then, for the subsequent frames, we compute the likelihood of each pixel belonging to the BG (Fig. 1c). We note that the simple segmentation based on the comparison of the BG model with the current frame fails in general due to a list of factors:
1. Unstable BG Motion caused by moving BG objects (such as trees or water waves) may cause the BG region to be misclassified as FG.
2. Illumination changes Lighting changes caused by unstable light sources or changes in external illumination conditions (such as clouds).
3. Object shadows FG objects can cast shadows in the BG regions that may be misclassified as FG.
4. Camouflage situation FG objects may have regions with similar colors as those occluded in the BG region.
5. Video artifacts and noise Video compression algorithms, such as MPEG, introduce artifacts in regions with high spatial frequency (e.g., at edges) that may result in falsely detected changes with respect to (w.r.t.) the BG model; noise in the image has a similar effect.

Fig. 1 Background subtraction application. (a) BG model. (b) Current frame f. (c) Computed likelihood V_M given the BG model. (d) Corrected likelihood V_M robust to light instabilities, cast shadows and camouflage. (e) Probabilistic segmentation with the QMMF method. (f) Application illustration.

The main contribution of this work is the definition of a BG class likelihood which is robust to the above listed factors (Fig. 1d). Our approach implements a binary video segmentation based on the QMMF model to eliminate video artifacts and noise (Fig. 1e). Another contribution is the parallel implementation of the segmentation algorithm, which is executed in real-time. A preliminary version of this work was presented in a conference paper [21]. Furthermore, an application of the proposed video segmentation method for video augmentation is reported in our work [22].

The rest of this paper is organized as follows. First, Sect. 2 presents an overview of existing approaches that involve or deal with the background subtraction task. Next, Sect. 3 presents our method for FG/BG video segmentation. The presentation of our method is divided into two parts: Sect. 3.1 presents the QMMF framework and Sect. 3.2 presents our proposal for the BG model likelihood, which can deal with static or moving BG, illumination changes, cast shadows and camouflage situations. Sect. 4 presents experiments that demonstrate our method's performance and compare it with other State of the Art (SoA) methods. We also present, for comparison purposes, a variant of our method based on Graph Cut segmentation. Then, Sect. 5 presents implementation details and reports the processing time of our algorithm in CPU and GPU. Finally, our conclusions are given in Sect. 6.
2 Related work

Since background subtraction is still a challenge in computer vision, several approaches have been published on the matter. One approach consists of modelling the BG and then estimating the membership of each pixel, in the current frame, w.r.t. the BG model. One of the most common BG models is the Gaussian Mixture Model (GMM or MOG) [49]. In GMM, the probability density of each pixel value is modeled by a mixture of K Gaussians. A limitation is that, in general, a small K (3-5) is insufficient to accurately represent the BG. Hence, [10] proposes to use a Kernel Density Estimator (KDE) with K = 10 Gaussians and a sample with a size of 100 pixel values. In that work, the BG model combines two models: one that quickly adapts its parameters to the scene (short-term model) and another one that slowly adapts to changes (long-term model). Furthermore, their combined model can use color information for removing shadows. More recently, the method in [60] improves the results of [49] and [10] by constantly updating the number of Gaussians, K, for each pixel and by using a simple nonparametric adaptive density estimation classification. In [11, 12], two GMM BG models learned at different rates and a finite-state machine for classifying the pixels are proposed. Different from GMM and KDE, our proposal uses as BG model K ≤ 5 prototypes per pixel. Then, the BG log-likelihood is computed using the nearest prototype and the update process is only applied to the winning prototype at each pixel. This strategy simplifies the calculations and allows us to implement real-time methods for illumination changes, cast shadows, camouflage situations and noise.

On the other hand, Markov Random Field (MRF)-based methods emerged for integrating spatial or spatial–temporal information to regularize the solution. In [56], an MRF based on spatial–temporal information at blob level is introduced. The minimization of the energy function is performed with an iterative deterministic scheme known as Highest Confidence First (HCF). Then, in [47] a competition scheme is proposed between
FG and BG models in a MAP-MRF decision framework. The energy function is solved by using the Ford–Fulkerson algorithm (Graph Cut). Later, [40] presents a method based on a Gaussian single model (GSM), MRF and Fisher linear discriminant analysis (FLDA). This method works on gray images without a shadow removal phase. More recently, in [3] an MRF approach is proposed for segmenting the video into three classes: FG, BG and Shadows. They use a microstructural feature at each pixel, which is one of four kernels (size of 3 × 3) applied to the pixel's neighborhood. The label field is computed using the Simulated Annealing (SA) algorithm with the Metropolis criterion for accepting new states. Since stochastic optimization using SA is very slow, they found that a deterministically modified Metropolis (MMD) relaxation algorithm has a similar efficiency but is significantly faster: processing images of 320 × 240 pixels runs at 1 frame per second (fps). When they used the ICM algorithm, the segmentation sped up to 3 fps, in exchange for some degradation in the segmentation result.

The MRF-based methods mentioned above have a high computational cost because of their optimization technique. The use of Graph Cut-based minimization algorithms improves the computational performance significantly. Some works that use efficient implementations of Graph Cut are reported in [44, 52]. An efficient pixel-wise parallel implementation of Graph Cut is available in the NVIDIA NPP library [39]. More recently, in [46] a post-processing method based on Probabilistic Superpixel MRF (PS-MRF) is proposed to improve a previously computed segmentation. The energy function is minimized with Graph Cut-based algorithms and their rate is 70 fps on images of 481 × 321 pixels using an NVIDIA GeForce GTX 560 Ti. Note that this processing rate is only for the post-processing phase. The performance is demonstrated on available results of algorithms reported in the Change Detection website [17]. Our proposal also uses an MRF approach for regularizing the segmentation. In particular, we found that the regularization based on the probabilistic segmentation (PS) method QMMF [41] (detailed in Sect. 3.1) has a better performance than the method based on Graph Cut segmentation [39]. According to our experiments, QMMF-based regularization is faster and more accurate than Graph Cut.

Now, we present a brief review of reported algorithms based on the initialization stage. SAmple CONsensus (SACON) [58] uses a set of samples of background values at each pixel. Such a sample should be as representative as possible of the background values, so the sample size is relevant: at least 20 values, but
good results are obtained with sample sizes between 100 and 200 values. In SACON, for classifying a new frame pixel, the number of background samples close to the observed pixel value is counted. If the count is larger than a threshold, then the pixel is marked as BG. The proposal is interesting because of the use of a nonparametric approach for representing the background pixel value diversity. The SACON approach demonstrated more accurate segmentation than parametric models such as the GMM. In [5], a system for outdoor park surveillance named HECOL is proposed. In this system, the BG model is created in a period of initialization or bootstrapping. Such a BG model is updated using the temporal median of N = 9 gray level intensities. A neural network-based approach (SOBS) is reported in [34, 35], which needs a large number N of frames for training. In [13], a discriminative subspace learning approach that is robust to illumination changes via an incremental maximum margin criterion (SL-IMMC) is proposed. It requires N (between 30 and 100) training frames for the BG initialization phase. In [2], the ViBe method is proposed; it compares a set of N = 20 past values of a pixel with the current value to determine whether that pixel belongs to the BG. In that work, the update process randomly selects a model from a pixel neighborhood. In [24], the pixel-based adaptive segmenter (PBAS) is proposed, which uses a history of N = 35 values to construct the BG model (as SACON) and a random update rule (as ViBe). In that work, the idea is to extend the parameters to dynamic per-pixel state variables and introduce controllers for each of them. The method in [8] (ViBe+) proposes several modifications of the original ViBe algorithm and post-processing operations at blob level. One modification is the updating random factor, which accelerates the update of the BG samples. The updating factor is adjusted when there is jitter on the camera. The jitter detection is based on the Kanade-Lucas-Tomasi feature tracker available in [4].

The above described methods need, in general, a large number of samples for estimating the BG model. That requires a similar amount of memory to allocate such models (nonparametric approaches) or computational effort for estimating parametric models. In contrast, we propose a non-parametric approach that uses a low number of samples. We only need the first frame for the Static Background Subtraction (SBS) case and the first five frames for the Unstable Background Subtraction (UBS) case. Another difference is that our method does not randomly update the BG models; instead, we calculate the minimum difference between the current value and the BG model values. A clear disadvantage
in our method is that we do not consider the shaking or jitter problem of the camera.

On the other hand, there are methods that include depth information from a stereo camera and user interaction in scenes where the BG is almost static. Those methods are suitable for video conference environments. In [30], color, contrast and stereo matching information are fused to separate the FG and BG based on a Conditional Random Field (CRF). Next, in [59] it is proposed to estimate the depth information using Tree-based Classifiers (TC); the training set consists of depth-based label maps of the ground truth. Then, the motion, color, contrast and spatial priors are fused in a CRF model. In [53], opacity values are propagated from the previous frame to the current frame, given the binary mask of the previous frame. The opacity propagation (OP) is based on the local smoothness assumption of the FG and BG colors in a small spatial–temporal 3D window. This local smoothness assumption can be formulated as a quadratic function, which implies solving a large linear system of equations. The method reported in [32] propagates, as a prior, the global shape of the FG (PGSF) to subsequent frames. As initialization, the FG is manually segmented in some selected keyframes. Our approach neither requires user interaction nor a sophisticated learning process; our models are the first frames. In order to achieve real-time video segmentation, our approach is implemented in parallel hardware (GP-GPU). Seminal works using this kind of hardware are reported in [19, 14, 16, 6].
3 FG/BG video segmentation

In the following subsections, we describe the binary segmentation method used in our proposal. Afterwards, we present our BG model and the procedure for computing the pixel likelihood. Such a likelihood is robust to the problems outlined in the introduction: unstable BG, illumination changes, cast shadows and camouflage situations.
3.1 Binary probabilistic segmentation

In this work we denote by f : {Ω, N} → [0, 1]^3 the RGB video sequence such that f(x, t) = [f_1(x, t), f_2(x, t), f_3(x, t)]^T is the RGB-vector value of the pixel at position x = [x_1, x_2]^T ∈ Ω and t ∈ N indexes the frames (time). We denote by V_M(x, t) ∈ (0, 1) the likelihood (model preference) of the pixel x at frame t belonging to the BG class; then 1 − V_M(x, t) is the likelihood that the pixel belongs to the FG class (the V_M likelihood is defined in Sect. 3.2). Although we compute a likelihood V_M robust to illumination changes, shadows and camouflage situations, such difficulties, combined with noise in the images, may produce noisy binary segmentations if a simple maximum likelihood approach is used. Therefore, we propose to estimate a regularized (smoothed) version of V_M. An approach that has shown to be computationally efficient in both memory requirements and computational time is the QMMF framework [1, 21, 41–43, 57]. Such an approach belongs to the class of PS methods: they do not compute a hard label map (as, for example, Graph Cut-based methods [29, 30, 51]), but a membership (or probability) of each pixel to the class set. The QMMF approach has been shown to produce better quality results than hard labeling methods in the image segmentation task, see experiments in [41]. In addition, QMMF optimization algorithms allow one to use initial conditions and to compute partial solutions of good quality even if the optimization process is stopped before convergence: an important feature for multigrid implementations. We also found that QMMF optimization algorithms might be implemented in parallel.

Let V = {V_M, 1 − V_M} be the likelihood measure field; then, in the QMMF framework, such a likelihood measure field is regularized by solving a quadratic programming problem of the form (see [41, 43] for details):

\min_p \sum_{x \in \Omega} \Big\{ Q(p; V, x) + \mu R_1(p; x) + \lambda \sum_{y \in N_x} R_2(p; x, y) \Big\}   (1)

where N_x = {y : \|x − y\|_2 = 1} denotes the set of first-neighbor pixels of x and \|\cdot\|_2 denotes the L_2-norm. The term Q attaches the regularized memberships p to the likelihood V. According to [41], choices for Q are norms but also information measures that penalize a deviation of p from V. In particular, QMMF uses a quadratic information measure because it leads to efficient optimization algorithms and produces competitive probabilistic segmentations [41]. The dissimilarity Q between the discrete distribution p and V is computed with
Q(p, V) = - \sum_{k=1}^{K} p_k^2 \log V_k   (2)
where K is the number of bins (classes in the segmentation context). The potential R_1, weighted by the parameter µ, promotes p to be as informative as possible (controls its entropy). Finally, the potential R_2, weighted by λ, controls the segmentation granularity. In our problem, we compute a likelihood with low entropy, so we can simplify the QMMF model and set µ = 0 with good
results. Our QMMF-based probabilistic segmentation is given by

\min_p \frac{1}{2} \sum_{x \in \Omega} \Big\{ p^2(x, t) d_B(x, t) + [1 - p(x, t)]^2 d_F(x, t) + \lambda \sum_{y \in N_x} [p(x, t) - p(y, t)]^2 w_\gamma(x, y) \Big\}
\text{s.t. } p(x, t) \ge 0,   (3)

where

d_B(x, t) = -\log[V_M(x, t)],   (4)
d_F(x, t) = -\log[1 - V_M(x, t)]   (5)

and

w_\gamma(x, y) = \frac{\gamma}{\gamma + \|f(x, t) - f(y, t)\|_2^2};   (6)

here \|\cdot\|_2^2 denotes the squared L_2-norm. Moreover, γ is a positive parameter that controls the edge sensitivity. The weights w_γ, given by (6), promote the alignment of probability edges with gradient edges. According to [41], the solution to (3) can be computed by iterating the Gauss–Seidel scheme

p(x, t) = \frac{d_F(x, t) + \lambda \sum_{y \in N_x} p(y, t) w_\gamma(x, y)}{d_B(x, t) + d_F(x, t) + \lambda \sum_{y \in N_x} w_\gamma(x, y)}.   (7)

Since V_M(x, t) is in (0, 1), the resulting energy is convex and the iteration of (7) converges to the non-negative global minimum [41]. The Gauss–Seidel iteration produces an improved partial solution at each iteration and converges quickly if a "good" initial point is provided. We take advantage of these properties of the segmentation algorithm as follows:
1. We focus on computing a good BG likelihood (V_M) that correctly classifies, as much as possible, the pixels in the two mentioned classes. Our initial guess is the likelihood itself: p^0(x, t) = V_M(x, t).
2. We implement a multigrid algorithm with fast convergence. The update formula (7) can be implemented in Graphics Processing Units (GPUs); our particular implementation is in NVIDIA CUDA [38].
3. In the implementation of our real-time applications we compute a good partial solution by stopping the algorithm after a few iterations; i.e., we achieve an approximate minimization.
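To make the regularization step concrete, the following minimal NumPy sketch iterates the Gauss–Seidel update (7) serially on a pixel grid. It is our own illustration (function name, arguments and default values are assumptions loosely following Table 1), not the authors' parallel CUDA implementation.

```python
import numpy as np

def qmmf_regularize(v_m, frame, lam=100.0, gamma=0.01, max_iter=25):
    """Serial Gauss-Seidel iteration of Eq. (7).

    v_m   : HxW array with the BG likelihood V_M, values in (0, 1).
    frame : HxWx3 array with the current RGB frame in [0, 1].
    Returns the regularized BG membership p.
    """
    d_b = -np.log(v_m)            # data term for the BG class, Eq. (4)
    d_f = -np.log(1.0 - v_m)      # data term for the FG class, Eq. (5)
    p = v_m.copy()                # initial guess p0 = V_M
    h, w = v_m.shape
    for _ in range(max_iter):
        for i in range(h):
            for j in range(w):
                num = d_f[i, j]
                den = d_b[i, j] + d_f[i, j]
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        diff = frame[i, j] - frame[ni, nj]
                        w_g = gamma / (gamma + np.dot(diff, diff))  # Eq. (6)
                        num += lam * p[ni, nj] * w_g
                        den += lam * w_g
                p[i, j] = num / den                                  # Eq. (7)
    return p
```

In a real-time setting the same update would be evaluated in parallel over pixels (e.g., with a checkerboard schedule on the GPU), and the loop would be stopped after a few sweeps, as described in item 3 above.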
3.2 BG model likelihood, V_M

Here, we present a procedure for computing the BG likelihood for two kinds of applications. The first case is the BG subtraction in scenes where the BG is static and the FG might be static or moving; we call this case SBS. The second case corresponds to having an unstable BG and a moving FG, called here UBS. An example of the SBS case is videoconference data in which the BG needs to be detected in order to transmit, process, or compress the FG. A UBS application is surveillance camera video analysis for detecting objects that change, leave or appear in the scene; for example, cars or pedestrians in a car lot. An unstable background can be produced by rain, fountains, waves in lakes, waving trees, an unstable camera, etc. These are evidently more complex than the example given for SBS.

The next subsections introduce the procedures for computing the BG model likelihood, V_M, for the SBS and UBS cases, respectively. The V_M calculation follows these stages:
1. Tonal Stabilization. First, we compute an initial V_M robust to illumination changes by comparing the similarity between the tonal-stabilized current frame and the BG model (or models).
2. Cast Shadows Detection. Next, for the pixels with V_M belonging to FG that are cast shadows, we change their V_M to prefer the BG class.
3. Camouflage Detection. Finally, for the updated V_M pixels belonging to BG that are camouflaged with the background, we change their V_M to prefer the FG class.

3.2.1 Likelihood for SBS

In this case, we construct a BG model, m, adaptive to gradual illumination changes. This BG model can be initialized with the first video frame: m(x) = f(x, t = 1). The adaptive mechanism will be discussed in Sect. 3.2.6. Meanwhile, we focus on the computation of the pixel likelihood of belonging to the BG. This likelihood is computed for successive frames by comparing their value with the model m. The BG likelihood is computed with the formula:

V_M(x, t) = \begin{cases} 1 - \epsilon_1, & \|\tilde f(x, t) - m(x)\|_2 \le \theta_1 \\ \epsilon_1, & \text{otherwise} \end{cases}   (8)

where the small constant \epsilon_1 = 1 × 10^{-3} is used to avoid the log function in (4) and (5) being undefined, \tilde f is the frame stabilized by our tonal operator (see Sect. 3.2.3) and the threshold \theta_1 is a parameter that controls the sensitivity; we use \theta_1 equal to 2% of the dynamic range of a color channel.

3.2.2 Likelihood for UBS
The UBS case requires a set of BG prototype values. This is also useful for sudden illumination changes when these cannot be adjusted with our tonal transference operator: different models can be adapted to different illumination conditions.
Thus, let m(x) ≡ {m_k(x)}_{k=1,...,N} be the background model (prototype) set for the pixel x, updated at each iteration; typically N ∈ [1, 5]. Then, the likelihood for the observed pixel value f(x, t) is computed with

V_M(x, t) = \begin{cases} 1 - \epsilon_1, & \|\tilde f(x, t) - m_{k^*(x)}(x)\|_2 \le \theta_1 \\ \epsilon_1, & \text{otherwise} \end{cases}   (9)

where

k^*(x) = \arg\min_k \|\tilde f(x, t) - m_k(x)\|_2.   (10)

Unlike (8), this likelihood is evaluated w.r.t. the best model value. In our experiments, we initialize the models with the first N frames of the video stream: m_k(x) = f(x, k) for k = 1, 2, . . . , N. Their update procedure is presented in Sect. 3.2.6. In the following, m^*(x) refers to the model used at each pixel: m^*(x) ≡ m_{k^*(x)}(x). Note that even for N = 1, as in the SBS case, m(x) can be denoted by m^*(x).
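The short sketch below (our own illustration, not the paper's code) computes the nearest-prototype likelihood of (9)-(10); the SBS likelihood (8) is recovered as the special case N = 1. The default threshold is an assumption that follows the "2% of the dynamic range" rule for frames scaled to [0, 1].

```python
import numpy as np

def bg_likelihood(frame_stab, prototypes, theta1=0.02, eps1=1e-3):
    """Nearest-prototype BG likelihood, Eqs. (9)-(10).

    frame_stab : HxWx3 tonal-stabilized frame in [0, 1].
    prototypes : NxHxWx3 stack of BG prototype values m_k(x).
    Returns (v_m, k_star): the likelihood map and the index of the
    winning prototype at each pixel.
    """
    # Distance of the current pixel value to every prototype, Eq. (10).
    dists = np.linalg.norm(prototypes - frame_stab[None], axis=-1)  # NxHxW
    k_star = np.argmin(dists, axis=0)                               # HxW
    best = np.take_along_axis(dists, k_star[None], axis=0)[0]       # HxW
    # Two-valued likelihood, Eq. (9): 1 - eps1 if close to the BG model.
    v_m = np.where(best <= theta1, 1.0 - eps1, eps1)
    return v_m, k_star
```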
3.2.3 Tonal stabilization

As we have mentioned, there may be many factors that cause lighting changes in the scene: unstable light sources, weather-related changes in external illumination, or the camera's automatic gain control, to list some of them. We frequently observed that, after a few seconds, lighting changes appear in the scene causing the BG region in new frames to be different from the BG model, producing wrong segmentations. For this reason, we include a lighting correction process. Our correction process is constructed based on the one proposed in [48]. In that paper, the authors implement a segmentation method for moving objects using a BG update process robust to illumination changes. Let m^*(x) be the BG model and let

I[f(x, t)] := \frac{1}{3} \sum_{i=1}^{3} f_i(x, t)   (11)
be the average operator over the RGB channels of each pixel x at frame t. Moreover, let p(x, t − 1) be the probability, in the previous frame, of the same pixel x belonging to the BG class. Assuming that the previous frame at t − 1 was correctly segmented, the procedure for controlling the illumination changes at frame t is as follows:
1. We estimate the pixel-wise intensity ratio between the model m^* and the current frame, to obtain the intensity variation at each pixel:

   D(x, t) = \frac{I[m^*(x)] + \epsilon_2}{I[f(x, t)] + \epsilon_2}, \quad \{x : p(x, t-1) \ge 1/2\}   (12)

   where \epsilon_2 is a small constant that avoids a possible division by zero; we fixed this value to \epsilon_2 = 0.04% of the dynamic range of a color channel.

2. Then, we compute the photometric gain for each intensity value v_j of the current frame. This gain is the average of the intensity ratios corresponding to pixels with the same v_j:

   \bar D_t[v_j] = \begin{cases} \dfrac{\sum_{x \in \Omega} \delta(\hat f(x, t) - v_j) D(x, t)}{\sum_{x \in \Omega} \delta(\hat f(x, t) - v_j)}, & p(x, t-1) \ge 1/2 \\ 1, & \text{otherwise;} \end{cases}   (13)

   where \hat f(x, t) = int(255 × I[f(x, t)]) and int(z) computes the closest integer to z. Then {v_j}_{j=0,...,255} are the 256 possible intensity values that a pixel can take and δ is the Kronecker delta. Note that \sum_{x \in \Omega} \delta(\hat f(x, t) - v_j) can be 0 if there is no pixel with the value v_j. In this case, \bar D_t[v_j] can take a NaN value; in any case, only the v_j with at least one associated pixel value will be used in the tonal transference operator.

3. Now, given a frame f(x, t) and its intensity value \hat f(x, t), we apply the tonal transference operator as follows:

   \tilde f(x, t) = [T[f_1(x, t)], T[f_2(x, t)], T[f_3(x, t)]]^T,   (14)

   where the operator T combines the current RGB-channel value and its value corrected with the photometric gain:

   T(v_j) := \beta_1 [\bar D_t[\hat f(x, t)] (v_j + \epsilon_2) - \epsilon_2] + (1 - \beta_1) v_j,   (15)

   with \beta_1 a free parameter; in all our experiments we use \beta_1 = 0.75 for the SBS and UBS cases. Note that, for \beta_1 = 0, the tonal operator does not change the original frame; this is used for processing the initial frame.

The effect of the tonal transference operator T can be observed in the first row of Fig. 2, columns (b) and (c). Then the BG model likelihood is computed taking into account the tonal transference operator. This is summarized in Algorithm 1.
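As an illustration of steps 1-3, the sketch below estimates the per-intensity photometric gain and applies the tonal transference operator on 8-bit RGB images. Function and variable names are ours, and the defaults for β1 and ε2 follow the values quoted above (ε2 = 0.1 on the 0-255 scale, i.e., 0.04% of the dynamic range).

```python
import numpy as np

def tonal_stabilize(frame, model, p_prev, beta1=0.75, eps2=0.1):
    """Tonal transference operator, Eqs. (12)-(15), for 8-bit RGB frames.

    frame, model : HxWx3 arrays (current frame f and BG model m*), 0-255 scale.
    p_prev       : HxW BG probability computed for the previous frame.
    """
    intensity_f = frame.astype(np.float64).mean(axis=2)    # I[f], Eq. (11)
    intensity_m = model.astype(np.float64).mean(axis=2)    # I[m*]
    bg = p_prev >= 0.5                                      # pixels trusted as BG
    ratio = (intensity_m + eps2) / (intensity_f + eps2)     # D(x, t), Eq. (12)

    # Average the ratio over pixels sharing the same quantized intensity, Eq. (13).
    f_hat = np.rint(intensity_f).astype(int)
    gain = np.ones(256)
    counts = np.bincount(f_hat[bg].ravel(), minlength=256)
    sums = np.bincount(f_hat[bg].ravel(), weights=ratio[bg].ravel(), minlength=256)
    valid = counts > 0
    gain[valid] = sums[valid] / counts[valid]

    # Apply the tonal transference operator channel-wise, Eqs. (14)-(15).
    g = gain[f_hat][..., None]                              # per-pixel gain
    corrected = g * (frame + eps2) - eps2
    return beta1 * corrected + (1.0 - beta1) * frame
```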
Algorithm 1 Tonal Stabilization.
Require: parameters N, ε_1, ε_2
1: function Tonal(m, f, p, β_1, θ_1, k^*, t)
2:   if t = N + 1 then
3:     Use \tilde f(x, t) = f(x, t), for all x;
4:   else
5:     Compute D with (12), for all x;
6:     Compute \bar D_t with (13), for all x;
7:     Compute \tilde f(x, t) with (14), for all x;
8:   end if
9:   if N = 1 then
10:    Compute V_M with (8), for all x;
11:  else
12:    Compute V_M with (9), for all x;
13:  end if
14:  return V_M;
15: end function

3.2.4 Cast shadows detection

Another error source in the video segmentation process is cast shadows projected in the BG by FG objects, which result in a BG appearance change and, as a consequence, a small likelihood of being BG. Our shadow detection process is only applied to pixels likely to be FG according to (8) or (9); i.e., pixels in {x : V_M(x, t) < 1/2}. Recently reported articles have focused on solving the shadow detection problem [15, 27, 28, 33, 36, 45]. Those procedures are based on transforming the original image in the RGB color space into other color spaces that better separate the luminance and chrominance components. Thus, they implement a series of thresholds to classify the pixels as BG, object or shadow. We use a maximum likelihood approach for detecting cast shadows. Our shadow detection is based on the procedure described in [15], focused on detecting moving object shadows. Our procedure compares the BG model, m^*(x), and the current frame, f(x, t), in three aspects. First, we compute the similarity (likelihood) V_L in RGB color space. Second, as in [45], we compute a similar likelihood V_H in an illuminance-invariant color space. Third, we compute a likelihood V_G comparing gradient magnitudes. Those three likelihood components are combined with

V_S(x, t) = V_L(x, t) × V_H(x, t) × V_G(x, t).   (16)

In the following, we discuss each component:

– Likelihood given the RGB color differences:

  V_L(x, t) = \exp(-\nu_1 \|\tilde f(x, t) - m^*(x)\|_2^2).   (17)

  Note that V_L is a relaxed version of V_M [see (8)] and it takes into account that cast shadows in the current frame introduce slight color changes w.r.t. the model m^*.

– Likelihood given the chrominance differences:

  V_H(x, t) = \exp(-\nu_2 \|C[f(x, t)] - C[m^*(x)]\|_1),   (18)

  where \|\cdot\|_1 denotes the L_1-norm of a vector and C(z) is the operator for transforming the RGB z-value into the invariant color space c1c2c3 [45]. The underlying idea is that one expects slight changes in the chrominance of cast shadow pixels.

– Likelihood given the gradient magnitude differences:

  V_G(x, t) = \exp(-\nu_3 |G[f(x, t)] - G[m^*(x)]|),   (19)

  where |·| denotes the absolute value and the operator G computes a smoothed magnitude of the spatial gradient of f:

  G[f(x, t)] = H(x) ⊗ \|\nabla_x I[f(x, t)]\|_2,

  where the operator I computes the pixel intensity, see (11). Although there are many variants for implementing the spatial filter H, we chose a simple 3 × 3 Gaussian kernel filter because it can be efficiently computed in GPU. In this likelihood, the differences between the smoothed gradient magnitude of a cast shadow pixel and its corresponding model m^* are relatively small, while the differences between the smoothed gradient magnitude of a FG pixel and its corresponding model m^* are high; therefore, this likelihood avoids classifying the FG edges as cast shadows.

Each one of the last three likelihoods contributes with a certain probability to classify pixels as cast shadow. High values of V_S(x, t) indicate that the pixel is a cast shadow. We remark that we use ν_1 = ν_2 = ν_3 = 1 in our experiments, making the formulas practically parameter-less, unlike the approach in [15], which needs four parameters. The complete process for updating the BG likelihood is defined by Algorithm 2. Note that we update the BG likelihood of the pixels classified as cast shadows, changing its value to a high probability of belonging to BG. Fig. 2d shows the BG class likelihood, V_M, after the cast shadow detection. In particular, the second row shows a noticeable improvement in the person's arm region.
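For illustration only, a compact version of the three shadow cues and their combination (16)-(19) could look as follows. The c1c2c3 conversion is written out under the usual arctan definition of that color space, the 3 × 3 smoothing uses a simple [1, 2, 1]/4 separable kernel, and all names are ours.

```python
import numpy as np

def c1c2c3(img, eps=1e-6):
    """Illumination-invariant c1c2c3 color space used for V_H."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return np.stack([np.arctan(r / (np.maximum(g, b) + eps)),
                     np.arctan(g / (np.maximum(r, b) + eps)),
                     np.arctan(b / (np.maximum(r, g) + eps))], axis=-1)

def smooth3x3(img):
    """Separable 3x3 Gaussian-like smoothing with kernel [1, 2, 1]/4."""
    pad = np.pad(img, 1, mode='edge')
    horiz = (pad[:, :-2] + 2 * pad[:, 1:-1] + pad[:, 2:]) / 4.0
    return (horiz[:-2] + 2 * horiz[1:-1] + horiz[2:]) / 4.0

def grad_mag_smoothed(img):
    """G[f]: smoothed magnitude of the spatial intensity gradient."""
    intensity = img.mean(axis=-1)
    gy, gx = np.gradient(intensity)
    return smooth3x3(np.hypot(gx, gy))

def shadow_likelihood(frame_stab, frame, model, nu=(1.0, 1.0, 1.0)):
    """Combined cast-shadow likelihood V_S = V_L * V_H * V_G, Eqs. (16)-(19)."""
    v_l = np.exp(-nu[0] * np.sum((frame_stab - model) ** 2, axis=-1))         # (17)
    v_h = np.exp(-nu[1] * np.sum(np.abs(c1c2c3(frame) - c1c2c3(model)), -1))  # (18)
    v_g = np.exp(-nu[2] * np.abs(grad_mag_smoothed(frame)
                                 - grad_mag_smoothed(model)))                 # (19)
    return v_l * v_h * v_g                                                    # (16)
```

FG-labeled pixels whose combined likelihood exceeds the threshold θ_2 would then be reassigned a strong BG preference (V_M = 1 − ε_1), as Algorithm 2 below does.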
3.2.5 Camouflage detection
A camouflage situation occurs when FG pixels have a similar color to the occluded region in the BG; such pixels are prone to be incorrectly classified because they cannot be detected with variants of model/frame differences, such as those implemented by (8), (9), (17), (18)
and (19). Any visual system (biological or computational) has similar problems in these types of camouflage situations. Nevertheless, if the object is in motion, then the optical flow can provide important clues for detecting camouflaged objects. However, the cost of computing the optical flow limits its usability in real-time applications; e.g., in the applications presented in this research. Hence, we propose a simple camouflage detection procedure for solving this problem. Our prior is that the probability of each pixel does not undergo sudden changes from one frame to the next. We note that a pixel x enters a camouflage situation when it fulfills the following conditions:
1. The absolute difference between the pixel x in the current tonal-stabilized frame, t, and the previous one, t − 1, is small: \|\tilde f(x, t) - \tilde f(x, t-1)\|_2^2 ≈ 0.
2. The pixel x likely belongs to the BG class: V_M(x, t) > 1/2.
3. The pixel x was classified as FG in the previous frame: p(x, t − 1) < 1/2.
These three conditions are implemented in Algorithm 3, see (22). Fig. 2, column (e), illustrates the behavior of the camouflage detection strategy. The camouflaged pixels are shown in the gray updated region with V_M(x, t) = 1/2.

Algorithm 2 Cast Shadow Detection.
Require: parameters N, ν_1, ν_2, ν_3, ε_1
1: function Shadow(V_M, m, f, θ_2, k^*, t)
2:   if V_M(x, t) < 1/2 then
3:     Compute V_L with (17), for all x;
4:     Compute V_H with (18), for all x;
5:     Compute V_G with (19), for all x;
6:     Compute V_S with (16), for all x;
7:     Compute, for all x,
         V_M(x, t) = \begin{cases} 1 - \epsilon_1, & V_S(x, t) > \theta_2 \\ \epsilon_1, & \text{otherwise} \end{cases};   (20)
8:   end if
9:   return V_M;
10: end function

Algorithm 3 Camouflage Detection.
Require: parameter ε_1
1: function Camouflage(V_M, f, p, θ_3, t)
2:   Compute, for all x,
       V_T(x, t) = \begin{cases} 1 - \epsilon_1, & \|\tilde f(x, t) - \tilde f(x, t-1)\|_2^2 \le \theta_3 \\ \epsilon_1, & \text{otherwise} \end{cases};   (21)
3:   Compute, for all x,
       V_C(x, t) = V_T(x, t) × V_M(x, t) × [1 − p(x, t − 1)];   (22)
4:   Compute, for all x,
       V_M(x, t) = \begin{cases} 1/2, & V_C(x, t) > 1/2 \\ V_M(x, t), & \text{otherwise} \end{cases};   (23)
     return V_M;
5: end function

3.2.6 Background model maintenance

Background model maintenance is an essential part of BG subtraction methods and, generally, is implemented depending on the application. Here, we present a proposal for the two investigated cases: SBS and UBS. In both cases, the BG models are updated after p is computed. That means the updated model at frame t is used for processing frame t + 1.
The SBS case is relatively simple; the model is updated in the region classified as BG with

m^*(x) ← U(m^*, f, x, t; β_m)   (24)

where U is the model updating operator

U(m^*, f, x, t; β_m) := β_m m^*(x) + (1 − β_m) f(x, t)   (25)

and β_m is a learning-rate parameter. For keeping the BG model intact in the FG region, we chose β_m as follows:

β_m = \begin{cases} β_1, & p(x, t) \ge 1/2 \\ 1, & \text{otherwise} \end{cases}   (26)

where β_1 is the same parameter used for calculating the tonal correction, see (14).

For the UBS case, we need to update the background model used at each pixel: k^*(x), see (10). First, we detect moving objects by computing the residuals between the current and the previous PS:

R(x, t) = τ |p(x, t) − p(x, t − 1)| + (1 − τ) R(x, t − 1),   (27)

where τ is our learned parameter that introduces a temporal filtering. Since the frames t = 1, 2, . . . , N are used as BG models, the first computed residual corresponds to the frame t = N + 2 and it is initialized with R(x, N + 2) = |p(x, N + 2) − p(x, N + 1)|. Next, we calculate at most MaxCC connected components in the binary image:

b(x, t) = \begin{cases} 0, & p(x, t) \ge 1/2 \\ 1, & \text{otherwise.} \end{cases}   (28)

The residual is accumulated for each connected component. The purpose is to accumulate the local residuals of regions with small residuals (as in the mentioned example of a BG with waves in a lake):

\tilde R_c = \frac{1}{|R_c|} \sum_{x \in R_c} R(x, t),   (29)

where R_c is the set of pixels belonging to the c-th connected component. In the following, static regions are assumed to be BG. This implies that, as expected, a moving background object that becomes static for a while is considered as FG. Finally, we update the used model at each pixel with a learning rate depending on the pixel class:

m^*(x) ← U(m^*, f, x, t; β^*)   (30)

where

β^* = \begin{cases} β_1, & p(x, t) \ge 1/2 \\ β_2, & p(x, t) < 1/2, \{x \in R_c : \tilde R_c \le ξ\} \\ 1, & \text{otherwise,} \end{cases}   (31)

β_2 is a free parameter that controls the speed at which some static FG pixels are converted into BG, and the threshold ξ controls the decision that some FG pixels are considered as BG; it is fixed to 0.001 in our experiments. The second case deals with possibly large local residuals at the pixel level but small accumulated residuals at the region level.

Algorithm 4 BG Maintenance.
Require: parameters N, τ, ξ
1: function BGMaintenance(m, f, p, β_1, β_2, k^*, θ_2, t)
2:   if N > 1 and t − N ≥ 2 then
3:     if t − N = 2 then
4:       R(x, t) = |p(x, t) − p(x, t − 1)|, for all x;
5:     else Compute R with (27), for all x;
6:     end if
7:     Adjust θ_2 according to (32).
8:     Compute \tilde R_c with (29), for all x;
9:     Compute, for all x:
10:    if p(x, t) < 1/2, {x ∈ R_c : \tilde R_c ≤ ξ} then
11:      m^*(x) = U(m^*, f, x, t; β_2);
12:    end if
13:  end if
14:  Compute, for all x:
15:  if p(x, t) ≥ 1/2 then
16:    m^*(x) = U(m^*, f, x, t; β_1);
17:  end if
18:  return m;
19: end function
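A simplified, vectorized sketch of the update rule (24)-(26) and its UBS extension (30)-(31) is given below. The connected-component residuals of (29) are abstracted into a precomputed per-pixel region-residual map, the prototypes are updated in place, and all names and default values are our own assumptions (loosely following Table 1).

```python
import numpy as np

def update_bg_models(prototypes, k_star, frame, p, region_residual,
                     beta1=0.75, beta2=0.98, xi=0.001):
    """Update the winning prototype at each pixel, Eqs. (24)-(26) and (30)-(31).

    prototypes      : NxHxWx3 BG prototype stack (modified in place).
    k_star          : HxW index of the winning prototype per pixel, Eq. (10).
    frame           : HxWx3 current frame (same scale as the prototypes).
    p               : HxW regularized BG probability of the current frame.
    region_residual : HxW accumulated residual of the connected component
                      each pixel belongs to (the R~_c of Eq. (29)).
    """
    # Learning rate per pixel: beta1 in BG regions, beta2 in "static FG"
    # regions (small accumulated residual), 1 elsewhere (model kept intact).
    beta = np.ones_like(p)
    beta[p >= 0.5] = beta1
    beta[(p < 0.5) & (region_residual <= xi)] = beta2

    h, w = p.shape
    rows, cols = np.mgrid[0:h, 0:w]
    m_star = prototypes[k_star, rows, cols]                  # m*(x), HxWx3
    updated = beta[..., None] * m_star + (1.0 - beta[..., None]) * frame
    prototypes[k_star, rows, cols] = updated
    return prototypes
```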
3.3 Complete algorithm

The complete procedure is summarized in Algorithm 5. This algorithm integrates the presented stages in order to be robust to lighting changes in the scene, cast shadows, camouflage situations and noise. Depending on the application, we use two kinds of background models: one for SBS and another for UBS. The BG models are initialized with the first video frames. In practice, one can have a BG model file stored for particular applications, e.g., parking lot surveillance. Next, the robust BG likelihood is computed and regularized with the QMMF procedure. Finally, the BG models are updated. Fig. 2 illustrates our method, step by step. The shown frames illustrate evident situations of lighting changes (first row), shadows (second row) and camouflage situations (third row).

Algorithm 5 Change Detection by Probabilistic Segmentation (CDPS).
Require: A video sequence f. The parameters: Binary Segmentation (λ, γ and MaxIter), Robust BG model likelihood (β_1, β_2, θ_1, θ_2 and θ_3) and BG maintenance (N, τ, MaxCC and ξ).
1: Initialize the BG models for all pixels x: m(x) = {f(x, t)}_{t=1,...,N};
2: for t = N + 1, N + 2, . . . to the end of the video do
3:   Compute, for all x: k^*(x) = \arg\min_{k \in \{1,...,N\}} \|\tilde f(x, t) - m_k(x)\|_2;
4:   V_M = Tonal(m, f, p, β_1, θ_1, k^*, t);
5:   V_M = Shadow(V_M, m, f, θ_2, k^*, t);
6:   V_M = Camouflage(V_M, f, p, θ_3, t);
7:   Compute p by iterating (7);
8:   m = BGMaintenance(m, f, p, β_1, β_2, k^*, θ_2, t);
9:   Output: p;
10: end for
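As a usage illustration of the data flow in Algorithm 5, the toy driver below chains the sketches given earlier (tonal_stabilize, bg_likelihood, shadow_likelihood, qmmf_regularize, update_bg_models) for one frame. It deliberately simplifies the paper's procedure: it uses the first prototype instead of the winning one for stabilization and shadows, inlines the camouflage test, and omits the connected-component residuals. The parameter dictionary prm and all thresholds are assumptions, not the authors' implementation.

```python
import numpy as np

def cdps_frame(frame_u8, prev_stab, prototypes, p_prev, prm):
    """One simplified iteration of the CDPS loop (Algorithm 5).

    frame_u8   : HxWx3 current frame on the 0-255 scale.
    prev_stab  : HxWx3 tonal-stabilized previous frame in [0, 1].
    prototypes : NxHxWx3 BG prototypes in [0, 1] (updated in place).
    p_prev     : HxW BG probability of the previous frame.
    """
    # Tonal stabilization against the first prototype (stand-in for m*).
    f_tilde = tonal_stabilize(frame_u8, prototypes[0] * 255.0, p_prev) / 255.0
    frame = frame_u8 / 255.0

    # Nearest-prototype BG likelihood, Eqs. (8)-(10).
    v_m, k_star = bg_likelihood(f_tilde, prototypes, theta1=prm['theta1'])

    # Cast-shadow correction (Algorithm 2).
    v_s = shadow_likelihood(f_tilde, frame, prototypes[0])
    v_m[(v_m < 0.5) & (v_s > prm['theta2'])] = 1.0 - 1e-3

    # Camouflage correction (Algorithm 3): still pixels, BG-like now, FG before.
    still = np.sum((f_tilde - prev_stab) ** 2, axis=-1) <= prm['theta3']
    v_m[still & (v_m > 0.5) & (p_prev < 0.5)] = 0.5

    # QMMF regularization and model maintenance (residuals omitted here).
    p = qmmf_regularize(v_m, frame, lam=prm['lam'], gamma=prm['gamma'])
    update_bg_models(prototypes, k_star, frame, p, np.zeros_like(p))
    return p, f_tilde
```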
4 Experiments and results

In order to demonstrate our method's performance, we conduct comparative experiments using the benchmark databases Microsoft, Wallflower, VSSN06 and Change Detection. These databases represent a broad variety of FG/BG video segmentation conditions and tasks. We have integrated results of different algorithms from different papers to present this comparison. In addition, we also present a version of our proposal that replaces the QMMF binary segmentation procedure with a real-time implementation of Graph Cut, another binary segmentation method, which is used in [44, 52] and is available in the NVIDIA NPP library. Table 1 shows the parameters used in our experiments for the SBS and UBS modalities. Those parameters were adjusted by hand such that we obtained a small error. Segmented evaluation videos with our proposal are available in [20].
Fig. 2 Demonstration of the proposed method in the task of background subtraction. (a) Acquired frame f. (b) BG likelihood V_M. (c) Corrected BG likelihood V_M with the T operator. (d) Corrected BG likelihood after the cast shadow detection process. (e) Complete BG likelihood V_M after the correction of camouflage situations. (f) Final probabilistic segmentation using the QMMF method. (g) For illustration purposes: background subtraction in the original frame.

Table 1 Set of parameters used in our experiments.

Parameter | Microsoft Db | Wallflower and VSSN06 Dbs | Change Detection Db | Comment
λ         | 75           | 100          | 100          | Regularization
γ         | 0.005        | 0.01         | 0.01         | Edge sensitivity
MaxIter   | 25           | 25           | 25           | Maximum iterations
ε_1       | 1 × 10^-3    | 1 × 10^-3    | 1 × 10^-3    | Small constant
θ_1       | 3.98 × 10^-4 | 6.31 × 10^-5 | 6.31 × 10^-5 | Threshold (likelihood sensitivity)
N         | 1            | 5            | 5            | Number of models
ε_2       | 0.1          | 0.1          | 0.1          | Small constant
β_1       | 0.75         | 0.75         | 0.75         | Learning illumination changes
ν_{1,2,3} | 1.0          | 1.0          | 1.0          | Constants (shadows detection)
θ_2       | 0.94         | 0.89         | 0.95         | Threshold (shadows detection)
l_1       | –            | 0.88         | 0.91         | Lower threshold (shadows detection)
u_1       | –            | 0.91         | 0.99         | Upper threshold (shadows detection)
θ_3       | 1 × 10^-3    | 1 × 10^-3    | 1 × 10^-3    | Threshold (camouflage detection)
τ         | –            | 0.1          | 0.1          | Learning residuals
MaxCC     | –            | 100          | 100          | Maximum connected components
l_2       | –            | 2            | 2            | Lower threshold (connected components)
u_2       | –            | 6            | 6            | Upper threshold (connected components)
β_2       | –            | 0.75         | 0.98         | Learning new BG
ξ         | –            | 0.001        | 0.001        | Threshold (residuals)
η         | –            | 3.92 × 10^-4 | 3.92 × 10^-4 | Small constant for adjusting θ_2

4.1 Evaluation of the SBS task: Microsoft database

We first evaluated our algorithm with videos from the Microsoft database [7]. Although this database was created for evaluating stereo BG segmentations, it can be, and has been, used for monocular segmentation [32, 53, 59]. The used video and ground truth segmentations correspond to the ones for the left camera in the stereo setup. The Microsoft database contains videos with static BG and static/moving FG, with the exception of the video GTTS56, where there is a TV in the BG. We compare our approach with SoA methods: TC [59], OP [53] and PGSF [32]. A comparison of the segmentation median and mean errors in percentage is shown in Table 2. Bold numbers represent the lowest errors in each row. Our proposal using QMMF improves the segmentation accuracy of those SoA methods, except for the GTTS43, GTTS56 and JM videos. The ground truth segmentation mask used for each video is indicated in the last column of Table 2. We report our results for four new videos in the Microsoft database (IUJW, AC, A vlad and A mart); these videos are not reported in [32, 53, 59].
[Fig. 3 panels (a) GTTS41, (b) GTTS43, (c) GTTS50, (d) GTTS51, (e) GTTS54, (f) GTTS56, (g) GTTS58, (h) GTTS60, (i) IU, (j) JM, (k) IUJW, (l) AC, (m) A vlad, (n) A mart and (o) GTTS56*: per-frame ERROR (%) versus FRAME curves, comparing Graph Cut and QMMF.]
Fig. 3 Percentage of misclassified pixels (ERROR) on the Microsoft database, comparing the Graph Cut method (dashed line) vs the QMMF method (solid line) in our proposal.

Table 2 Median and mean errors (percentage) of the segmentation using different methods on the Microsoft database [7].

No. | Test Seq. | Median [59] | Median [53] | Median [32] | Median Graph Cut | Median QMMF | Mean Graph Cut | Mean QMMF | GT
1  | GTTS41          | 0.80 | 0.65 | 0.37 | 0.36 | 0.19 | 0.36 | 0.18 | depth
2  | GTTS43          | 0.02 | 0.52 | 0.09 | 0.22 | 0.14 | 0.91 | 0.30 | depth
3  | GTTS50          | 1.31 | 0.77 | 0.19 | 0.18 | 0.06 | 0.24 | 0.08 | depth
4  | GTTS51          | 1.06 | 0.73 | 0.37 | 0.27 | 0.16 | 1.05 | 0.19 | depth
5  | GTTS54          | 0.33 | 0.35 | 0.11 | 0.12 | 0.03 | 0.13 | 0.03 | depth
6  | GTTS56          | 0.93 | 0.40 | 1.08 | 9.75 | 9.71 | 9.78 | 9.73 | depth
7  | GTTS58          | 0.79 | 0.68 | 0.15 | 0.29 | 0.04 | 1.32 | 0.14 | depth
8  | GTTS60          | 6.33 | 0.92 | 1.65 | 0.53 | 0.16 | 3.81 | 2.61 | depth
9  | IU              | 2.56 | 3.85 | 1.06 | 0.95 | 0.71 | 1.96 | 0.71 | motion
10 | JM              | 0.27 | 0.33 | 3.26 | 0.36 | 0.74 | 0.41 | 0.70 | motion
   | Average (1-10)  | 1.44 | 0.92 | 0.83 | 1.30 | 1.19 | 2.00 | 1.47 |
11 | IUJW            | –    | –    | –    | 2.69 | 1.65 | 4.37 | 2.37 | motion
12 | AC              | –    | –    | –    | 0.91 | 0.71 | 0.99 | 0.82 | depth
13 | A vlad          | –    | –    | –    | 1.11 | 0.73 | 1.11 | 0.77 | depth
14 | A mart          | –    | –    | –    | 2.21 | 2.91 | 2.33 | 2.93 | depth
   | Average (11-14) | –    | –    | –    | 1.73 | 1.50 | 2.20 | 1.72 |
Fig. 3 plots the computed error for the available ground truth segmentations of each video; moreover, it allows a comparison of our proposal using Graph Cut and using QMMF. Fig. 4 shows some appropriate and inaccurate results of our method, taken from the low and peak values in the plots of Fig. 3. In spite of the good results in the comparative study, our illumination change control, cast shadow detection and camouflage detection have limitations. In particular, an analysis of the error peaks in the plots of Fig. 3 shows opportunity for improvement; we found the following reasons for the error peaks:

1. The GTTS43 video has camera movement in the first 140 frames.
2. The GTTS51 and GTTS58 videos have illumination changes.
3. The GTTS56 video shows a switched-on TV in the BG.
4. The GTTS60 video has camera movement in the last frames.
5. The JM video has illumination changes when the person leans very close to the camera.
6. The AC, IU and IUJW videos present camouflage situations and illumination changes.
7. The A vlad and A mart videos have fast motion patterns of the FG and camouflage situations.

With the aim of presenting results that could be compared with user interaction methods, we include in panel (o) the segmentation error of video GTTS56 when the TV region is masked (GTTS56*). In this case, the corresponding errors in row 6 of Table 2 are 0.17, 0.13, 0.18 and 0.13, respectively, for our proposal (Graph Cut and QMMF).

Fig. 4 Appropriate and inaccurate segmentations: original frames (first and third rows), appropriate segmentations (second row), inaccurate segmentations (fourth row). [Panels: (a) GTTS43, (b) GTTS60, (c) IU, (d) IUJW, (e) A vlad, (f) A mart.]

Fig. 5 Evaluation of time and precision. Graph Cut method (dashed line) vs QMMF method (solid line). [Axes: TIME (ms) and MEDIAN ERROR (%) versus VIDEO.]
Fig. 5 shows a comparison of running time and precision between our video segmentation methods based on QMMF and on Graph Cut. Note that QMMF is faster and more accurate than our variant based on Graph Cut in both median and mean percentage errors. The set of parameters used in all the SBS experiments is reported in Table 1.
4.2 Evaluation of the UBS task: Wallflower, VSSN06 and Change Detection databases

Now, we present a performance evaluation of our proposal for the task of UBS. We use videos from the Wallflower [31], VSSN06 [25] and Change Detection [17] databases. The Wallflower database consists of seven videos: Moved Object (MO), Time Of Day (TOD), Light Switch (LS), Waving Trees (WT), Camouflage (C), Bootstrap (B) and Foreground Aperture (FA). For each video there is a single hand-made segmented frame available, which is used as ground truth. The database VSSN06 is also composed of seven videos [from Video2 (V2) to Video8 (V8)] with ground truth segmentation for all sequences. In addition, the Change Detection database contains 90,000 frames in 31 video sequences representing 6 categories in 2 modalities (color and thermal infrared). Table 3 shows the number of processed frames (#PF), the number of the single frame where the error is evaluated (#EF) and a brief description of each video in the Wallflower and VSSN06 databases. The categories and ground truth of the Change Detection database are detailed in [18].

According to our experiments, the performance of our method for the UBS case depends sensitively on the shadow detection parameter θ_2. We found it hard to fix a single value of θ_2 for all the videos in the three evaluated databases, due to the variety of challenges in the evaluated videos. We noted that if θ_2 is too small, it is possible that some FG objects are classified as BG (in particular, we observed that in the thermal or TOD videos). On the contrary, if θ_2 is too large, it is possible that the method does not detect shadows (this happened in the videos of the shadows category). Based on the above observations, we propose to automatically update the θ_2 value in each frame, depending on the number of detected connected components numCC [calculated in the binary image (28)], as follows:

θ_2 ← \begin{cases} θ_2 − η, & numCC < l_2, \; θ_2 > l_1 \\ θ_2 + η, & numCC > u_2, \; θ_2 < u_1 \\ θ_2, & \text{otherwise} \end{cases}   (32)

where η is a small constant used for increasing and decreasing the value of θ_2 for the next frame; in our experiments we fixed η = 0.1/255, and l_1 < u_1 are the bounds of θ_2, fixed by hand. Finally, in order to obtain a slight performance gain in the computed measures, we apply a 9 × 9 median filter to the binary segmentation b (28) to eliminate isolated pixels. For processing the databases, we fix our parameters as indicated in Table 1.

Selection of the number of models for UBS. Our procedure for UBS problems uses N models per pixel. For
choosing the adequate number of models, we compute the segmentation of the Change Detection database [17, 18] varying the number of models N = 1, 2, . . . , 10 and keeping all the parameters fixed. For computational efficiency and precision, we pick five models (N = 5) per pixel.

Comparison using the Wallflower database. Fig. 6 shows a qualitative comparison of our proposal. Our proposal using QMMF has better results on the LS, WT and C videos than our proposal using Graph Cut. Table 4 contrasts our method's results with other SoA methods. We use four terms for the performance evaluation: false negatives (FN), false positives (FP), partial error (PE) and total error (TE); TE is the sum of PEs for all videos. All methods achieve a perfect segmentation for the MO video. The SL-IMMC method has less PE on the LS video, the Wallflower method has less PE on the FA video, the SACON method has less PE on the WT video, our proposal using Graph Cut has less PE on the TOD and B videos and our proposal using QMMF has less PE on the C video. An analysis of the results suggests that our method's achievement is the product of our Tonal Transference Operator and our BG models for the cases TOD and LS; our Cast Shadow Detection Likelihood for the video B; our Camouflage Situation Likelihood for the video C and our BG maintenance in the case of MO. Our proposal (with QMMF or Graph Cut) has the best Total Error among the other six methods. In addition, Tables 5 and 6 show the results of our proposal evaluated on the Wallflower database for the seven measures reported in [17]: recall (Re), specificity (Sp), false positive rate (FPR), false negative rate (FNR), percentage of wrong classification (PWC), F1 and precision (Pr). Bold numbers represent the best averages of each measure between our proposal using Graph Cut and our proposal using QMMF. Note that our proposal using QMMF has better results in the Sp, FPR, PWC, F1 and Pr measures than our proposal using Graph Cut.

Comparison using the VSSN06 database. Fig. 7 shows a qualitative comparison of our proposal. Note that our proposal using QMMF has better results on videos with a vacillating BG and when the absent foreground is not available; however, note that the results of our proposal using Graph Cut are more accurate in the object segmentation on videos V2 and V7. Table 7 shows the segmentation results on the VSSN06 database. This database was used for a FG/BG video segmentation algorithm competition [25]. In the competition, the algorithms had a startup period for each video in which they could learn the BG. During this period the performance was not evaluated. In our experiment, we only use the
Table 3 Description of videos in the Wallflower and VSSN06 databases. #PF indicates the number of processed frames (starting at the first one) and #EF the evaluated frame (Wallflower Db) or the range of evaluated frames (VSSN06 Db).

Database (resolution) | Video | #PF  | #EF       | Brief Description
Wallflower 160 × 120  | MO    | 1745 | 985       | Indoor, moved BG object in conference room
Wallflower 160 × 120  | TOD   | 5890 | 1850      | Indoor, gradual light change in EasyLiving lab
Wallflower 160 × 120  | LS    | 2715 | 1865      | Indoor, sudden light change in Vision Lab Machine Room
Wallflower 160 × 120  | WT    | 287  | 247       | Outdoor, moving tree in background
Wallflower 160 × 120  | C     | 353  | 251       | Indoor, person walks in front of computer monitor
Wallflower 160 × 120  | B     | 3055 | 299       | Outdoor, absent foreground is not available
Wallflower 160 × 120  | FA    | 2113 | 489       | Indoor, a person sleeping and waking at desk
VSSN06 384 × 240      | V2    | 747  | 6 to 747  | Indoor, a person moving
VSSN06 384 × 240      | V3    | 897  | 6 to 897  | Outdoor, vacillating BG with waving flowers
VSSN06 384 × 240      | V4    | 826  | 6 to 826  | Outdoor, vacillating BG with waving flowers
VSSN06 384 × 240      | V5    | 746  | 6 to 746  | Indoor, a person and a cat moving
VSSN06 384 × 240      | V6    | 746  | 6 to 746  | Indoor, absent foreground is not available
VSSN06 384 × 240      | V7    | 747  | 6 to 747  | Outdoor, vacillating BG with waving trees
VSSN06 384 × 240      | V8    | 1188 | 6 to 1188 | Outdoor, local illumination changes
Table 4 Performance of algorithms on the Wallflower database.

Algorithm | Error Type | Moved Object | Time Of Day | Light Switch | Waving Trees | Camouflage | Bootstrap | FG Apert. | TE
MOG [49]           | FN | 0 | 1008 | 1633  | 1323 | 398  | 1874 | 2442 |
MOG [49]           | FP | 0 | 20   | 14169 | 341  | 3098 | 217  | 530  |
MOG [49]           | PE | 0 | 1028 | 15802 | 1664 | 3496 | 2091 | 2972 | 27053
KDE [10]           | FN | 0 | 1298 | 760   | 170  | 238  | 1755 | 2413 |
KDE [10]           | FP | 0 | 125  | 14153 | 589  | 3392 | 933  | 624  |
KDE [10]           | PE | 0 | 1423 | 14913 | 759  | 3630 | 2688 | 3037 | 26450
SL-IMMC [13]       | FN | 0 | 626  | 711   | 4106 | 1167 | 2175 | 2320 |
SL-IMMC [13]       | FP | 0 | 10   | 15    | 5    | 135  | 503  | 201  |
SL-IMMC [13]       | PE | 0 | 636  | 726   | 4111 | 1302 | 2678 | 2521 | 11974
Wallflower [55]    | FN | 0 | 961  | 947   | 877  | 229  | 2025 | 320  |
Wallflower [55]    | FP | 0 | 25   | 375   | 1999 | 2706 | 365  | 649  |
Wallflower [55]    | PE | 0 | 986  | 1322  | 2876 | 2935 | 2390 | 969  | 11478
SACON [58]         | FN | 0 | 236  | 589   | 41   | 47   | 1150 | 1508 |
SACON [58]         | FP | 0 | 147  | 1031  | 230  | 462  | 125  | 521  |
SACON [58]         | PE | 0 | 383  | 1620  | 271  | 509  | 1275 | 2029 | 6087
Proposal Graph Cut | FN | 0 | 132  | 99    | 3    | 565  | 383  | 1092 |
Proposal Graph Cut | FP | 0 | 110  | 943   | 560  | 322  | 442  | 683  |
Proposal Graph Cut | PE | 0 | 242  | 1042  | 563  | 887  | 825  | 1775 | 5334
Proposal QMMF      | FN | 0 | 239  | 80    | 10   | 32   | 602  | 1002 |
Proposal QMMF      | FP | 0 | 49   | 873   | 297  | 379  | 239  | 508  |
Proposal QMMF      | PE | 0 | 288  | 953   | 307  | 411  | 841  | 1510 | 4310
Table 5 Performance of our proposal using Graph Cut on all seven categories of the Wallflower database for all seven measures.

Category   MO   TOD   LS   WT   C   B   FA   Average
Re 1.0000 0.9041 0.9687 0.9995 0.9453 0.8576 0.7790 0.9220
Sp 1.0000 0.9938 0.9411 0.9580 0.9631 0.9729 0.9519 0.9687
FPR 0.0000 0.0062 0.0589 0.0420 0.0369 0.0271 0.0481 0.0313
FNR 0.0000 0.0959 0.0313 0.0005 0.0547 0.1424 0.2210 0.0780
PWC 0.0000 1.2697 5.4327 2.9329 4.6530 4.3410 9.2684 3.9854
F1 1.0000 0.9114 0.8547 0.9543 0.9566 0.8483 0.8126 0.9054
Pr 1.0000 0.9188 0.7647 0.9129 0.9681 0.8392 0.8493 0.8933
Table 6 Performance of our proposal using QMMF on all seven categories of the Wallflower database for all seven measures.

Category   MO   TOD   LS   WT   C   B   FA   Average
Re   1.0000  0.8264  0.9747  0.9983  0.9969  0.7762  0.7972  0.9100
Sp   1.0000  0.9972  0.9455  0.9777  0.9566  0.9854  0.9643  0.9752
FPR  0.0000  0.0028  0.0545  0.0223  0.0434  0.0146  0.0357  0.0248
FNR  0.0000  0.1736  0.0253  0.0017  0.0031  0.2238  0.2028  0.0900
PWC  0.0000  1.5111  4.9687  1.5993  2.1560  4.4252  7.8847  3.2207
F1   1.0000  0.8877  0.8662  0.9745  0.9804  0.8324  0.8392  0.9115
Pr   1.0000  0.9587  0.7794  0.9518  0.9645  0.8973  0.8858  0.9196
Fig. 6 Qualitative comparison of our proposal in the Wallflower database. From left to right: (a) MO (Frame 985), (b) TOD (Frame 1850), (c) LS (Frame 1865), (d) WT (Frame 247), (e) C (Frame 251), (f) B (Frame 299) and (g) FA (Frame 489). From top to bottom: original frame, ground truth, result of our proposal using Graph Cut and result of our proposal using QMMF.
Fig. 7 Qualitative comparison of our proposal in the VSSN06 database. From left to right: (a) V2 (Frame 453), (b) V3 (Frame 661), (c) V4 (Frame 490), (d) V5 (Frame 247), (e) V6 (Frame 311), (f) V7 (Frame 722) and (g) V8 (Frame 1109). From top to bottom: original frame, ground truth, result of our proposal using Graph Cut and result of our proposal using QMMF.
Table 7 Performance of algorithms on the VSSN06 database.

Algorithm | Error Type | Video2 long | Video3 long | Video4 long | Video5 long | Video6 long | Video7 long | Video8 long | ATE
MOG [5]            | AFN | 2450 | 1755 | 6688 | 3607 | 3819 | 2808 | 3394 |
MOG [5]            | AFP | 170  | 595  | 116  | 584  | 973  | 195  | 244  |
MOG [5]            | APE | 2620 | 2350 | 6804 | 4191 | 4792 | 3003 | 3638 | 27398
HECOL [5]          | AFN | 395  | 363  | 634  | 1991 | 2525 | 1055 | 2552 |
HECOL [5]          | AFP | 296  | 457  | 446  | 783  | 816  | 339  | 364  |
HECOL [5]          | APE | 691  | 820  | 1080 | 2774 | 3341 | 1394 | 2916 | 13016
Proposal Graph Cut | AFN | 9    | 1399 | 1488 | 763  | 774  | 214  | 77   |
Proposal Graph Cut | AFP | 148  | 156  | 273  | 873  | 951  | 185  | 238  |
Proposal Graph Cut | APE | 157  | 1555 | 1761 | 1636 | 1725 | 399  | 315  | 7547
Proposal QMMF      | AFN | 9    | 221  | 249  | 145  | 143  | 161  | 9    |
Proposal QMMF      | AFP | 276  | 349  | 427  | 1347 | 1446 | 266  | 319  |
Proposal QMMF      | APE | 284  | 570  | 676  | 1491 | 1590 | 427  | 328  | 5367
Table 8 Performance of our proposal using Graph Cut on all seven categories of the VSSN06 database for all seven measures.

Category   V2   V3   V4   V5   V6   V7   V8   Average
Re 0.9619 0.9300 0.9355 0.8578 0.8817 0.9282 0.9780 0.9247
Sp 0.9983 0.9695 0.9667 0.9849 0.9837 0.9952 0.9954 0.9848
FPR 0.0017 0.0305 0.0333 0.0151 0.0163 0.0048 0.0046 0.0152
FNR 0.0381 0.0700 0.0645 0.1422 0.1183 0.0718 0.0220 0.0753
PWC 0.2432 3.1167 3.4278 2.3422 2.4393 0.6370 0.5462 1.8218
F1 0.9430 0.5098 0.6227 0.8271 0.8508 0.8746 0.9488 0.7967
Pr 0.9248 0.3512 0.4666 0.7985 0.8220 0.8268 0.9213 0.7302
Table 9 Performance of our proposal using QMMF on all seven categories of VSSN06 database for all seven measures.

Category   V2       V3       V4       V5       V6       V7       V8       Average
Re         0.8660   0.7555   0.8507   0.7590   0.7946   0.8663   0.9615   0.8362
Sp         0.9996   0.9956   0.9949   0.9972   0.9969   0.9969   0.9993   0.9972
FPR        0.0004   0.0044   0.0051   0.0028   0.0031   0.0031   0.0007   0.0028
FNR        0.1340   0.2445   0.1493   0.2410   0.2054   0.1337   0.0385   0.1638
PWC        0.3198   0.8549   0.9483   1.8399   1.9028   0.6248   0.2645   0.9650
F1         0.9188   0.7549   0.8443   0.8434   0.8682   0.8690   0.9741   0.8675
Pr         0.9786   0.7544   0.8381   0.9491   0.9568   0.8718   0.9870   0.9051
first five frames as a startup period, and the evaluation starts at the sixth frame. We use four terms for the performance evaluation: average of false negatives (AFN), average of false positives (AFP), average partial error (APE, equal to AFN plus AFP) and average total error (ATE, equal to the sum of the APE over all videos). The VSSN06 database provides full segmentations as ground truth. We compare our results with two SoA methods. Our proposal using Graph Cut has a lower APE on videos V2, V7 and V8, and our proposal using QMMF has a lower APE on the remaining videos. Note that our proposal, with either the Graph Cut or the QMMF implementation, has a lower ATE than the SoA methods. In addition, Tables 8 and 9 show the results of our proposal (Graph Cut and QMMF implementations, respectively) evaluated on the VSSN06 database for the seven measures reported in [17]. Bold numbers represent the best averages of each measure between our proposal using Graph Cut and our proposal using QMMF. We remark that the performance
of our proposal is similar to that obtained on the Wallflower database; we also note that QMMF improves the results in the Sp, FPR, PWC, F1 and Pr measures.
Comparison using the Change Detection database. Fig. 8 shows a qualitative comparison of our proposal using Graph Cut and our proposal using QMMF. We show only one representative result for each category of the Change Detection database. Note that in panel (a) our proposal using Graph Cut obtains better results, in panels (b) and (e) the difference is minimal, and in panels (c), (d) and (f) our proposal using QMMF is better. There are several errors in panel (b) with both approaches; they are caused by the shaking (jitter) of the camera, a problem that our proposal does not yet address. The detailed results of our proposal using Graph Cut and QMMF are shown in Tables 10 and 11, respectively. Bold numbers represent the best averages of each measure between our proposal using Graph Cut and our proposal using QMMF.
Fig. 8 Qualitative comparison of our proposal in the Change Detection database. From left to right: (a) baseline PETS2006 (Frame 754), (b) cameraJitter badminton (Frame 861), (c) dynamicBackground fall (Frame 2430), (d) intermittentObjectMotion sofa (Frame 1548), (e) shadow copyMachine (Frame 2782) and (f) thermal corridor (Frame 4965). From top to bottom: original frame, ground truth, result of our proposal using Graph Cut and result of our proposal using QMMF.

Table 10 Performance of our proposal using Graph Cut on all six categories of Change Detection database for all seven measures.

Category                     Re       Sp       FPR      FNR      PWC      F1       Pr
Baseline                     0.9656   0.9971   0.0029   0.0014   0.3981   0.9533   0.9418
Camera Jitter                0.7156   0.9224   0.0776   0.0129   8.6919   0.4175   0.3110
Dynamic Background           0.8140   0.9798   0.0202   0.0017   2.1534   0.5784   0.5501
Intermittent Object Motion   0.7047   0.9356   0.0644   0.0183   7.2703   0.5652   0.5373
Shadow                       0.9206   0.9882   0.0118   0.0035   1.4568   0.8388   0.7959
Thermal                      0.2845   0.9878   0.0122   0.0581   6.0152   0.3768   0.7714
Average                      0.7342   0.9685   0.0315   0.0160   4.3309   0.6217   0.6513
Table 11 Performance of our proposal using QMMF on all six categories of Change Detection database for all seven measures.

Category                     Re       Sp       FPR      FNR      PWC      F1       Pr
Baseline                     0.9488   0.9964   0.0035   0.0031   0.6238   0.9208   0.8969
Camera Jitter                0.6025   0.9613   0.0387   0.0171   5.3593   0.4865   0.4397
Dynamic Background           0.7590   0.9947   0.0053   0.0021   0.7281   0.7495   0.8086
Intermittent Object Motion   0.8084   0.9765   0.0235   0.0148   3.4650   0.7406   0.7624
Shadow                       0.9233   0.9846   0.0154   0.0052   1.9516   0.8092   0.7567
Thermal                      0.6195   0.9950   0.0050   0.0117   1.5205   0.6619   0.9014
Average                      0.7769   0.9848   0.0152   0.0090   2.2747   0.7281   0.7610
Our Graph Cut variant performs poorly because it fails in the thermal category and, as a result, its overall measures are affected. Table 12 shows the average performance of our proposal and of some SoA methods reported in the Change Detection ranking [17] and [18]. Note that our proposal using QMMF has better results
in at least four measures with respect to each of the SoA methods reported here, except for the PBAS method [24].
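For reference, the seven measures (Re, Sp, FPR, FNR, PWC, F1 and Pr) used in Tables 5–12 follow the standard definitions of the Change Detection benchmark [17]. The sketch below only illustrates how they are computed from pixel counts; it is not part of our released code and the identifiers are ours.

    // Illustrative host-side computation of the seven measures used in Tables 5-12,
    // following the standard definitions of the Change Detection benchmark [17].
    // TP, FP, FN and TN are assumed to be pixel counts accumulated over a category.
    struct Measures { double Re, Sp, FPR, FNR, PWC, F1, Pr; };

    Measures computeMeasures(double TP, double FP, double FN, double TN)
    {
        Measures m;
        m.Re  = TP / (TP + FN);                            // recall
        m.Sp  = TN / (TN + FP);                            // specificity
        m.FPR = FP / (FP + TN);                            // false positive rate
        m.FNR = FN / (TP + FN);                            // false negative rate
        m.PWC = 100.0 * (FN + FP) / (TP + FP + FN + TN);   // percentage of wrong classifications
        m.Pr  = TP / (TP + FP);                            // precision
        m.F1  = 2.0 * m.Pr * m.Re / (m.Pr + m.Re);         // harmonic mean of Pr and Re
        return m;
    }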
Table 12 Comparative performance of the proposal and algorithms on Change Detection database.

Algorithm                    Re     Sp      FPR     FNR     PWC    F1     Pr
PBAS [24]                    0.78   0.990   0.010   0.216   1.77   0.75   0.82
PSP-MRF [46]                 0.80   0.983   0.017   0.196   2.39   0.74   0.75
ViBe+ [8]                    0.69   0.993   0.007   0.309   2.18   0.72   0.83
SC-SOBS [35]                 0.80   0.983   0.017   0.198   2.41   0.73   0.73
KDE Nonaka et al. [37]       0.65   0.993   0.007   0.349   2.89   0.64   0.77
KDE Elgammal et al. [9]      0.74   0.976   0.024   0.256   3.46   0.67   0.68
GMM Stauffer-Grimson [50]    0.71   0.986   0.014   0.289   3.10   0.66   0.70
Proposal Graph Cut           0.73   0.968   0.032   0.266   4.33   0.63   0.65
Proposal QMMF                0.78   0.985   0.015   0.223   2.27   0.73   0.76
5 Implementation and processing time

Our method is implemented with the CUDA and OpenCV libraries; the code is directly executable in Linux and is compatible with any CUDA-enabled video card. The experiments were executed on a PC with an Intel Core i7 at 3.4 GHz and 8 GB of RAM, running Windows 7 (64-bit), using a single core and an NVIDIA video card (GeForce GTX 480). The rates are measured in fps and include the frame acquisition (memory copy from CPU to GPU), the frame processing (CPU and GPU) and the result display controlled by the CPU (memory copy from GPU to CPU). For a video of 640 × 480 pixels, our processing rate is 62 fps in the SBS modality and 30 fps in the UBS modality. Table 13 compares the processing time of our algorithm using only the CPU and using the CPU&GPU combination. Bold numbers in the column "Time (ms)" represent the lowest times in each row, and bold numbers in the column "Ratio" represent the three best ratios of our implementation. The part of the code that is executed on the CPU runs sequentially, i.e., in a single core. The initialization phase time includes the frame reading time and its loading into CPU memory or into GPU memory. Note that video reading from the hard disk to CPU memory is faster than to GPU memory because the frame is automatically loaded into the array that will contain the information. Similarly, the result display in the CPU&GPU version requires transferring the frame from GPU memory to CPU memory. In the BG model likelihood phase, Eqs. (9) and (10) are highly parallelizable: the same process is performed at each pixel and the result for a pixel is independent of the rest. In the tonal stabilization phase, Eqs. (11), (12) and (14) are highly parallelizable, although Eq. (13) is more difficult to parallelize. A particular strategy is to use the reduction primitive (see chapter three of [26]); however, the best implementation requires a GPU with compute capability >= 2.0 (Fermi architecture). We note that the computational efficiency gain of such a strategy is small, so we opted to use simple CUDA capabilities.
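To illustrate the per-pixel parallelism exploited in the BG model likelihood phase, a minimal CUDA kernel sketch follows; the squared color distance is only a placeholder for the actual evaluation of Eqs. (9) and (10), and all buffer and function names are ours.

    // Illustrative CUDA kernel showing the per-pixel pattern of the BG model
    // likelihood phase: one thread per pixel, no dependence between pixels.
    __global__ void bgLikelihoodKernel(const float3 *frame, const float3 *bgModel,
                                       float *lhoodBG, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        int idx = y * width + x;
        float3 I = frame[idx];      // current frame pixel (RGB)
        float3 B = bgModel[idx];    // BG model pixel (RGB)

        // Placeholder: squared color distance; the real code evaluates
        // Eqs. (9) and (10) at this point.
        float dr = I.x - B.x, dg = I.y - B.y, db = I.z - B.z;
        lhoodBG[idx] = dr * dr + dg * dg + db * db;
    }

    // Typical launch: dim3 block(16, 16); dim3 grid((W + 15) / 16, (H + 15) / 16);
    // bgLikelihoodKernel<<<grid, block>>>(d_frame, d_bgModel, d_lhood, W, H);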
The equations in the cast shadow and camouflage detection stage are also easily parallelized. In the QMMF segmentation phase, Eq. (7) requires more attention, since it is the slowest part of the CPU version. In the Gauss–Seidel update scheme, each pixel needs information from its neighbors. Hence, to parallelize this part we use the Red and Black strategy (see the Solving Partial Differential Equations lecture in [23]), where the idea is to process first only the red pixels and then only the black pixels. Furthermore, we implement the Red and Black Gauss–Seidel method under a multigrid strategy to accelerate the convergence. Note that in this part our GPU implementation achieves one of its highest ratios (CPU vs CPU&GPU). In the BG maintenance phase, Eqs. (26), (27), (28) and (30) were implemented on the GPU, while Eq. (29) was implemented on the CPU using the connected components algorithm of the OpenCV library. Here, the map R^c is loaded from the CPU memory into the GPU constant memory to improve the performance of Eq. (30). In summary, the complete process is sped up by about 6x using a GPU w.r.t. the stand-alone CPU implementation.
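A minimal CUDA sketch of one Red and Black Gauss–Seidel pass is shown below; the per-pixel update minimizes a generic quadratic cost, w1*a^2 + w0*(1-a)^2 plus a smoothness term weighted by lambda, used only as a stand-in for Eq. (7), and all identifiers are ours.

    // Illustrative Red and Black Gauss-Seidel pass (one color per launch).
    // Only pixels of one parity of (x + y) are updated, so every thread reads
    // neighbors written in the previous pass and no race conditions occur.
    __global__ void redBlackGaussSeidel(float *alpha, const float *w0, const float *w1,
                                        float lambda, int width, int height, int color)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x <= 0 || y <= 0 || x >= width - 1 || y >= height - 1) return;
        if (((x + y) & 1) != color) return;   // update only red (0) or black (1) pixels

        int idx = y * width + x;
        // Sum of the four neighbors (opposite color, updated in the previous pass).
        float nb = alpha[idx - 1] + alpha[idx + 1] + alpha[idx - width] + alpha[idx + width];

        // Closed-form minimizer of w1*a^2 + w0*(1-a)^2 + lambda*sum_s (a - alpha_s)^2.
        alpha[idx] = (w0[idx] + lambda * nb) / (w0[idx] + w1[idx] + 4.0f * lambda);
    }

    // Host-side iteration: alternate colors until convergence or a fixed budget.
    // for (int it = 0; it < numIters; ++it) {
    //     redBlackGaussSeidel<<<grid, block>>>(d_alpha, d_w0, d_w1, lambda, W, H, 0);
    //     redBlackGaussSeidel<<<grid, block>>>(d_alpha, d_w0, d_w1, lambda, W, H, 1);
    // }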
6 Conclusions

We propose an automatic and robust binary video segmentation method for real-time applications. Our approach implements a probabilistic segmentation based on the binary QMMF model. This framework regularizes the likelihood of each pixel belonging to each one of the models (BG or FG). We propose a BG class likelihood that takes into account two cases: the SBS and the UBS. The proposed likelihood is robust to illumination changes, cast shadows and camouflage situations. We conduct quantitative experiments that demonstrate the performance of our method in different scenarios (indoor and outdoor). Our implementation runs in real time using an NVIDIA video card and CUDA.
Table 13 Comparative performance of our method between the CPU vs the CPU with a GPU, for the UBS case with a 640 × 480 video as input.

Phase                                            Time (ms)              Ratio
                                                 CPU       CPU&GPU
Initialization: Load N BG models                 8.68      8.24         1.05
Initialization: Read Next Frame                  1.16      1.99         0.58
BG model likelihood, see Eqs. (9) and (10)       13.14     0.36         36.5
Tonal stabilization, see Algorithm 1             11.2      10.22        1.09
Cast shadow detect., see Algorithm 2             42.99     1.03         41.73
Camouflage detect., see Algorithm 3              5.98      0.99         6.04
QMMF segmentation, see Eq. (7)                   108.65    4.58         23.72
BG maintenance, see Algorithm 4                  26.52     5.01         5.29
Result display                                   8.14      8.87         0.91
fps                                              4.59      30.26
Acknowledgements This work was supported in part by the CONACyT, Mexico [DSc. Scholarship to F.H. and research grant 131369 to M.R.].
References

1. Angulo, C., Marroquin, J.L., Rivera, M.: Bayesian segmentation of range images of polyhedral objects using entropy-controlled quadratic Markov measure field models. Appl. Opt. 47(22), 4106–4115 (2008)
2. Barnich, O., Van Droogenbroeck, M.: ViBe: a universal background subtraction algorithm for video sequences. IEEE Trans. Image Process. 20(6), 1709–1724 (2011)
3. Benedek, C., Sziranyi, T.: Bayesian foreground and shadow detection in uncertain frame rate surveillance videos. IEEE Trans. Image Process. 17(4), 608–621 (2008)
4. Birchfield, S.: KLT: an implementation of the Kanade-Lucas-Tomasi feature tracker. http://www.ces.clemson.edu/~stb/klt/. Accessed 12 July, 2013
5. Calderara, S., Prati, A., Cucchiara, R.: HECOL: homography and epipolar-based consistent labeling for outdoor park surveillance. Comput. Vis. Image Underst. 111(1), 21–42 (2008)
6. Carr, P.: GPU accelerated multimodal background subtraction. In: Proceedings of DICTA, pp. 279–286. IEEE Computer Society, Silver Spring (2008)
7. Corporation, M.: Microsoft research. http://research.microsoft.com/vision/cambridge/i2i/DSWeb.htm. Accessed 21 January, 2010
8. Droogenbroeck, M.V., Paquot, O.: Background subtraction: experiments and improvements for ViBe. In: CVPR Workshops, pp. 32–37. IEEE, New York (2012)
9. Elgammal, A., Duraiswami, R., Harwood, D., Davis, L.: Background and foreground modeling using nonparametric kernel density estimation for visual surveillance. Proc. IEEE 90(7), 1151–1163 (2002)
10. Elgammal, A., Harwood, D., Davis, L.: Non-parametric model for background subtraction. In: Frame-Rate Workshop, pp. 751–767. IEEE, New York (2000)
11. Evangelio, R.H., Sikora, T.: Complementary background models for the detection of static and moving objects in crowded environments. In: AVSS, 2011 8th IEEE International Conference on, pp. 71–76. IEEE Computer Society, Silver Spring (2011)
12. Evangelio, R.H., Sikora, T.: Static object detection based on a dual background model and a finite-state machine. EURASIP J. Image Video Process. 2011 (2011). URL http://dblp.uni-trier.de/db/journals/ejivp/ejivp2011.html#EvangelioS11
13. Farcas, D., Marghes, C., Bouwmans, T.: Background subtraction via incremental maximum margin criterion: a discriminative subspace approach. Mach. Vis. Appl. 23(6), 1083–1101 (2012)
14. Fukui, S., Iwahori, Y., Woodham, R.J.: GPU based extraction of moving objects without shadows under intensity changes. In: Proceedings of IEEE Congress on Evolutionary Computation, pp. 4165–4172 (2008)
15. Fung, G.F., Yung, N.H., Pang, G.K., Lai, A.H.: Effective moving cast shadow detection for monocular color traffic image sequences. Opt. Eng. 41(6), 1425–1440 (2002)
16. Gong, M., Cheng, L.: Real-time foreground segmentation on GPUs using local online learning and global graph cut optimization. In: Proceedings of ICPR'08, pp. 1–4 (2008)
17. Goyette, N., Jodoin, P., Porikli, F., Konrad, J., Ishwar, P.: 1st IEEE change detection workshop. http://www.changedetection.net. Accessed 10 December, 2012
18. Goyette, N., Jodoin, P., Porikli, F., Konrad, J., Ishwar, P.: Changedetection.net: a new change detection benchmark dataset. In: 2012 IEEE Computer Society Conference on CVPR Workshops, pp. 1–8 (2012)
19. Griesser, A., Roeck, S.D., Neubeck, A., Gool, L.V.: GPU-based foreground-background segmentation using an extended colinearity criterion. In: G. G., H. J., N. H., S. M. (eds.) Proceedings of Vision, Modeling, and Visualization (VMV) 2005, pp. 319–326. IOS Press, New York (2005)
20. Hernandez-Lopez, F.J.: Change detection by probabilistic segmentation from monocular view. http://www.cimat.mx/~fcoj23/vidseg/exps.html. Accessed 12 July, 2013
21. Hernandez-Lopez, F.J., Rivera, M.: Binary segmentation of video sequences in real time. In: Ninth Mexican International Conference on Artificial Intelligence, pp. 163–168. IEEE (2010)
22. Hernandez-Lopez, F.J., Rivera, M.: AVScreen: a real-time video augmentation method. J. of Real-Time Image Process. (2013). DOI 10.1007/s11554-013-0375-9
23. Hoberock, J.: Programming Massively Parallel Processors with Cuda. http://code.google.com/p/stanford-cs193g-sp2010/wiki/ClassSchedule (2010). Accessed 18 August, 2010
24. Hofmann, M., Tiefenbacher, P., Rigoll, G.: Background segmentation with feedback: the pixel-based adaptive segmenter. In: CVPR Workshops, pp. 38–43. IEEE, New York (2012)
25. Hörster, E., Lienhart, R.: Call for algorithm competition in foreground/background segmentation. http://mmc36.informatik.uni-augsburg.de/VSSN06_OSAC/. Accessed 12 December, 2011
26. Hwu, W.M.W.: GPU Computing Gems, Jade Edition. Morgan Kaufmann, Los Altos (2012)
27. Jianguang, L., Hao, Y., Weiming, H., Tieniu, T.: An illumination invariant change detection algorithm. In: Proceedings of ACCV, pp. 23–25 (2002)
28. Joshi, A.J., Atev, S., Masoud, O., Papanikolopoulos, N.: Moving shadow detection with low- and mid-level reasoning. In: Proceedings of ICRA'07, pp. 4827–4832 (2007)
29. Kolmogorov, V., Criminisi, A., Blake, A., Cross, G., Rother, C.: Bilayer segmentation of binocular stereo video. In: Proceedings of CVPR, pp. 1186–1193 (2005)
30. Kolmogorov, V., Criminisi, A., Blake, A., Cross, G., Rother, C.: Probabilistic fusion of stereo with color and contrast for bi-layer segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 28(9), 1480–1492 (2006)
31. Krumm, J.: Test Images for Wallflower Paper. http://research.microsoft.com/en-us/um/people/jckrumm/WallFlower/TestImages.htm. Accessed 23 November, 2011
32. Lee, S., Yun, I.D., Lee, S.U.: Robust bilayer video segmentation by adaptive propagation of global shape and local appearance. J. Vis. Commun. Image Represent. 21(7), 665–676 (2010)
33. Liu, H., Li, J., Liu, Q., Qian, Y.: Shadow elimination in traffic video segmentation. In: Proceedings of MVA, pp. 445–448 (2007)
34. Maddalena, L., Petrosino, A.: A self-organizing approach to background subtraction for visual surveillance applications. IEEE Trans. Image Process. 17(7), 1168–1177 (2008)
35. Maddalena, L., Petrosino, A.: The SOBS algorithm: what are the limits? In: CVPR Workshops, pp. 21–26. IEEE, New York (2012)
36. Monteiro, G., Marcos, J., Ribeiro, M., Batista, J.: Robust segmentation process to detect incidents on highways. ICIAR LNCS 5112, 110–121 (2008)
37. Nonaka, Y., Shimada, A., Nagahara, H., Taniguchi, R.: Evaluation report of integrated background modeling based on spatio-temporal features. In: CVPR Workshops, pp. 9–14. IEEE (2012)
38. NVIDIA: CUDA zone. http://www.nvidia.com/object/cuda_get.html. Accessed 12 July, 2013
39. NVIDIA: NVIDIA Performance Primitives. https://developer.nvidia.com/npp. Accessed 12 July, 2013
40. Pan, X., Wu, Y.J.: GSM-MRF based classification approach for real-time moving object detection. J. of Zhejiang Univ. SCIENCE A 9(2), 250–255 (2008)
41. Rivera, M., Cedeño, O.D.: Variational viewpoint of the quadratic Markov measure field models: theory and algorithms. IEEE Trans. Image Process. 21(3), 1246–1257 (2012)
42. Rivera, M., Mayorga, P.P.: Quadratic Markovian probability fields for image binary segmentation. In: Proceedings of ICV (2007)
43. Rivera, M., Ocegeda, O., Marroquin, J.L.: Entropy-controlled quadratic Markov measure field models for efficient image segmentation. IEEE Trans. Image Process. 16(12), 3047–3057 (2007)
44. Rother, C., Kolmogorov, V., Blake, A.: "Grabcut": interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 23(3), 309–314 (2004)
45. Salvador, E., Cavallaro, A., Ebrahimi, T.: Shadow identification and classification using invariant color models. In: Proceedings of ICASSP, pp. 1545–1548. IEEE Computer Society, Silver Spring (2001)
46. Schick, A., Bäuml, M., Stiefelhagen, R.: Improving foreground segmentations with probabilistic superpixel Markov random fields. In: CVPR Workshops, pp. 27–31 (2012)
47. Sheikh, Y., Shah, M.: Bayesian modeling of dynamic scenes for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 27(11), 1778–1792 (2005)
48. Spagnolo, P., Orazio, T., Leo, M., Distante, A.: Moving object segmentation by background subtraction and temporal analysis. Image Vis. Comput. 24(5), 411–423 (2006)
49. Stauffer, C., Grimson, W.E.L.: Adaptive background mixture models for real-time tracking. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 246–252 (1999)
50. Stauffer, C., Grimson, W.E.L.: Learning patterns of activity using real-time tracking. IEEE Trans. Pattern Anal. Mach. Intell. 22, 747–757 (2000)
51. Sun, J., Zhang, W., Tang, X., Shum, H.Y.: Background cut. In: Proceedings of ECCV, pp. 628–641 (2006)
52. Szeliski, R., Zabih, R., Scharstein, D., Veksler, O., Kolmogorov, V., Agarwala, A., Tappen, M., Rother, C.: A comparative study of energy minimization methods for Markov random fields with smoothness-based priors. IEEE Trans. Pattern Anal. Mach. Intell. 30(6), 1068–1080 (2008)
53. Tang, Z., Miao, Z., Wan, Y., Jesse, F.F.: Foreground prediction for bilayer segmentation of videos. Pattern Recognit. Lett. 32(14), 1720–1734 (2011)
54. Tekalp, A.M.: Video segmentation. In: B. A. (ed.) Handbook of Image and Video Processing, 2nd edn., chap. 4.10. Academic Press, London (2005)
55. Toyama, K., Krumm, J., Brumitt, B., Meyers, B.: Wallflower: principles and practice of background maintenance. In: Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 1, pp. 255–261 (1999)
56. Tsaig, Y., Averbuch, A.: A region-based MRF model for unsupervised segmentation of moving objects in image sequences. In: CVPR, issue no. 1, pp. 889–896. IEEE Computer Society, Silver Spring (2001)
57. Vigueras, J.F., Rivera, M.: Registration and interactive planar segmentation for stereo images of polyhedral scenes. Pattern Recognit. 2(43), 494–505 (2010)
58. Wang, H., Suter, D.: A consensus-based method for tracking: modelling background scenario and foreground appearance. Pattern Recognit. 40(3), 1091–1105 (2007)
59. Yin, P., Criminisi, A., Winn, J., Essa, I.: Bilayer segmentation of webcam videos using tree-based classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 30–42 (2011)
60. Zivkovic, Z., van der Heijden, F.: Efficient adaptive density estimation per image pixel for the task of background subtraction. Pattern Recognit. Lett. 27(7), 773–780 (2006)