
International Journal of Knowledge-based and Intelligent Engineering Systems 13 (2009) 1–11
DOI 10.3233/KES-2009-0188
IOS Press


Illumination invariant foreground detection using multi-subspace learning

Y. Dong, T.X. Han and G.N. DeSouza∗

Electrical and Computer Engineering Department, University of Missouri, Columbia, MO, USA

∗Corresponding author. Tel.: +1 573 882 5579; Fax: +1 573 882 0397; E-mail: [email protected].

Abstract. We propose a learning algorithm using multiple eigen subspaces to handle sudden illumination variations in background subtraction. The feature space is organized into clusters representing the different lighting conditions. A Local Principal Component Analysis (LPCA) transformation is used to learn a separate eigen subspace for each cluster. When a new image is presented, the system automatically selects the learned subspace that shares the closest lighting condition with the input image; the image is then projected onto that subspace so the system can classify background and foreground pixels. The experimental results demonstrate that the proposed algorithm outperforms the original EigenBackground algorithm and the Gaussian Mixture Model (GMM), especially under sudden illumination changes.

Keywords: Multi-feature subspace, LPCA, illumination invariance, eigenspace

1. Introduction

Foreground object detection is an essential task in many image processing and image understanding algorithms, in particular for video surveillance. Background subtraction is a commonly used approach to segment out foreground objects from their background. In one way or another, background subtraction consists of modeling and storing away the background so it can later be compared to newly observed images. The difference image obtained by this comparison is then thresholded so the foreground objects can be segmented out. Despite the simplicity of the concept, in real-world applications the existence of shadows and gradual or sudden changes in illumination adds another dimension to the difficulty of modeling any background, making it hard for most systems to achieve good performance under such conditions. Different approaches have been proposed over the past decades. In [1,2], for example, the system recursively updates the background model using adaptive filters. In both cases, the method can only accommodate gradual

illumination variations and often fails in the presence of sudden illumination changes. A widely used technique, the Gaussian Mixture Model (GMM), is employed by many approaches, such as [3–7]. In principle, a GMM models the pixel intensity at each location with a mixture of Gaussians, capturing the pixel statistics over time. These multi-modal techniques extract foreground objects fairly well, even when gradual illumination variations occur. However, once again, GMM-based approaches do not perform well when the pixels change drastically. Moreover, GMM alone cannot capture the spatial relations among pixels, which can be an important criterion for a coherent foreground segmentation. In that sense, a system known as Wallflower [8] was proposed with three major components: 1) the pixel-level component, which uses a Wiener filter to create a linear predictor of the pixel intensity values given the pixel history; 2) the region-level component, which groups homogeneous pixels based on their spatial relations; and finally 3) the frame-level component, which adapts to the gradual and sudden changes of the background. The combination of these three levels of processing gives good results, but the added complexity prevents real-time performance.



Other GMM-based approaches were recently proposed to address, for example, illumination invariance and foreground fragmentation, e.g. [6]. However, typical problems of GMM, such as the use of global learning rates for the background update and slow convergence, were not addressed by this method. On the other hand, in [7], the authors proposed locally adaptive learning rates for each Gaussian distribution. The rates are based on the most likely Gaussians over several consecutive frames. That approach improved convergence significantly, but the spatial relationships between pixels were still ignored, and therefore foreground fragmentation still occurs. Finally, other statistical approaches to model dynamic scenes were also proposed, but they led either to computationally complex solutions [9] or to restrictions on the relative dynamic behavior of background and foreground [10].

In the celebrated EigenBackground [11], the system builds a PCA feature space to describe the appearances of the learned backgrounds. Newly observed images are then projected onto this eigenspace, and the foreground objects are segmented out by thresholding the Euclidean distance between the reconstructed (projected) images and the original images. Due to the nature of PCA, this method extensively exploits the spatial relations between pixels and thus often yields homogeneously coherent regions for each object in the foreground. Additionally, the off-line learning characteristic of the method guarantees a much faster performance than that of any of the approaches mentioned above. However, as our results demonstrate, EigenBackground cannot accurately capture the appearance of the background in the presence of illumination changes – even when certain heuristics are applied, such as the one proposed in [12], which discards several leading principal components of the subspace to alleviate variations in lighting conditions.

Motivated by this limitation of EigenBackground, we propose the use of local PCA transformations to build multiple subspaces, each subspace representing a different illumination condition. In other words, we create a set of feature subspaces by clustering the corresponding background images based on their similar appearances with respect to the lighting conditions. Once this set of subspaces is created, a newly observed image can be projected onto the most representative subspace, leading to a much more coherent and robust foreground segmentation.

The rest of this paper is organized as follows: first, multi-subspace learning using local PCA is described in Section 2. Next, in Section 3, we compare the proposed method with the original EigenBackground and GMM-based approaches. Future work and final conclusions are given in Section 4.

2. Methodology

2.1. EigenBackground Revisited

The EigenBackground method [11] consists of two steps: background learning and foreground classification. In the first step, a set of static images (the training set) is collected to form a high-dimensional feature space. That is, each image in the set is regarded as a vector comprised of the pixel intensities of all three color channels in sequence. Then, the statistics of the set – mean and covariance – are used to determine the major axes of the sample distribution, that is, the principal eigenvectors or components of the distribution. This concept is depicted in Fig. 1 for a simple case of three-dimensional vectors. In the second step, foreground classification, the observed image (the query) is projected onto the subspace defined by the above principal eigenvectors. Since the training images did not contain any foreground objects, it is expected that the projection of the query image will reconstruct only the static elements in the scene, i.e., the background. The actual foreground objects can then be extracted by thresholding the Euclidean distance between the reconstructed and the observed images.

The main reason for the success of this approach is that the learned eigenspace represents the probability distribution of the background, which models a range of possible appearances of that same background. This allows the subtraction of the background to be insensitive to outliers caused by random noise, camera vibration, reflectance, etc. Furthermore, the foreground regions obtained by the above process are homogeneously coherent, since all pixels in the image are used in the classification, rather than individual pixels. That is, the eigenspace exploits the spatial relationship among neighboring pixels through their learned correlations.

Unfortunately, this same property of EigenBackground becomes a drawback in the presence of multiple lighting conditions. Since principal component analysis (PCA) preserves the leading eigenvectors corresponding to the largest variances of the data set, a single eigen subspace, as illustrated in Fig. 2(a), is not capable of modeling minor nuances of the background accurately.
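To make the two steps concrete, here is a minimal NumPy sketch of the EigenBackground pipeline just described. This is not the authors' released code: the number of components m, the threshold value, and all function and variable names are illustrative assumptions.

```python
import numpy as np

def learn_eigenbackground(frames, m=10):
    """frames: (k, d) array with one flattened background frame per row."""
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Rows of vt are the principal directions (eigenvectors of the sample
    # covariance), ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:m]                       # (d,) mean, (m, d) basis

def foreground_mask(frame, mean, basis, thresh=0.1):
    """frame: flattened (d,) query image; returns a boolean per-pixel mask."""
    coords = basis @ (frame - mean)           # project onto the eigenspace
    recon = mean + basis.T @ coords           # reconstructed background
    return np.abs(frame - recon) > thresh     # large error => foreground
```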


Fig. 1. Conceptualization of an eigen subspace for a simple 3-dimensional case.

Fig. 2. 3D conceptualization of multi-subspaces due to illumination conditions: (a) feature space captured by the original EigenBackground; (b) the same case represented by multi-subspaces.


In other words, the variances captured by the largest eigenvectors do not necessarily represent the probability distribution of the background appearances for any one lighting condition, but for all of the lighting conditions combined. Hence, this poor representation of the range of appearances of the background leads to degraded performance under either minor or sudden changes in the data set, even when these changes occur in reasonably large numbers.

2.2. Proposed method

In light of the above discussion, we propose the use of multiple eigen subspaces instead of only one eigenspace. That is, we first partition the feature space into clusters using Vector Quantization. Next, we perform a local PCA transformation [13] in each cluster to build individual subspaces. That means that subsets of training images under similar lighting conditions are automatically grouped to form their own eigen subspaces, as illustrated in Fig. 2(b). The introduction of multiple subspaces guarantees an efficient representation of the background appearance under different illuminations. Our method can be divided into two major phases, which we detail in the next two subsections.

2.2.1. Multi-subspace learning

During the learning phase, the algorithm performs a set of steps, which can be summarized as follows.

Step 1 A set of k static images under different lighting conditions is collected and used as training images.

Step 2 Each training image is mapped into a feature space using the normalized chromaticities $r = R/(R+G+B)$ and $g = G/(R+G+B)$ and the pixel intensity $s = R+G+B$. As was shown in [14], normalized chromaticity is less sensitive to small illumination variations, such as those caused by shadows, while the pixel intensity compensates for the loss of color information caused by normalizing the chromaticities. A training image $I$, with dimension $d = \text{image width} \times \text{image height}$, is transformed into a feature vector $x_j$ by stacking the chromaticities and intensities together. That is, $x_j = (r_j^T, g_j^T, s_j^T)^T$, where each of the three column vectors has $d$ dimensions and is given by $r_j = (r_{1j}, r_{2j}, \dots, r_{dj})^T$, $g_j = (g_{1j}, g_{2j}, \dots, g_{dj})^T$, and $s_j = (s_{1j}, s_{2j}, \dots, s_{dj})^T$. Finally, the multiple training images generate the training data set $A = \{x_j\}$, $j = 1, \dots, k$.
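As an illustration, the following sketch builds the (r, g, s) feature vector from a floating-point RGB image under the definitions above; the zero-division guard is our own addition.

```python
import numpy as np

def rgs_feature_vector(image):
    """image: (H, W, 3) float RGB array; returns a (3*H*W,) feature vector."""
    R, G, B = image[..., 0], image[..., 1], image[..., 2]
    s = R + G + B                               # pixel intensity
    safe = np.where(s == 0, 1.0, s)             # avoid division by zero
    r = (R / safe).ravel()                      # normalized chromaticities
    g = (G / safe).ravel()
    return np.concatenate([r, g, s.ravel()])    # x_j = [r^T, g^T, s^T]^T
```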

Step 3 The training data set is partitioned into clusters. The clustering is achieved by Vector Quantization (VQ) with the reconstruction error as the distortion measure. As shown in [13], taking the reconstruction error as the distortion measure has the advantage of keeping all the information along the leading directions during the local PCA projection, while associating the eigenvector projection with the distortion measure of the Vector Quantization. Suppose that for each cluster $C^i$, $i = 1, \dots, q$, we keep the $m$ leading eigenvectors $e_1^i, \dots, e_l^i, \dots, e_m^i$; then the local coordinates of the projection of a feature vector $x_j$ onto $C^i$ are given by

$$ z_{j,l}^i = (x_j - r^i) \cdot e_l^i, \qquad l = 1, \dots, m \tag{1} $$

where $r^i$ is the reference vector, or center, of $C^i$. The reconstruction of the projection of the feature vector $x_j$ onto $C^i$ is

$$ \hat{x}_j = r^i + \sum_{l=1}^{m} z_{j,l}^i e_l^i \tag{2} $$

The reconstruction error can thus be represented by

$$ d(x_j, r^i) = \| x_j - \hat{x}_j \|^2 = \Big\| x_j - r^i - \sum_{l=1}^{m} z_{j,l}^i e_l^i \Big\|^2 \tag{3} $$

As an iterative partitioning method, we first initialize the reference vectors (or cluster centers) $r^i$ ($i = 1, \dots, q$) to randomly chosen input vectors from the training data, and we initialize the local covariance matrices $\Sigma^i$ to the identity matrix. Usually $q \in [2, 5]$, but this value can be adjusted according to the specific application scenario. After the initialization, the partitioning proceeds to separate the training data into the $q$ clusters $C^i$, where

$$ C^i = \left\{ x \mid d(x, r^i) \le d(x, r^j),\ \forall j \ne i \right\} \tag{4} $$

with $d(x, r^i)$ defined by Eq. (3). In each iteration, the reference vector and the local covariance matrix are updated according to the following rules:

$$ r^i = \arg\min_{r} \frac{1}{N^i} \sum_{x \in C^i} d(x, r) \tag{5} $$

$$ \Sigma^i = \frac{1}{N^i} \sum_{x \in C^i} (x - r^i)(x - r^i)^T \tag{6} $$

where $N^i$ is the number of data points in $C^i$. The leading eigenvectors of each local covariance matrix are also computed iteratively; the iteration terminates when the fractional change of the reference vectors falls below a predefined threshold. The reader should refer to [15] for the details of an efficient algorithm to compute eigenvectors iteratively, and to [16] for a specific application to incremental PCA learning.

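The sketch below illustrates one way to implement this partitioning: Vector Quantization with the reconstruction error of Eq. (3) as the distortion measure, in the spirit of local PCA [13]. For brevity it uses a batch eigendecomposition instead of the incremental schemes of [15,16], which would be needed for full-size images; the values of q, m, and the stopping tolerance are illustrative assumptions.

```python
import numpy as np

def local_pca_partition(X, q=3, m=5, tol=1e-4, max_iter=100, seed=0):
    """X: (k, D) training feature vectors. Returns reference vectors,
    per-cluster eigenvector bases (each (m, D)), and cluster labels."""
    rng = np.random.default_rng(seed)
    refs = X[rng.choice(len(X), size=q, replace=False)].copy()  # random init
    bases = [np.zeros((0, X.shape[1])) for _ in range(q)]       # no axes yet

    def recon_error(x, r, E):                   # Eq. (3)
        resid = x - r
        if len(E):
            resid = resid - E.T @ (E @ resid)   # part not explained by basis
        return float(resid @ resid)

    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # Eq. (4): assign each sample to the cluster with least error.
        labels = np.array([np.argmin([recon_error(x, refs[i], bases[i])
                                      for i in range(q)]) for x in X])
        new_refs = refs.copy()
        for i in range(q):
            Ci = X[labels == i]
            if len(Ci) < 2:
                continue                          # keep previous estimate
            new_refs[i] = Ci.mean(axis=0)         # Eq. (5)
            centered = Ci - new_refs[i]
            cov = centered.T @ centered / len(Ci)  # Eq. (6)
            w, v = np.linalg.eigh(cov)             # ascending eigenvalues
            bases[i] = v[:, ::-1][:, :m].T         # m leading eigenvectors
        # Terminate on small fractional change of the reference vectors.
        change = np.linalg.norm(new_refs - refs) / (np.linalg.norm(refs) + 1e-12)
        refs = new_refs
        if change < tol:
            break
    return refs, bases, labels
```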

Step 4 Each cluster defines a specific eigen subspace that models the probability distribution of the background. Each subspace is spanned by the eigenvectors related to the directions of largest intra-class data variance. We use the following criterion to preserve the $m$ largest eigenvalues:

$$ \frac{\sum_{j=1}^{m} \lambda_j}{\sum_{j=1}^{T} \lambda_j} \ge \text{threshold} \tag{7} $$

where $\lambda_j$ ($j = 1, \dots, T$) are the eigenvalues in descending order and $T$ is the total number of eigenvalues in one subspace. In our work, the threshold was set to 90%.
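A small helper illustrating this retention rule; the function name and inputs are our own, and the eigenvalues are assumed to be sorted in descending order.

```python
import numpy as np

def num_components(eigenvalues, threshold=0.90):
    """Smallest m whose leading eigenvalues reach the energy threshold, Eq. (7)."""
    ratios = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    return int(np.searchsorted(ratios, threshold) + 1)

# Example: num_components([5.0, 3.0, 1.0, 0.5, 0.5]) -> 3,
# since (5 + 3 + 1) / 10 = 0.9 reaches the 90% threshold.
```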

2.2.2. Classification

During the second phase, the algorithm performs the classification of the pixels into foreground and background using the following procedure.

Step 1 A newly observed image is projected onto the closest subspace to reconstruct the appearance of the static elements in the scene (the image background). Here, by closest we mean that the query image and the images in the selected eigen subspace must have similar lighting conditions. In other words, the closest eigen subspace is the one that yields the least reconstruction error for the query image. So, let $\chi$ denote the feature vector of the new image; then the selected subspace is

$$ \rho = \arg\min_{i=1,\dots,q} d(\chi, r^i) \tag{8} $$

Step 2 The Euclidean distance between the query image and the reconstructed image is transformed back into an image (in the original $r, g, s$ feature space), and the thresholding results of the $(r, g, s)$ channels are combined to extract the foreground. Given the reconstruction of the projected feature vector, $\hat{\chi} = r^\rho + \sum_{l=1}^{m} z_l^\rho e_l^\rho$, the squared distance $d(\chi, r^\rho) = \|\chi - \hat{\chi}\|^2$ is computed element-wise and re-arranged into three difference images $(\delta_r, \delta_g, \delta_s)$. The final foreground decision is made by a logical AND operation over the thresholded images. That means an image pixel $f$ will be considered a background pixel if and only if

$$ (\delta_r(f) \le th_r) \lor (\delta_g(f) \le th_g) \lor (\delta_s(f) \le th_s) \tag{9} $$

where $(th_r, th_g, th_s)$ are predefined thresholds; equivalently, $f$ is foreground if and only if all three difference images exceed their thresholds.
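Putting the two classification steps together, a hedged sketch follows; array shapes, names, and threshold values are assumptions and not the released Matlab implementation.

```python
import numpy as np

def classify(x, refs, bases, shape, th_r=0.01, th_g=0.01, th_s=0.05):
    """x: (3*d,) query feature vector; shape: (H, W) of the image.
    Returns a boolean (H, W) foreground mask."""
    # Step 1, Eq. (8): pick the subspace with least reconstruction error.
    recons = [r + E.T @ (E @ (x - r)) for r, E in zip(refs, bases)]
    rho = int(np.argmin([np.sum((x - xh) ** 2) for xh in recons]))
    # Step 2: per-channel squared-difference images (delta_r, delta_g, delta_s).
    d = shape[0] * shape[1]
    diff = (x - recons[rho]) ** 2
    d_r = diff[:d].reshape(shape)
    d_g = diff[d:2 * d].reshape(shape)
    d_s = diff[2 * d:].reshape(shape)
    # Eq. (9): background iff any channel stays below its threshold;
    # equivalently, foreground iff all three channels exceed them (AND).
    return (d_r > th_r) & (d_g > th_g) & (d_s > th_s)
```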

3. Experimental results

We performed both qualitative and quantitative experiments using three challenging sequences. In the first sequence, a sudden change in illumination is caused by a lamp being turned on and off, repeatedly. The second sequence contains four distinct lighting conditions. The third one, the benchmark sequence "Dance", includes more than one thousand frames of synthetic images of an indoor scene with two dancing characters under sudden illumination changes. For the qualitative experiment, we compared our algorithm against the original EigenBackground [11] and GMM [4] algorithms. The results of this comparison are detailed in Section 3.1. For the second experiment, we chose the most challenging frames in all three sequences. A quantitative comparison between the proposed algorithm and the original EigenBackground is presented in Section 3.2 through the use of the recall and precision [17] measures.

3.1. Qualitative results

Our first test sequence contained 1100 image frames, of which we collected 50 static frames for training – including samples with the lamp both turned on and off. Figure 3(a) shows a test image (not in the training set) with the lamp turned off, and Fig. 3(b) is another test image with the lamp turned on. By comparing the results from these two frames, it is clear from Fig. 3(d) that GMM does not adapt quickly enough to the sudden illumination change. Also, as we pointed out in Section 2.1, the original EigenBackground method builds a single subspace for both lighting conditions, and hence the representation of the data variance is not accurate. Figure 3(f) illustrates the shortcomings of the original EigenBackground by the number of misdetected pixels due to the change in illumination – including


Fig. 3. Sample images used for testing: (a) and (b) original images before and after turning on the lamp; (c) and (d) foreground detection using GMM; (e) and (f) foreground detection using the original EigenBackground; (g) and (h) foreground detection using our method.

the bright picture frame on the wall (behind the man), which is mistakenly detected as foreground. On the other hand, our proposed algorithm automatically clusters these two training sets separately, each cluster corresponding to one of the two quite different lighting conditions. Figures 3(g) and 3(h) show how well the algorithm performs despite the abrupt changes in illumination caused by the lamp being turned on and off.

Usually, a good representation of the subspace of the background appearance has well-conditioned eigenvalues, whereas an inaccurate one has ill-conditioned ones. The reason is that the illumination variation is much bigger than the data variance of any single probability distribution, and therefore several leading eigenvalues dominate the whole subspace. Mathematically, the ratio $\lambda_1 / \lambda_m$ tends to be very large, leading to a drastic change in the magnitude of the eigenvalues.


Fig. 4. (a) Magnitude of the eigenvalues in the original EigenBackground; (b) and (c) magnitude of the eigenvalues in the proposed approach.

Fig. 5. Four different lighting conditions used for training of sequence 2: (a) normal light; (b) brighter light; (c) half-lights off; (d) all lights off.


Fig. 6. Examples of the second test sequence showing the: (a) and (b) original images; (c) and (d) detection results from the GMM method; (e) and (f) detection results from the original EigenBackground technique; and (g) and (h) detection results from the proposed method.

Figure 4(a) illustrates this: the eigenvalues drop abruptly from 5000 to nearly zero within the first three principal components, which makes the ratio $\lambda_1 / \lambda_m$ approach infinity. On the other hand, the eigenvalues in Figs 4(b) and 4(c) are well-conditioned, characterized by two more slowly decreasing curves.

As mentioned earlier, the performance of our algorithm was also evaluated using a second set of training sequences, this time containing four different illumination conditions (see Fig. 5). In this case, the test images included two lighting conditions similar to the ones in Figs 5(a) and (b) – that is, normal light and bright light. Figure 6 presents the results from the GMM algorithm, the original EigenBackground method, and our algorithm. As can be observed, our method achieves better performance, with fewer missed or false detections


Fig. 7. Examples of the benchmark sequence showing the: (a) original images; (b) detection results from the GMM method; (c) detection results from the original EigenBackground method; and (d) detection results from the proposed method.

than the other two methods.

Besides the two sequences above, we also ran our experiment on the benchmark sequence "Dance".¹ Compared to the original EigenBackground and GMM, the results of our proposed method in Fig. 7(d) demonstrate a more robust detection of the dancing characters, even under drastic changes of the indoor illumination.

¹http://imagelab.ing.unimore.it/vssn06/.

3.2. Quantitative results

For a more quantitative analysis of our method, we manually segmented 50 representative frames of the first two sequences and 542 frames of the benchmark sequence "Dance". The ground-truth frames were used as masks and applied to the output of both methods, i.e., the original EigenBackground and the proposed one, in order to calculate the precision, $\mathrm{TP}/(\mathrm{TP}+\mathrm{FP})$, and the recall, $\mathrm{TP}/(\mathrm{TP}+\mathrm{FN})$, where TP, FP, and FN denote the numbers of true-positive, false-positive, and false-negative pixels, respectively. The results in Fig. 8 show that our proposed method performs better under illumination variation than the original EigenBackground method. Moreover, the comparison demonstrates the benefit of maintaining multiple clusters to cope with sudden and drastic illumination variations. In particular, in Fig. 8(c), the precision of the original EigenBackground drops drastically after frame 315, when the lighting conditions start switching. However, the proposed algorithm may suffer from its off-line training step, as LPCA relies on a "K-means like" clustering procedure. This implies that the clustering result depends on the initialization of the reference vector of each cluster; a bad initialization may lead to slow convergence or a sub-optimal solution. Typically, we ran the LPCA clustering several times and selected the outcome that generated the best validation results for the remaining steps of the algorithm.
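For completeness, a minimal sketch of how these two measures can be computed from a predicted foreground mask and a manually segmented ground-truth mask; the names are our own.

```python
import numpy as np

def precision_recall(pred, gt):
    """pred, gt: boolean arrays of the same shape."""
    tp = np.logical_and(pred, gt).sum()      # true positives
    fp = np.logical_and(pred, ~gt).sum()     # false positives
    fn = np.logical_and(~pred, gt).sum()     # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```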


Fig. 8. Precision and recall for our sequences and the benchmark sequence: (a) sequence 1; (b) sequence 2; (c) "Dance".


4. Conclusion and future work

We proposed a multi-subspace learning algorithm to robustly and effectively extract foreground objects under various illumination conditions. The feature space was first partitioned into clusters corresponding to different lighting conditions, and then Local Principal Component Analysis was performed to build a set of multiple subspaces that better represent each background appearance. Foreground detection was shown to be more accurate and homogeneously coherent than with the traditional methods (GMM and EigenBackground) in all of our experiments, especially when the backgrounds presented abrupt differences in lighting conditions.

One limitation of our algorithm is the random initialization of the multi-subspace clustering, which may affect the final foreground detection. In the future, we plan to investigate more reliable clustering algorithms. Also, in the classification step, the proposed algorithm uses fixed thresholds; more work will be conducted to make the thresholds adaptive to different illumination conditions. Moreover, the current work does not adapt to changes in the background on-the-fly. A general framework that integrates an incremental learning procedure into the classification step will address the adaptation of our method. The Matlab code used in this work is available at http://vigir.missouri.edu/~evan/BG.htm.


Acknowledgments

The authors would like to thank Dr. James Keller for his contribution and help, for example by encouraging us to pursue this research problem, as well as for revising and suggesting improvements to this manuscript.

References

[1] D. Koller, J. Weber, T. Huang, J. Malik, G. Ogasawara, B. Rao and S. Russell, Toward robust automatic traffic scene analysis in real-time, in: Proceedings of the IEEE International Conference on Pattern Recognition, 1994.
[2] C.R. Wren, A. Azarbayejani, T. Darrell and A.P. Pentland, Pfinder: Real-time tracking of the human body, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (July 1997), 780–785.
[3] W.E.L. Grimson, C. Stauffer and R. Romano, Using adaptive tracking to classify and monitor activities in a site, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 1998.
[4] C. Stauffer and W.E.L. Grimson, Adaptive background mixture models for real-time tracking, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 1998.
[5] M.A.T. Figueiredo and A.K. Jain, Unsupervised learning of finite mixture models, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (March 2002), 381–396.
[6] Y.-L. Tian, M. Lu and A. Hampapur, Robust and efficient foreground analysis for real-time video surveillance, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005.
[7] D.-S. Lee, Effective Gaussian mixture learning for video background subtraction, IEEE Transactions on Pattern Analysis and Machine Intelligence, May 2005.
[8] K. Toyama, J. Krumm, B. Brumitt and B. Meyers, Wallflower: Principles and practice of background maintenance, in: Proceedings of the IEEE International Conference on Computer Vision, Greece, 1999.
[9] L. Li, W. Huang, I.Y.-H. Gu and Q. Tian, Statistical modeling of complex backgrounds for foreground object detection, IEEE Transactions on Image Processing, November 2004.
[10] A. Monnet, A. Mittal, N. Paragios and V. Ramesh, Background modeling and subtraction of dynamic scenes, in: Proceedings of the IEEE International Conference on Computer Vision, 2003.
[11] N.M. Oliver, B. Rosario and A.P. Pentland, A Bayesian computer vision system for modeling human interactions, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (August 2000), 831–843.
[12] P.N. Belhumeur, J.P. Hespanha and D.J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (July 1997), 711–720.
[13] N. Kambhatla and T. Leen, Dimension reduction by local principal component analysis, Neural Computation 9 (1997), 1493–1516.
[14] A. Elgammal, D. Harwood and L.S. Davis, Non-parametric model for background subtraction, in: Proceedings of the 6th European Conference on Computer Vision, 2000.
[15] H. Murakami and B.V.K. Vijaya Kumar, Efficient calculation of primary images from a set of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 4 (May 1982), 511–515.
[16] D.A. Ross, J. Lim, R.-S. Lin and M.-H. Yang, Incremental learning for robust visual tracking, International Journal of Computer Vision, May 2008.
[17] K. Smith, D. Gatica-Perez, J. Odobez and S. Ba, Evaluating multi-object tracking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005.