Real-Time Subspace-Based Background Modeling Using Multi-channel Data

Bohyung Han(1) and Ramesh Jain(1,2)

(1) Calit2
(2) School of Information and Computer Sciences
University of California, Irvine, CA 92697, USA
{bhhan,jain}@ics.uci.edu

Abstract. Background modeling and subtraction using subspaces is attractive in real-time computer vision applications due to its low computational cost. However, the application of this method is mostly limited to gray-scale images, since the integration of multi-channel data is not straightforward; it involves a much higher dimensional space and generally makes the data more difficult to manage. We propose an efficient background modeling and subtraction algorithm using 2-Dimensional Principal Component Analysis (2DPCA) [1], in which multi-channel data are naturally integrated into the eigenbackground framework [2] with no additional dimensionality. We show that the principal components in 2DPCA can be computed efficiently by transformation to standard PCA. We also propose an incremental algorithm to update the eigenvectors so as to handle temporal variations of the background. The proposed algorithm is applied to 3-channel (RGB) and 4-channel (RGB+IR) data, and compared with the standard subspace-based method as well as a pixel-wise density-based method.

1 Introduction

Background modeling and subtraction is an important preprocessing step for high-level computer vision tasks, and the design and implementation of a fast and robust algorithm is critical to the performance of the entire system. However, many existing algorithms involve a significant amount of processing time by themselves, so they may not be appropriate for real-time applications. We propose a fast algorithm to model and update the background based on a subspace method for multi-channel data.

The most popular approach to background modeling is the pixel-wise density-based method. In [3], the background at each pixel is modeled by a Gaussian distribution, which has the serious limitation that the density function is uni-modal and static. To handle multi-modal backgrounds, Gaussian mixture models have been employed [4,5,6], but they are not flexible enough, since the number of Gaussian components is either fixed or updated in an ad-hoc manner. For more accurate background modeling, adaptive Kernel Density Estimation (KDE) has been proposed [7,8], but its huge computational cost and memory requirement remain critical issues. To alleviate the large memory requirement, a novel density estimation technique based

G. Bebis et al. (Eds.): ISVC 2007, Part II, LNCS 4842, pp. 162–172, 2007.
© Springer-Verlag Berlin Heidelberg 2007


on mean-shift mode finding has been introduced [9], but it still suffers from high computational cost. These methods are based on pixel-wise density estimation, so the computational cost is high, especially when large images are involved. Also, the background models are based only on the data observed at the same pixel of the image, and spatial information is usually ignored. Therefore, they sometimes lose accuracy or model an unnecessarily complex background.

On the other hand, Principal Component Analysis (PCA) is often used for background modeling [10,2]. In this framework, 2D images are vectorized and collected to obtain principal components during the training phase. In the test phase, vectorized images are first projected onto the trained subspace, and the background is reconstructed from the projected image and the principal components. The advantage of this method lies in its simplicity; it is typically very fast compared with pixel-wise density-based methods and well suited to real-time applications. However, it is not straightforward to integrate multi-channel data such as RGB color images without a significant increase of dimensionality, which is not desirable due to the additional computational complexity and the curse of dimensionality.

Recently, 2-Dimensional Principal Component Analysis (2DPCA) [1] was proposed for face recognition. The main advantage of 2DPCA is dimensionality reduction; instead of vectorizing image data, it utilizes the original two-dimensional structure of images. Thus, the curse of dimensionality is alleviated and the spatial structure of visual features is preserved. Most of all, the computational cost and memory requirement are reduced significantly, making the algorithm more appropriate for real-time applications. We focus on a subspace-based background modeling algorithm using 2DPCA, and our contributions are summarized below.

– We propose a background subtraction (BGS) algorithm for multi-channel data using 2DPCA with no additional dimensionality.
– The computation of the initial principal components in 2DPCA is performed efficiently by transformation to standard PCA, and an incremental update of the subspace in the 2DPCA framework is employed to handle dynamic backgrounds.
– The threshold for background subtraction is determined automatically by statistical analysis.
– The proposed algorithm is naturally applied to sensor fusion for background subtraction; in addition to RGB color images, an IR image is integrated to model the background and detect shadows.

We also implemented other background modeling and subtraction algorithms, namely the pixel-wise density-based [6] and standard eigenbackground [2] methods, and compare their performance with our algorithm.

This paper is organized as follows. In Section 2, the 2DPCA algorithm is discussed and analyzed, and Section 3 describes the background modeling technique with multi-channel data based on 2DPCA. The performance of our method is evaluated in Section 4.

2 2DPCA

In this section, we review and analyze the original 2DPCA algorithm introduced in [1,11].

2.1 2DPCA: Review

The basic idea of 2DPCA is to project an m × n image matrix A onto an n-dimensional unitary column vector x, producing the m-dimensional projected feature vector of image A via the linear transformation

    y = Ax.    (1)

Let the covariance matrix of the projected feature vectors be denoted by C_x. The maximization of tr(C_x), where tr(·) denotes the trace of a matrix, is the criterion used to determine the optimal projection vector x. The formal derivation of tr(C_x) is

    \mathrm{tr}(C_x) = \frac{1}{M} \sum_{i=1}^{M} (y_i - \bar{y})^\top (y_i - \bar{y})
                     = \frac{1}{M} \sum_{i=1}^{M} (A_i x - \bar{A} x)^\top (A_i x - \bar{A} x)
                     = x^\top \frac{1}{M} \sum_{i=1}^{M} (A_i - \bar{A})^\top (A_i - \bar{A}) \, x
                     = x^\top C_A x,    (2)

where \bar{y} and \bar{A} are the means of the projected feature vectors y_1, y_2, ..., y_M and of the original image matrices A_1, A_2, ..., A_M, respectively, and C_A is the image covariance (scatter) matrix. Therefore, maximizing tr(C_x) is equivalent to solving for the eigenvectors of C_A with the largest eigenvalues. When k eigenvectors are selected, the original image A is represented by the m × k feature matrix Y_{1:k}, given by

    Y_{1:k} = A X_{1:k},    (3)

where each column of the n × k matrix X_{1:k} corresponds to one of the k principal components of C_A.
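The fitting, projection, and reconstruction steps of Eqs. (1)-(3) can be sketched in a few lines of NumPy. This is a minimal illustration on toy data, not the authors' implementation; the function names are ours.

```python
import numpy as np

def twodpca_fit(images, k):
    """Compute the top-k 2DPCA projection vectors X_{1:k} (Eq. 2).

    images: array of shape (M, m, n); k: number of principal components.
    Returns an (n, k) matrix whose columns are the leading eigenvectors of
    the image covariance (scatter) matrix C_A.
    """
    A_bar = images.mean(axis=0)                       # mean image A_bar, (m, n)
    centered = images - A_bar                         # (M, m, n)
    # C_A = (1/M) sum_i (A_i - A_bar)^T (A_i - A_bar), an (n, n) matrix
    C_A = np.einsum('imj,imk->jk', centered, centered) / len(images)
    eigvals, eigvecs = np.linalg.eigh(C_A)            # ascending eigenvalues
    return eigvecs[:, ::-1][:, :k]                    # k largest first

def twodpca_project(A, X_1k):
    """Feature matrix Y_{1:k} = A X_{1:k} (Eq. 3) and the reconstruction."""
    Y = A @ X_1k
    return Y, Y @ X_1k.T

# toy example: 20 random 8x6 "images", 3 components
rng = np.random.default_rng(0)
imgs = rng.standard_normal((20, 8, 6))
X = twodpca_fit(imgs, k=3)
Y, recon = twodpca_project(imgs[0], X)
```

Note that the projection matrix lives in the small n × n space regardless of m, which is the source of 2DPCA's speed advantage over vectorized PCA.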

2.2 Analysis of 2DPCA

Instead of vectorizing the 2-dimensional image, 2DPCA uses the original image matrix, so both dimensionality and computation time are significantly reduced. The data representation and reconstruction ability of 2DPCA has been evaluated on face recognition problems [1,11,12], where 2DPCA is reported to be superior to standard PCA.

Fig. 1. Example of poor background image reconstruction by 2DPCA: (a) training image, (b) original test image, (c) reconstructed background.

However, it is worthwhile to investigate further the computation of the image covariance matrix. Suppose that A_i^\top = [a_{i,1}^\top \, a_{i,2}^\top \, \ldots \, a_{i,m}^\top], where a_{i,j} is the j-th row vector of the image matrix A_i, and that \bar{A} = 0 without loss of generality. Then the image covariance matrix C_A can be rewritten as

    C_A = \frac{1}{M} \sum_{i=1}^{M} A_i^\top A_i
        = \frac{1}{M} \sum_{i=1}^{M} \sum_{j=1}^{m} a_{i,j}^\top a_{i,j}.    (4)

Eq. (4) means that 2DPCA is equivalent to block-based PCA, in which each block corresponds to a row of the image; this is also discussed in [13]. The image covariance matrix is computed as the sum of outer products of row vectors, so information gathered from different rows is blindly combined into the matrix. This may not affect image representation and reconstruction much in face recognition, but it causes a significant problem in background subtraction. Figure 1 demonstrates an example of poor background reconstruction by 2DPCA for this reason.
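The equivalence in Eq. (4) is easy to verify numerically: pooling the rows of all images and computing their scatter yields exactly the 2DPCA image covariance matrix, showing that row position is discarded. A small NumPy check, assuming \bar{A} = 0 as in the derivation:

```python
import numpy as np

# Numerical check of Eq. (4): the 2DPCA image covariance matrix equals the
# scatter of all image rows pooled together (block-based PCA).
rng = np.random.default_rng(1)
M, m, n = 10, 5, 4
images = rng.standard_normal((M, m, n))       # mean A_bar assumed zero

C_matrix = sum(A.T @ A for A in images) / M   # (1/M) sum_i A_i^T A_i
rows = images.reshape(M * m, n)               # pool rows from all images
C_rows = rows.T @ rows / M                    # (1/M) sum_{i,j} a_ij^T a_ij

print(np.allclose(C_matrix, C_rows))          # -> True
```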

3 Multi-channel Background Modeling by 2DPCA

In this section, we present how 2DPCA is applied to background modeling for multi-channel data without increasing dimensionality. A method for efficient computation of the initial principal components is introduced, and an adaptive technique to determine the threshold for background subtraction is proposed. The incremental subspace learning used to handle dynamic backgrounds is also discussed.

3.1 Data Representation and Initial Principal Components

As mentioned earlier, although 2DPCA shows successful results in face recognition, its 2D image representation has a critical limitation for background subtraction, as illustrated in Figure 1. Therefore, we keep the vectorized representation

of the image, but utilize 2DPCA to deal with multi-channel image data efficiently. Note that vectorizing an entire d-channel image causes a significant increase of dimensionality and additional computational complexity for the initial and incremental PCA.

Let A_i (i = 1, ..., M) be a d-channel m × n image, i.e., an m × n × d array. The original 3D array is converted to a d × mn matrix A_i in which the data of each channel are vectorized and placed in one row. Typically, mn is a very large number, and the direct computation of the image covariance matrix is very expensive. Although eigenvectors can be derived from the covariance of the transposed matrix A_i^\top, only d eigenvectors are available in that case; a method to compute a sufficient number of eigenvectors efficiently is therefore required, as described below.

Suppose that A = (A_1^\top \, A_2^\top \, \ldots \, A_M^\top), so that A is an mn × dM matrix. Then C_A in Eq. (4) is computed by a simple matrix operation instead of a summation over M matrices:

    C_A = \frac{1}{M} A A^\top.    (5)

However, the size of C_A is still mn × mn, which is prohibitively large. Fortunately, the eigenvectors of A A^\top can be obtained from the eigenvectors of A^\top A, since

    A^\top A \hat{u} = \lambda \hat{u}
    A A^\top A \hat{u} = \lambda A \hat{u}
    A A^\top (A \hat{u}) = \lambda (A \hat{u})
    A A^\top u = \lambda u,    (6)

where \hat{u} and \lambda are an eigenvector and eigenvalue of A^\top A, respectively, and u = A \hat{u} is an eigenvector of A A^\top (u needs to be normalized to unit length). Therefore, we can obtain the eigenvectors of A A^\top by the eigen-decomposition of a dM × dM matrix instead of an mn × mn matrix (mn ≫ dM). Denote by \hat{U}_{1:k} = (\hat{u}_1 \, \hat{u}_2 \, \ldots \, \hat{u}_k) the eigenvectors associated with the k largest eigenvalues of A^\top A, and by U_{1:k} = (u_1 \, u_2 \, \ldots \, u_k) the corresponding mn × k matrix of eigenvectors of interest. The diagonal matrix of associated eigenvalues is \Sigma_{k \times k}.
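The transformation in Eqs. (5)-(6) is the small-matrix trick familiar from eigenface computation; a short NumPy sketch (the dimensions are arbitrary toy values, and the normalization of u mentioned above is applied explicitly):

```python
import numpy as np

# Eigenvectors of the large mn x mn matrix A A^T recovered from the small
# dM x dM matrix A^T A, as in Eq. (6).
rng = np.random.default_rng(2)
mn, dM = 500, 12                    # mn >> dM
A = rng.standard_normal((mn, dM))

small = A.T @ A                     # dM x dM, cheap to decompose
lam, U_hat = np.linalg.eigh(small)  # ascending eigenvalues
U = A @ U_hat                       # u = A u_hat, eigenvectors of A A^T
U /= np.linalg.norm(U, axis=0)      # u needs to be normalized

# verify A A^T u = lambda u for the largest eigenpair
u, l = U[:, -1], lam[-1]
print(np.allclose(A @ (A.T @ u), l * u))   # -> True
```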

3.2 Background Subtraction

Denote by B = (b_1 \, b_2 \, \ldots \, b_{mn}) a d × mn test image for background subtraction, where b_i (i = 1, ..., mn) is the d-dimensional datum extracted from the d channels at the i-th pixel. The background \hat{B} = (\hat{b}_1 \, \hat{b}_2 \, \ldots \, \hat{b}_{mn}) reconstructed from the original image B is given by

    \hat{B} = B U_{1:k} U_{1:k}^\top.    (7)


A foreground pixel is detected by evaluating the difference between the original and reconstructed image as

    FG(i) = \begin{cases} 1 & \text{if } ||b_i - \hat{b}_i|| > \xi(i) \\ 0 & \text{otherwise} \end{cases},    (8)

where FG(i) is the foreground mask and \xi(i) is the threshold for the i-th pixel of the vectorized image. The threshold is critical to the performance of background subtraction; we propose a simple method to determine the threshold of each pixel and update it based on the temporal variation of incoming data. From the training sequence, the variation of each pixel is modeled by a Gaussian distribution, whose mean and variance are obtained from the distances between the original and reconstructed images.² Denote by m(i) and \sigma(i) the mean and standard deviation of the distance at the i-th pixel, respectively. The threshold \xi at the i-th pixel is given by

    \xi(i) = \max(\xi_{\min}, \min(\xi_{\max}, m(i) + \kappa \sigma(i))),    (9)

where \xi_{\min}, \xi_{\max}, and \kappa are constants. The threshold is also updated during the background subtraction process; the new distance for each pixel is added to the current distribution incrementally. The incremental update of the mean and variance is given by

    m_{new}(i) = (1 - \alpha) m(i) + \alpha ||b_i - \hat{b}_i||    (10)
    \sigma^2_{new}(i) = (1 - \alpha) \sigma^2(i) + \alpha (1 - \alpha) (||b_i - \hat{b}_i|| - m(i))^2,    (11)

where \alpha is a forgetting factor (learning rate).
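A minimal NumPy sketch of Eqs. (8)-(11) might look as follows; the function names are ours, and the default parameter values are taken from Section 4 for illustration only.

```python
import numpy as np

def foreground_mask(B, B_hat, xi):
    """Eq. (8): per-pixel mask from reconstruction error. B, B_hat: (d, mn)."""
    err = np.linalg.norm(B - B_hat, axis=0)       # ||b_i - b_hat_i|| per pixel
    return (err > xi).astype(np.uint8), err

def update_threshold(m, sigma2, err, alpha=0.05,
                     xi_min=30.0, xi_max=60.0, kappa=5.0):
    """Eqs. (9)-(11): incremental mean/variance and the clipped threshold."""
    m_new = (1 - alpha) * m + alpha * err
    sigma2_new = (1 - alpha) * sigma2 + alpha * (1 - alpha) * (err - m) ** 2
    xi = np.clip(m_new + kappa * np.sqrt(sigma2_new), xi_min, xi_max)
    return m_new, sigma2_new, xi

# toy 3-channel image of 6 pixels; pixel 2 deviates strongly from background
rng = np.random.default_rng(3)
B_hat = rng.uniform(0, 255, (3, 6))
B = B_hat.copy()
B[:, 2] += 200.0
mask, err = foreground_mask(B, B_hat, np.full(6, 45.0))
m_new, s2_new, xi_new = update_threshold(np.full(6, 10.0), np.full(6, 4.0), err)
```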

3.3 Weighted Incremental 2DPCA

Background changes over time, and new observations should be integrated into the existing model. However, it is not straightforward to update the background without corruption by foreground regions and/or noise. In our work, a weighted mean of the original and reconstructed images is used for incremental subspace learning, where the weight of a pixel is determined by the confidence of its foreground (or background) classification. In other words, the actual data used for the incremental 2DPCA are given by

    \tilde{b}_i = (1 - r(i)) \hat{b}_i + r(i) b_i,    (12)

where r(i) = \exp\left(-\rho \frac{||\hat{b}_i - b_i||^2}{\xi(i)^2}\right) and \rho is a constant. By this strategy, confident background information is adapted quickly, while suspicious information is integrated into the model very slowly.

² In case the training data are not reliable due to noise, foreground regions, etc., robust estimation techniques such as M-estimators and RANSAC may be required for the parameter estimation.
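Eq. (12) is a one-line blend per pixel; a possible NumPy rendering (function name ours) is:

```python
import numpy as np

def blend_for_update(B, B_hat, xi, rho=1.0):
    """Eq. (12): b_tilde = (1-r) b_hat + r b, with
    r = exp(-rho * ||b_hat - b||^2 / xi^2).  B, B_hat: (d, mn); xi: (mn,)."""
    err2 = np.sum((B_hat - B) ** 2, axis=0)
    r = np.exp(-rho * err2 / xi ** 2)   # near 1 for confident background pixels
    return (1 - r) * B_hat + r * B

# pixels 1..3 match the background exactly (r = 1); pixel 0 deviates strongly,
# so its observation is almost ignored in the update
B_hat = np.zeros((3, 4))
B = np.zeros((3, 4))
B[:, 0] = 100.0
b_tilde = blend_for_update(B, B_hat, np.full(4, 50.0))
```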


There are several algorithms for Incremental PCA (IPCA) [14,15,16], among which we adopt the algorithm proposed in [14]. Suppose that a collection of K new data, derived from Eq. (12), is B = (\tilde{B}_1^\top \, \tilde{B}_2^\top \, \ldots \, \tilde{B}_K^\top). Then the updated image mean M'_A and covariance matrix C'_A are given by

    M'_A = (1 - \beta) M_A + \beta M_{\tilde{B}},    (13)
    C'_A = (1 - \beta) C_A + \frac{\beta}{K} B B^\top + \beta (1 - \beta)(M_A - M_{\tilde{B}})(M_A - M_{\tilde{B}})^\top,    (14)

where C_A ≈ U_{1:k} \Sigma_{k \times k} U_{1:k}^\top, M_A and M_{\tilde{B}} are the means of the existing and new data, respectively, and \beta is a learning rate. The k largest eigenvalues of C'_A are obtained by the decomposition

    C'_A ≈ U'_{1:k} \Sigma'_{k \times k} U'^\top_{1:k},    (15)

where U'_{1:k} is the new mn × k eigenvector matrix and \Sigma'_{k \times k} is the k × k diagonal matrix of associated eigenvalues. However, the direct computation of C'_A and its eigen-decomposition is not desirable (or even possible) in real-time systems, so the eigenvectors and associated eigenvalues should be obtained indirectly, with C'_A decomposed as

    C'_A = (U_{1:k} | E) D (U_{1:k} | E)^\top
         = (U_{1:k} | E) R_{(k+l) \times k} \Sigma'_{k \times k} R^\top_{(k+l) \times k} (U_{1:k} | E)^\top,    (16)

where D = R_{(k+l) \times k} \Sigma'_{k \times k} R^\top_{(k+l) \times k}, and E is an mn × l set of orthonormal basis vectors for the new data and (M_A - M_{\tilde{B}}), orthogonal to the original eigenspace. Then the original decomposition in Eq. (15) can be solved via the decomposition of the much smaller matrix

    D = (U_{1:k} | E)^\top C'_A (U_{1:k} | E),    (17)

into which C'_A from Eq. (14) is plugged to obtain D efficiently. Note that for the subspace update we only need the eigenvalues and eigenvectors from the previous decomposition in addition to the incoming data. More details on the incremental learning algorithm and its computational complexity are given in [14].
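The update in Eqs. (13)-(17) can be sketched as below for small dimensions. For clarity, this toy version forms C'_A explicitly, which a real-time implementation must avoid (that is precisely the point of Eq. (17)), and it assumes the β/K normalization of the new-data scatter; see [14] for the exact algorithm.

```python
import numpy as np

def incremental_update(U, Sigma, M_A, B_new, beta=0.1, k=None):
    """Toy sketch of Eqs. (13)-(17).
    U: (p, k) current eigenvectors; Sigma: (k,) eigenvalues;
    M_A: (p,) current mean; B_new: (p, K) new (blended) data columns."""
    if k is None:
        k = U.shape[1]
    K = B_new.shape[1]
    M_B = B_new.mean(axis=1)
    M_new = (1 - beta) * M_A + beta * M_B                      # Eq. (13)
    diff = (M_A - M_B)[:, None]
    C_old = U @ np.diag(Sigma) @ U.T                           # rank-k model
    C_new = ((1 - beta) * C_old + (beta / K) * (B_new @ B_new.T)
             + beta * (1 - beta) * (diff @ diff.T))            # Eq. (14)
    # orthonormal basis E for new data and mean shift, orthogonal to span(U)
    Z = np.hstack([B_new, diff])
    resid = Z - U @ (U.T @ Z)
    E, _ = np.linalg.qr(resid)
    UE = np.hstack([U, E])
    D = UE.T @ C_new @ UE                                      # Eq. (17), small
    lam, R = np.linalg.eigh(D)
    order = np.argsort(lam)[::-1][:k]
    return UE @ R[:, order], lam[order], M_new                 # Eq. (15)

# toy usage: initial 2-D subspace of 10-D data, then 5 new observations
rng = np.random.default_rng(4)
X0 = rng.standard_normal((10, 30))
M_A = X0.mean(axis=1)
Xc = X0 - M_A[:, None]
lam0, V0 = np.linalg.eigh(Xc @ Xc.T / 30)
U, Sigma = V0[:, -2:], lam0[-2:]
U_new, lam_new, M_new = incremental_update(U, Sigma, M_A,
                                           rng.standard_normal((10, 5)) + 0.5)
```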

4 Experiments

In this section, we evaluate the performance of our background subtraction technique in comparison with other algorithms: the eigenbackground [2] and pixel-wise density-based modeling [6]. Our method is applied to RGB color image sequences and extended to sensor fusion (RGB color plus IR) for background modeling and subtraction.

Fig. 2. Comparison of background subtraction algorithms on the fountain sequence (top) and the subway sequence (bottom): (a) original image, (b) eigenbackground, (c) GMM, (d) our method.

Fig. 3. BGS with fixed thresholds: (a) reconstructed image, (b) ξ = 30, (c) ξ = 45, (d) ξ = 60.

4.1 RGB Color Image

We first test our background subtraction technique on the fountain and subway sequences, in which significant pixel-wise noise and/or structural variations of the background are observed. The background model is trained using the first 50 empty frames, and only 5 principal components are selected. The subspace is updated at every time step by the method proposed in Section 3.3. The average processing speed is 8–9 frames/sec on a standard PC with a dual-core 2.2 GHz CPU and 2 GB memory, for 320 × 240 images and a Matlab implementation.

The standard eigenbackground [2] and a pixel-wise Gaussian mixture model (GMM) with up to three components [6] were also implemented. The thresholds of the standard eigenbackground and the GMM were determined manually to optimize the background subtraction results for both sequences with the constructed models. The threshold of our method is determined automatically, with the parameters of Eq. (9) set as follows: ξ_min = 30, ξ_max = 60, and κ = 5. The results are illustrated in Figure 2, where we observe that the false alarm rate is lower than that of [2,6] at a similar detection rate. Although [6] shows good performance when a separate optimal threshold is given for each sequence, it is difficult to find a single threshold that works well for both. Also, note that the pixel-wise density-based modeling method is computationally more expensive than our method.


Fig. 4. BGS with global illumination variations: (a) sample training images, (b) BGS results at t = 277 (left) and t = 2091 (right). Average intensities of the sample training images are around 110, 111, 120, and 133, respectively.

To evaluate our automatic threshold setting technique, three different values (30, 45, and 60) were selected as a global threshold ξ and applied to background subtraction on the fountain sequence. The results for the same frame used in Figure 2 are illustrated in Figure 3; considering true positives and false negatives, the threshold determined by the proposed method is clearly better than the fixed thresholds in Figures 3(b) and 3(d), and comparable to that in Figure 3(c).

The variation of global illumination is successfully modeled by our method in the campus sequence, where the scene gets gradually brighter and some foreground objects appear in the training images. Sample training images and background subtraction results are shown in Figure 4.

4.2 Sensor Fusion for Background Subtraction

In addition to RGB color images, our method is naturally applied to background modeling and subtraction for data captured by multiple sensors. Background subtraction based on 2DPCA using RGB and RGB+IR images was performed, and some results are illustrated in Figure 5. Since much larger images (640 × 480) are used in this experiment, standard PCA on the multi-channel data may not be a desirable solution due to the high dimensionality. As shown in Figure 5, our method with RGB+IR performs much better than with RGB only. The combination of IR with RGB is especially advantageous since foreground objects cast no shadow in the IR image, so shadows can be detected easily: a pixel with a low reconstruction error in the IR image but a high overall error is considered shadow and classified as background. Also, at t = 150 and t = 350, moving vehicles behind trees are detected clearly by our method (inside the ellipses), while modeling with the RGB feature alone hardly detects those objects.
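The shadow rule just described reduces to a simple per-pixel classification. The sketch below (function name and label encoding ours) assumes the per-channel reconstruction errors have already been computed from the RGB and IR reconstructions:

```python
import numpy as np

def classify_pixels(err_rgb, err_ir, xi_rgb, xi_ir):
    """Shadow rule from the RGB+IR fusion: a pixel with a high overall (RGB)
    error but a low IR reconstruction error is shadow, hence background.
    Labels: 0 = background, 1 = foreground, 2 = shadow (kept as background)."""
    fg = err_rgb > xi_rgb
    shadow = fg & (err_ir <= xi_ir)
    labels = np.zeros(err_rgb.shape, dtype=np.uint8)
    labels[fg] = 1
    labels[shadow] = 2
    return labels

# four pixels: clean background, true foreground, shadow, clean background
err_rgb = np.array([10.0, 80.0, 90.0, 20.0])
err_ir = np.array([5.0, 50.0, 8.0, 3.0])
labels = classify_pixels(err_rgb, err_ir, xi_rgb=45.0, xi_ir=20.0)
print(labels)   # -> [0 1 2 0]
```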

Fig. 5. 2DPCA BGS with RGB+IR data at (a) t = 150, (b) t = 250, (c) t = 350: (row 1) RGB image, (row 2) IR image, (row 3) BGS with RGB, (row 4) BGS with RGB+IR.

5 Conclusion

We proposed a background modeling and subtraction algorithm for multi-channel data using 2DPCA. The initial subspace for the background model is obtained efficiently by converting the 2DPCA problem into standard PCA, and the subspace is updated incrementally with new data obtained by combining the incoming data with the reconstructed background. We also proposed a method to automatically determine the threshold for background subtraction. Our algorithm was implemented and compared with the standard eigenbackground and a pixel-wise density-based method, and applied to sensor-fusion background subtraction.


References

1. Yang, J., Zhang, D., Frangi, A.F., Yang, J.Y.: Two-dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Trans. Pattern Anal. Machine Intell. 26, 131–137 (2004)
2. Oliver, N.M., Rosario, B., Pentland, A.: A Bayesian computer vision system for modeling human interactions. IEEE Trans. Pattern Anal. Machine Intell. 22, 831–843 (2000)
3. Wren, C., Azarbayejani, A., Darrell, T., Pentland, A.: Pfinder: Real-time tracking of the human body. IEEE Trans. Pattern Anal. Machine Intell. 19, 780–785 (1997)
4. Friedman, N., Russell, S.: Image segmentation in video sequences: A probabilistic approach. In: Proc. Thirteenth Conf. Uncertainty in Artificial Intelligence (UAI) (1997)
5. Lee, D.: Effective Gaussian mixture learning for video background subtraction. IEEE Trans. Pattern Anal. Machine Intell. 27, 827–832 (2005)
6. Stauffer, C., Grimson, W.: Adaptive background mixture models for real-time tracking. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Fort Collins, CO, pp. 246–252 (1999)
7. Elgammal, A., Harwood, D., Davis, L.: Non-parametric model for background subtraction. In: Proc. European Conf. on Computer Vision, Dublin, Ireland, vol. II, pp. 751–767 (2000)
8. Elgammal, A., Duraiswami, R., Harwood, D., Davis, L.: Background and foreground modeling using nonparametric kernel density estimation for visual surveillance. Proceedings of the IEEE 90, 1151–1163 (2002)
9. Han, B., Comaniciu, D., Davis, L.: Sequential kernel density approximation through mode propagation: Applications to background modeling. In: Asian Conference on Computer Vision, Jeju Island, Korea (2004)
10. De la Torre, F., Black, M.: A framework for robust subspace learning. Intl. J. of Computer Vision 54, 117–142 (2003)
11. Kong, H., Wang, L., Teoh, E.K., Li, X., Wang, J.G., Venkateswarlu, R.: Generalized 2D principal component analysis for face image representation and recognition. Neural Networks 18(5–6), 585–594 (2005)
12. Xu, A., Jin, X., Jiang, Y., Guo, P.: Complete two-dimensional PCA for face recognition. In: Int. Conf. Pattern Recognition, Hong Kong, pp. 459–466 (2006)
13. Wang, L., Wang, X., Zhang, X., Feng, J.: The equivalence of two-dimensional PCA to line-based PCA. Pattern Recognition Letters 26, 57–60 (2005)
14. Hall, P., Marshall, D., Martin, R.: Merging and splitting eigenspace models. IEEE Trans. Pattern Anal. Machine Intell. 22, 1042–1048 (2000)
15. Weng, J., Zhang, Y., Hwang, W.: Candid covariance-free incremental principal component analysis. IEEE Trans. Pattern Anal. Machine Intell. 25, 1034–1040 (2003)
16. Levy, A., Lindenbaum, M.: Sequential Karhunen-Loeve basis extraction and its application to images. IEEE Trans. Image Process. 9, 1371–1374 (2000)
