MULTI-RESOLUTION BACKGROUND SUBTRACTION FOR DYNAMIC SCENES

Bineng Zhong1, Shaohui Liu1, Hongxun Yao1, Baochang Zhang2
1Department of Computer Science and Engineering, Harbin Institute of Technology
2School of Automation Science and Electrical Engineering, Beihang University, China
{bnzhong, yhx}@vilab.hit.edu.cn; [email protected]; [email protected]
[email protected] ABSTRACT Dynamic scenes (e.g. waving trees, ripples in water, illumination changes, camera jitters etc.) challenge many traditional background subtraction methods. In this paper, we present a novel background subtraction approach for dynamic scenes, in which the background is modeled in a multi-resolution framework. First, for each level of the pyramid, we run an independent mixture of Gaussians Models (GMM) that outputs a background subtraction map. Second, these background subtraction maps are combined via AND operator to finally get a more robust and accurate background subtraction map. This is a natural fusion because the original resolution and low resolution images have complementary strengths, which original resolution image contains rich information and low resolution image is insensitive to the noises and the small movement of dynamic scene. Experimental result shows that this real-time algorithm is able to detect moving objects accurately even in dynamic scenes. Index Terms— Background Subtraction, Dynamic Scenes, Multi- resolution Analysis 1.INTRODUCTION Background subtraction (BGS) is one of the essential tasks in intelligence video surveillance, in which an adaptive statistical background model is first constructed, and then incoming pixels that are unlikely to be generated by this model are labeled as foreground. Even though a large number of BGS methods have been proposed over the years, the task remains challenging in dynamic scenes, such as waving trees, ripples in water, illumination changes, camera jitters etc. Many early BGS methods [1] model each pixel in a video frame with a Gaussian distribution. However, single Gaussian model only works well in the case of static scenes. By using more than one Gaussian distribution per pixel, mixture of Gaussians Models (GMM) [2] is one of widely used approaches to deal with periodic motions from a cluttered background, slow lighting changes, etc. However, it cannot adapt to the quick variations in dynamic environments [3]. Numerous improvements of the original
method developed by Stauffer and Grimson [2] have been proposed in recent years, and a good survey of these improvements is presented in [4]. Rather than extending the GMM, other pixel-wise modeling approaches include kernel density estimation [5, 6], codebooks [7], etc.

Most of the above BGS methods assume that the time series of observations at each pixel is independent, which is a strict assumption and thus leads to many false alarm pixels in the detected foreground regions in dynamic environments. Some methods first use BGS to get a set of candidate foreground pixels, and then use a post-processing scheme to remove false alarm pixels from the detected foreground regions. In [3, 8], three-stage algorithms operating at the pixel, region, and frame levels are presented. After modeling the background with a GMM, Tian et al. [9] integrate intensity and texture information to remove false alarm pixels due to shadows and quick lighting changes. However, when a foreground pixel is not detected by pixel-level subtraction due to similar color, these methods will not classify the pixel as foreground. Patwardhan et al. [10] coarsely represent a scene as the union of pixel layers and detect foreground objects by propagating these layers using a maximum-likelihood assignment. However, the limitations of that method are the requirement of an extra offline training step and its high computational complexity. Other BGS methods include motion-based approaches [11], region-based algorithms [12, 13], methods using edge features [13, 14], and so on. Please refer to [15] for a more complete review of BGS methods.

Inspired by the complementary properties of multi-resolution images (the original-resolution image contains rich detail, while the low-resolution image is insensitive to noise and to the small movements of dynamic scenes), we present a novel multi-resolution BGS method for dynamic scenes in this paper. For an incoming frame, we first use its original-resolution and low-resolution images as inputs to their corresponding GMM-based BGS modules, which output BGS maps. Then, we fuse these BGS maps via an AND operator to obtain a more robust and accurate BGS map. Our method has the following advantages: 1) Since it effectively utilizes the complementary characteristics of multiple resolutions, our method not only tolerates quick variations in dynamic scenes, but also maintains the detailed information of the original-resolution image. 2) The
proposed fusion method is simple and fast, in contrast to some existing multi-level works that use a complicated post-processing scheme to remove false alarm pixels from the detected foreground regions. 3) Qualitative and quantitative experimental results show that this real-time algorithm is able to detect moving objects accurately even in dynamic scenes.

The rest of the paper is organized as follows: Section 2 describes the proposed method in detail. In Section 3, experimental results are given. Finally, conclusions and future work are drawn in Section 4.

2. THE PROPOSED METHOD

2.1. Multi-resolution Images and Their Properties

Representing images at multiple resolution levels is a very popular technique for capturing features at multiple scales. The low-resolution images retain the low-frequency information of the high-resolution images, while the high-frequency information is filtered out. In our method, the low-resolution images are obtained by downsampling the original images. The motivation for using multi-resolution images is that peak noise and the small movements of dynamic scenes are eliminated in the low-resolution images, which leads to better and more stable background modeling. To verify this motivation, we checked the intensity values of some pixels over time in dynamic scenes and give a typical example in Fig. 1. Specifically, in Fig. 1 we examine the intensity value of a dynamic pixel A over time, for the original-resolution and low-resolution images respectively. The fluctuation of the intensity values of pixel A in the original-resolution image is clearly large, which is consistent with the fact that pixel A is frequently occupied by waving trees in the original image. In this case, the background model of pixel A is difficult to build due to the large temporal variability of the background. However, the intensity values of pixel A in the low-resolution image are fairly stable, which leads to more stable background modeling.
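This per-pixel comparison is straightforward to reproduce. The following minimal sketch assumes OpenCV (the paper does not name a library) and a hypothetical pixel location; plotting the two recorded traces reproduces the qualitative behavior of Fig. 1.

```cpp
// Minimal sketch, assuming OpenCV: record the intensity of one pixel at the
// original and half resolution over time, as in the Fig. 1 experiment.
// The pixel coordinates below are hypothetical, not taken from the paper.
#include <opencv2/imgproc.hpp>
#include <vector>

std::vector<uchar> fullTrace, lowTrace;   // intensity history of "pixel A"

void observe(const cv::Mat& grayFrame) {  // 8-bit, single-channel frame
    cv::Mat lowRes;
    cv::pyrDown(grayFrame, lowRes);       // Gaussian blur + drop every 2nd row/col
    fullTrace.push_back(grayFrame.at<uchar>(120, 160));  // pixel A at full size
    lowTrace.push_back(lowRes.at<uchar>(60, 80));        // same location, half size
}
```

Under waving trees, the low-resolution trace fluctuates far less because the Gaussian prefilter in the downsampling averages out the high-frequency motion.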
2.2. Mixture of Gaussians Model (GMM) for Multi-resolution Images

In this paper, we combine the popular GMM method [2] with our multi-resolution images. The GMM method can deal with periodic motions from a cluttered background, slow lighting changes, etc. However, it cannot adapt to the quick variations in dynamic environments, such as tree leaves swaying, water rippling, and camera jitter. In other words, the GMM method generates a large number of false foreground pixels under those difficult conditions (see Fig. 3(a)).

In the GMM, three significant parameters K, T and α need to be set, where K is the number of Gaussian components, T the minimum portion of the background model, and α the learning rate. In our implementation, both the original-resolution-based and the low-resolution-based background modeling are done in terms of RGB color. For the low-resolution-based background modeling, we set K = 3 (three Gaussians), T = 0. and α = 0.01. For the original-resolution-based background modeling, we set K = 4 (four Gaussians), T = 0.8 and α = 0.01; the values of T and K are changed to adjust the method for the increased multimodality of the background at the original resolution. Please refer to [2] for more details about the GMM method.
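As a concrete illustration, the sketch below configures the two background models with these settings. It assumes OpenCV's BackgroundSubtractorMOG2 as a stand-in for the GMM of [2] (an assumption; the paper does not name a library, and parameter names differ), and the low-resolution T value is a hypothetical placeholder to be tuned per scene.

```cpp
// Sketch under the assumptions above: setNMixtures corresponds to K and
// setBackgroundRatio to T; the learning rate alpha is passed to apply().
#include <opencv2/video/background_segm.hpp>

cv::Ptr<cv::BackgroundSubtractorMOG2> makeGmm(int K, double T) {
    auto gmm = cv::createBackgroundSubtractorMOG2(/*history=*/100,
                                                  /*varThreshold=*/16.0,
                                                  /*detectShadows=*/false);
    gmm->setNMixtures(K);          // K: number of Gaussian components
    gmm->setBackgroundRatio(T);    // T: minimum portion of the background model
    return gmm;
}

auto fullResGmm = makeGmm(4, 0.8); // original resolution: K = 4, T = 0.8
auto lowResGmm  = makeGmm(3, 0.7); // low resolution: K = 3; T = 0.7 is a
                                   // hypothetical placeholder, tune per scene
```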
Fig. 1. Evolution of the intensity value of pixel A over time, for the original-resolution and low-resolution images respectively.
2.3. Multi-resolution BGS

We run the GMM in a multi-resolution framework. This enables our method to effectively utilize the complementary characteristics of multiple resolutions. First, for each level of the pyramid, we run an independent GMM that outputs a BGS map. Second, the two maps are combined via an AND operator to obtain a more robust and accurate BGS map. Specifically, for an incoming frame, we first obtain its low-resolution image by downsampling the original image. We then get the two BGS maps from the two resolutions via the GMM. Finally, the low-resolution map is resized to the size of the original-resolution image and fused via a pixel-wise AND operator with the original-resolution map to form the final BGS map. Fig. 2 shows the flowchart of our algorithm. In our implementation, we run the GMM on the original and half-size images.
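Under the same OpenCV-based assumptions as in Section 2.2, one iteration of this pipeline might look as follows; the function name is ours.

```cpp
// One iteration of the multi-resolution BGS pipeline, as a sketch.
#include <opencv2/imgproc.hpp>
#include <opencv2/video/background_segm.hpp>

cv::Mat multiResSubtract(const cv::Mat& frame,
                         cv::Ptr<cv::BackgroundSubtractor>& fullGmm,
                         cv::Ptr<cv::BackgroundSubtractor>& lowGmm,
                         double alpha = 0.01) {
    // 1. Obtain the low-resolution (half-size) image by downsampling.
    cv::Mat lowRes;
    cv::pyrDown(frame, lowRes);

    // 2. Run an independent GMM at each resolution to get two BGS maps.
    cv::Mat fullMask, lowMask;
    fullGmm->apply(frame, fullMask, alpha);
    lowGmm->apply(lowRes, lowMask, alpha);

    // 3. Resize the low-resolution map back to the original size.
    cv::Mat lowMaskUp;
    cv::resize(lowMask, lowMaskUp, frame.size(), 0, 0, cv::INTER_NEAREST);

    // 4. Fuse the two maps with a pixel-wise AND to form the final map.
    cv::Mat fused;
    cv::bitwise_and(fullMask, lowMaskUp, fused);
    return fused;
}
```

With shadow detection disabled, both maps are strictly binary, so the pixel-wise AND keeps a pixel as foreground only when both resolutions agree.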
Fig. 2. Overview of our method, which combines the BGS maps associated with different image pyramid levels to obtain a more robust and accurate BGS result.
3. EXPERIMENTS AND DISCUSSION
The proposed method is implemented in C++ on a computer with an Intel Core 2 1.86 GHz processor. It achieves a processing speed of 15 fps at a resolution of 320 × 240 pixels. We compare the performance of our method to the widely used GMM method [2]. Both qualitative and quantitative comparisons are used to evaluate our approach. The quantitative comparison is done in terms of the number of false negatives (foreground pixels that are missed) and false positives (background pixels that are marked as foreground).

In Fig. 3(a), we show the results of our method on four test sequences. The sequences used in the experiments include waving trees, illumination changes, ripples in water, and moderate camera jitter. The frames in the first two columns are from outdoor sequences that contain a moving person in the foreground, with a dynamic background composed of subtle illumination variations in the sky along with swaying trees. Our method robustly handles these situations and detects the moving object correctly because it exploits the multi-resolution properties of the background images. The first two sequences are taken from [8] and [16] respectively. The frames in the third column are from [6] and contain an average camera jitter of about 14.66 pixels. Our method also gives good results. The frames in the fourth column are from the Internet; the sequence contains a moving foreground, with a dynamic background composed of ripples in the water with shadows. Our method does well under this condition.
Fig. 3. Comparison results of the GMM and our method. (a) The original test sequences and some detection results of the GMM and our method. (b) The quantitative results: FN per frame, FP per frame, and FN + FP per frame for each test sequence. FN and FP stand for false negatives and false positives, respectively.
To provide a quantitative perspective on the quality of foreground detection with our approach, we manually mark the foreground regions in all frames of each sequence to generate ground-truth data, and compare against the GMM. We sum the errors over the frames corresponding to the ground-truth frames and report the number of misclassifications as the average error per frame. The corresponding quantitative comparison is reported in Fig. 3(b). For all sequences, the proposed method achieves the best performance in terms of both false positives and false negatives. Since our method combines an original-resolution-based background model with a low-resolution-based background model in an effective fusing manner, it is robust against dynamic backgrounds. It should be noted that, for the proposed method, most of the false negatives occur on the contour areas of the foreground objects (see Fig. 3(a)). This is because the proposed method utilizes the multi-resolution properties of the background images. According to the overall results, the proposed method outperforms the GMM on the test sequences.
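For concreteness, the per-frame error counts can be computed from a binary BGS map and its ground-truth mask as in the following sketch (our helper, under the same OpenCV assumption as before, with 8-bit masks taking values 0 and 255):

```cpp
// Sketch: per-frame FN/FP counts from a BGS map and a ground-truth mask.
#include <opencv2/core.hpp>

struct FrameErrors { int falseNegatives, falsePositives; };

FrameErrors countErrors(const cv::Mat& fgMask, const cv::Mat& groundTruth) {
    cv::Mat missed, spurious;
    cv::bitwise_and(groundTruth, ~fgMask, missed);    // FN: foreground pixels missed
    cv::bitwise_and(fgMask, ~groundTruth, spurious);  // FP: background marked foreground
    return { cv::countNonZero(missed), cv::countNonZero(spurious) };
}
```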
The comparison methodology used above, as in most work in the background modeling literature, measures the accuracy of the algorithm at the pixel level. However, it does not precisely capture the ability to produce reasonable detections of foreground objects for higher-level applications, such as object classification, object tracking, and object recognition. To better show the performance of our method, we also give some connected components of the detection results. If more than 50% of a foreground object is detected as a single connected component, it is counted as a true positive; otherwise, the foreground object is considered a miss and counted as a false negative. The remaining connected components are counted as false positives. From the comparison results shown in Fig. 4, it is obvious that our method gives more reasonable detections of foreground objects for higher-level applications.
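This matching rule can be sketched as follows, again assuming OpenCV; the helper name and the choice of 8-connectivity are ours.

```cpp
// Sketch: a ground-truth object counts as detected (true positive) if a
// single connected component of the BGS map covers more than 50% of it.
#include <opencv2/imgproc.hpp>

bool objectDetected(const cv::Mat& fgMask, const cv::Mat& objectMask) {
    cv::Mat labels;
    int n = cv::connectedComponents(fgMask, labels, 8, CV_32S);
    const int objectArea = cv::countNonZero(objectMask);
    for (int label = 1; label < n; ++label) {    // label 0 is the background
        cv::Mat component = (labels == label);   // 8-bit mask of one component
        cv::Mat overlap;
        cv::bitwise_and(component, objectMask, overlap);
        if (2 * cv::countNonZero(overlap) > objectArea)
            return true;                         // component covers > 50%
    }
    return false;   // miss: counted as a false negative
}
```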
Fig. 4. Connected-component-based comparison results of the GMM and our method. Red bounding boxes denote connected components.
4. CONCLUSION AND FUTURE WORK

In this paper, we present a multi-resolution background subtraction method for dynamic scenes. Our study shows that the multi-resolution properties of background images not only tolerate quick variations in dynamic scenes, but also maintain the detailed information of the original-resolution image. By combining an original-resolution-based background model with a low-resolution-based background model via an AND operator, our method can precisely describe background changes and tolerate variations in natural scenes, such as tree leaves swaying, water rippling, and camera jitter. Compared to the widely used GMM, the proposed method achieves fewer false positives and false negatives and more reasonable detections of foreground objects for higher-level applications. Our future work will focus on combining our method with other features, such as texture and geometry, to cope with cast shadows, which are an extremely difficult problem in background subtraction.

5. ACKNOWLEDGEMENTS

This work is supported by the National Basic Research Program of China (2009CB320906), the National Natural Science Foundation of China (60775024), and the Specialized Research Fund for the Doctoral Program of Higher Education of China (20060213052).

6. REFERENCES
[1] C.R. Wren, A. Azarbayejani, T. Darrell, and A.P. Pentland, "Pfinder: Real-Time Tracking of the Human Body," TPAMI, vol. 19, no. 7, pp. 780-785, July 1997.
[2] C. Stauffer and W.E.L. Grimson, "Learning Patterns of Activity Using Real-Time Tracking," TPAMI, vol. 22, no. 8, pp. 747-757, August 2000.
[3] O. Javed, K. Shafique, and M. Shah, "A Hierarchical Approach to Robust Background Subtraction using Color and Gradient Information," IEEE Workshop on Motion and Video Computing, pp. 22-27, 2002.
[4] T. Bouwmans, F. El Baf, and B. Vachon, "Background Modeling using Mixture of Gaussians for Foreground Detection - A Survey," Recent Patents on Computer Science, vol. 1, no. 3, pp. 219-237, 2008.
[5] A. Elgammal, D. Harwood, and L. Davis, "Non-parametric Model for Background Subtraction," ECCV, vol. 2, pp. 751-767, June 2000.
[6] Y. Sheikh and M. Shah, "Bayesian Modeling of Dynamic Scenes for Object Detection," TPAMI, vol. 27, no. 11, pp. 1778-1792, November 2005.
[7] K. Kim, T.H. Chalidabhongse, D. Harwood, and L. Davis, "Real-time Foreground-Background Segmentation using Codebook Model," Real-Time Imaging, vol. 11, issue 3, pp. 167-256, June 2005.
[8] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, "Wallflower: Principles and Practice of Background Maintenance," ICCV, vol. 1, pp. 255-261, 1999.
[9] Y.L. Tian, M. Lu, and A. Hampapur, "Robust and efficient foreground analysis for real-time video surveillance," CVPR, vol. 1, pp. 1182-1187, June 2005.
[10] K.A. Patwardhan, G. Sapiro, and V. Morellas, "Robust Foreground Detection in Video Using Pixel Layers," TPAMI, vol. 30, no. 4, pp. 746-751, April 2008.
[11] L. Wixson, "Detecting Salient Motion by Accumulating Directionally Consistent Flow," TPAMI, vol. 22, no. 8, pp. 774-780, August 2000.
[12] T. Matsuyama, T. Ohya, and H. Habe, "Background Subtraction for Non-Stationary Scenes," Proc. Asian Conf. Computer Vision, pp. 622-667, 2000.
[13] M. Mason and Z. Duric, "Using Histograms to Detect and Track Objects in Color Video," Proc. Applied Imagery Pattern Recognition Workshop, pp. 154-159, 2001.
[14] S. Jabri, Z. Duric, H. Wechsler, and A. Rosenfeld, "Detection and Location of People in Video Images Using Adaptive Fusion of Color and Edge Information," Proc. Int'l Conf. Pattern Recognition, vol. 4, pp. 627-630, 2000.
[15] M. Piccardi, "Background subtraction techniques: a review," Proc. IEEE Int'l Conf. Systems, Man and Cybernetics, vol. 4, pp. 3099-3104, 2004.
[16] http://mmc36.informatik.uni-augsburg.de/VSSN06_OSAC/#testvideo