2012 International Symposium on Computer Applications and Industrial Electronics (ISCAIE 2012), December 3-4, 2012, Kota Kinabalu Malaysia
Moving Object Extraction in PTZ Camera Using the Integration of Background Subtraction and Local Histogram Processing

Syaimaa' Solehah¹, Shahrul Nizam Yaakob²
School of Computer and Communication Engineering, University Malaysia Perlis, Malaysia
¹[email protected], ²[email protected]

Zulaikha Kadim³, Hon Hock Woon⁴
Centre For Intelligent Imaging, MIMOS Bhd., Malaysia
{³zulaikha.kadim, ⁴hockwoon.hon}@mimos.my
Abstract— This paper proposes a technique for extracting moving objects using a pan-tilt-zoom (PTZ) camera. The technique is based on the integration of background subtraction and local histogram processing. The background is modeled as multiple images together with their corresponding camera pose information (pan and tilt angles). To detect objects in the current image, the system first determines the most matched background based on the acquired PTZ pose information. The matched background is then compensated with respect to the current image, and background subtraction is performed between the two aligned images. The resultant output is cleaned up using a morphological filtering operator, before local histogram processing is done, followed finally by a second morphological filtering. The results show that the moving object can be successfully extracted after going through these four steps.

Keywords— Motion detection, object extraction, background subtraction, local histogram, PTZ camera

I. INTRODUCTION

Detection and extraction of moving objects is an important step for many applications, including video surveillance, traffic monitoring and human tracking. There are three common approaches to detecting moving objects: optical flow, temporal difference and background subtraction. Extracting moving objects with a PTZ camera differs from the static-camera case: a PTZ camera has zoom and pan control and can rotate 360 degrees on its axis, so the background of each frame differs in position and location, whereas with a static camera the background of each frame is the same. The optical flow approach can successfully extract moving objects from image sequences captured by both static and moving cameras. However, the motion due to object and camera movement between two successive frames must be small in the spatio-temporal sense. Moreover, its high computational cost makes it unsuitable for real-time applications, and problems arise when images are not continuous but contain distinct foreground objects with crisp edges. The second approach, temporal difference, subtracts two consecutive frames and applies a threshold to the output. These two methods are simple, but their results are highly dependent on the objects' visual homogeneity and speeds. The most common approach is background subtraction, which builds up a background model and subtracts the current image frame from the modeled background to extract moving objects.

The structure of this paper is as follows. In the next section, previous research related to this paper is discussed, followed by the description of our proposed approach. The experimental setup and results are then presented, before finally the conclusion is drawn.

978-1-4673-3033-6/10/$26.00 ©2012 IEEE

II. RELATED WORKS

Various techniques exist in the literature for object extraction. Some use improved versions of existing methods and some combine two or more methods. Ninad Thakoor and Jean Gao [1] used a region-boundary-based change detection approach for the frame difference method. The technique is used to extract multiple objects captured by a moving stereo camera. A displaced frame difference (DFD), obtained from an input frame and a compensated second input frame, is compared to a threshold to detect the approximate shape of the moving object. For fast-moving objects, they used two consecutive DFDs to overcome the problem of having a combination of object shapes in both frames. Region-boundary-based change detection is proposed to solve the problem that occurs when the object segment overlaps with itself, since one frame is segmented into a few regions based on color information. Using an edge-map difference gives a single connected region, and some post-processing is necessary. A moving region can be determined by the ratio of the length of the moving region boundary to the region perimeter. The moving object is extracted from the region by masking the boundary and moving region. The weakness of this approach is that it does not handle occlusion.
Guofeng Zhang, et al. [2], addressed the accuracy problem that arises when the camera undergoes arbitrary translational and rotational motions and the background has complex structures. The two main factors behind this problem are motion estimation and foreground definition. They proposed a method which iterates between motion estimation and bilayer segmentation, based on appearance and structure consistency constraints in 3D warping. Pixels in one frame are projected to their 3D locations, the 3D points are re-projected onto other frames, and two neighboring frames are then warped. Pixels with high residual error have a high chance of belonging to the moving foreground. For each pixel, they apply a median filter to find the local best match defined on the depth map. Finally, they perform binary segmentation and border matting to refine the binary foreground maps.
Zhan Chaohui, et al. [3], proposed an improved method based on frame difference and edge detection. It detects the edges of every two continuous frames with a Canny detector and gets the difference between the two edge images. It then divides the edge difference image into several small blocks and decides whether they are moving areas by comparing the number of non-zero pixels to a threshold. Lastly, it performs block-connected component labeling to get the smallest rectangle that contains the moving object. The result has a high recognition rate and high detection speed, compared to the standard methods of frame difference, moving edge and background subtraction.

Mandar Kulkarni, et al. [4], used a static camera as a source. The background frame is segmented, with the number of segmented regions equal to the number of Gaussians fitted to the histograms. The difference of every pixel's intensity to the corresponding pixel in the mean image is computed and compared to a threshold to define the foreground object. The threshold is equal to three times the average of the standard deviations of all the Gaussian components in the model. The paper also incorporates optical flow to overcome the problem of detecting a moving background object as foreground, which happens when the motion of background pixels is less than that of the foreground pixels. Points having velocity less than a velocity threshold are classified as background points and set to zero. The method proposed by Mandar Kulkarni can work for both single and multiple moving objects.

In [5], feature points are extracted and tracked through the sequence, and trajectories of background points are selected by exploiting geometric constraints based on the affine camera model. These are used to generate a panoramic background, which is compared with the individual frames; the difference between the two is thresholded to extract moving objects.

From the previous research, several problems can be identified, and some papers combine many methods. In this paper, we use a simple method to get the best output, which is an extracted moving object. The next section explains technically our proposed method, which combines background subtraction with local histogram processing to further refine the output of the background subtraction.

III. PROPOSED APPROACH

Given multiple background images representing the scenes as viewed by a PTZ camera, for any given current image frame ft of dimension m by n pixels, a motion map of the same dimension will be generated, assuming that the camera intrinsic parameters are not known. The motion map is a binary map in which each pixel indicates whether its corresponding pixel in the current frame belongs to the background or to the moving foreground. From this motion map, a bounding box enclosing each group of connected motion pixels can be computed to indicate the location of the moving object in the current frame.

Our proposed method of moving object extraction using a PTZ camera is an integration of pixel-wise background subtraction and histogram-based foreground/background classification. For this work, it is assumed that the PTZ camera captures a series of background images representing the scene. These background images are stored together with the PTZ camera pose (the pan (α) and tilt (β) angles of the camera). Given this series of background images, the process of extracting the moving object starts with finding the best matched background image with respect to the current image frame. This is followed by estimating the camera motion between the most matched background and the current image frame. The computed transformation matrix is then used to compensate the background, aligning it to the current image frame. Both images are then processed to finally extract the moving object. The overall proposed system is illustrated in Fig. 1. Details of each step are as follows.

Figure 1. Overall proposed system:
  Inputs: background images and current image
  → Finding the most matched background (output: matched background, Bi)
  → Camera motion estimation (output: homography matrix, Ht)
  → Camera motion compensation (output: compensated background, Bcomp)
  → Moving object extraction (output: motion map, M)
In the following, Bi denotes the matched background, Ht the homography between the background and the current image, and Bcomp the compensated (aligned) background.
A. Finding the Best Matched Background Image

Given n background images, Bi, i = {1, 2, ..., n}, and their corresponding pan and tilt angles, (αi, βi), i = {1, 2, ..., n}, the output of this step is the Bi that has the most overlapped area with the current image frame. In this work, the camera pose information is used to choose the most matched background. Let (αt, βt) denote the pan and tilt angles of the current image; the most matched background is chosen such that the distance between the current camera pose and that of background image i is minimum. The distance between these angles is computed using the Euclidean distance as in (1):

dist((αt, βt), (αi, βi)) = √((αt − αi)² + (βt − βi)²)    (1)
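The selection in Eq. (1) amounts to a nearest-neighbour search over the stored camera poses. A minimal pure-Python sketch follows; the function name and the tuple layout of the stored backgrounds are illustrative, not from the paper:

```python
import math

def find_matched_background(pan_t, tilt_t, backgrounds):
    """Pick the stored background whose (pan, tilt) pose is closest to the
    current camera pose (pan_t, tilt_t), using the Euclidean distance of
    Eq. (1). backgrounds is a list of (pan_i, tilt_i, image_i) tuples; the
    image entry is opaque here (e.g. a file path or a pixel array)."""
    return min(backgrounds,
               key=lambda b: math.hypot(pan_t - b[0], tilt_t - b[1]))
```

For example, with stored poses (0, 0), (30, 10) and (60, 0), a current pose of (28, 12) selects the second background, since its pose distance is the smallest.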
B. Camera Motion Estimation

In this step, the camera motion between the background and the current image is computed. The output is a homography matrix which defines the transformation between the two images. The steps to estimate the camera motion are illustrated in Fig. 2. First, keypoints in both the background and the current frame are extracted and their descriptors are computed. In this work the SURF descriptor is used, as it is more robust against different image transformations than SIFT and is several times faster [6]. In short, SURF is a robust image detector and descriptor; for details, readers may refer to [7] and [8]. The extracted keypoints in the background are then matched with the keypoints in the current image. From the matched keypoints, the homography Ht can be computed.
Figure 3. Sample of a compensated background image: (a) current image; (b) matched background image; (c) compensated background.
D. Moving Object Extraction

The output of this stage is the motion map indicating the pixels that correspond to the moving object in the current frame. The process flow for this step is illustrated in Fig. 4. First, the aligned background frame is compared to the current frame by background subtraction: the pixel intensity value at coordinate (i, j) of the current frame is subtracted from the pixel at the same coordinate in the aligned background frame, and the difference is compared to a threshold value. If the difference is more than the threshold, the pixel is assumed to be a motion pixel, and a background pixel otherwise.

The output of the background subtraction contains noise, one reason being motion estimation error. Noise here means background pixels falsely identified as motion pixels, as illustrated in Fig. 5; the higher the error, the more noise the background subtraction output will contain. To remove small, separated noise from the actual motion area and to fill small holes within the motion area, morphological closing is applied [9]. The closing operator is implemented using dilation followed by erosion. Dilation adds pixels to the edge of the motion area, so the motion area becomes thicker and small holes within it are filled. Erosion then removes pixels from the edge of the motion area. By applying these morphology operators, small and separated noise can be removed; however, connected noise pixels cannot be removed by this process. To overcome this problem, we compute a local histogram for each pixel classified as a motion pixel, in this case over a 3x3 window. The local histograms of the current and warped images are compared against a threshold to re-classify the pixels; refer to Fig. 6.
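The subtraction-and-closing steps above can be sketched in pure Python on small grayscale grids. The threshold value and the 3x3 structuring element are illustrative assumptions; a real implementation would operate on full images with a library such as OpenCV:

```python
def background_subtract(frame, bg, thresh=30):
    """Per-pixel absolute difference against the aligned background;
    pixels whose difference exceeds thresh become motion pixels (1)."""
    return [[1 if abs(f - b) > thresh else 0
             for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, bg)]

def dilate(m):
    """3x3 dilation: a pixel becomes 1 if any pixel in its 3x3
    neighbourhood (clamped at the borders) is 1."""
    h, w = len(m), len(m[0])
    return [[1 if any(m[y][x]
                      for y in range(max(0, i - 1), min(h, i + 2))
                      for x in range(max(0, j - 1), min(w, j + 2)))
             else 0 for j in range(w)] for i in range(h)]

def erode(m):
    """3x3 erosion: a pixel stays 1 only if its whole 3x3 neighbourhood
    is 1 (border pixels are always eroded in this sketch)."""
    h, w = len(m), len(m[0])
    return [[1 if (0 < i < h - 1 and 0 < j < w - 1 and
                   all(m[y][x]
                       for y in range(i - 1, i + 2)
                       for x in range(j - 1, j + 2)))
             else 0 for j in range(w)] for i in range(h)]

def closing(m):
    """Morphological closing = dilation followed by erosion; it fills
    small holes inside the motion area while keeping its outer shape."""
    return erode(dilate(m))
```

Applying `closing` to a ring-shaped motion area fills the one-pixel hole at its centre, which is exactly the hole-filling behaviour described for the first image filtering step.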
Figure 2. Steps in estimating camera motion:
  Inputs: matched background, Bi, and current image
  → Extract SURF keypoints (in both images)
  → Compute SURF descriptors
  → Find keypoint matches
  → Compute homography
  Output: homography, Ht
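The final box of Fig. 2, computing the homography from matched keypoints, reduces to solving a linear system. The sketch below assumes exactly four exact correspondences and fixes h33 = 1; a real pipeline would use many (possibly noisy) SURF matches with a robust estimator such as RANSAC (e.g. OpenCV's findHomography):

```python
def estimate_homography(src, dst):
    """Solve for the 3x3 homography H mapping src -> dst from four point
    correspondences, fixing h33 = 1 (8 unknowns, 8 equations). src and dst
    are lists of four (x, y) tuples, no three of them collinear. Solved by
    pure-Python Gaussian elimination with partial pivoting."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1) and the
        # analogous equation for v, linearised in the 8 unknowns:
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    n = 8
    M = [row + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        h[r] = (M[r][n] - sum(M[r][c] * h[c] for c in range(r + 1, n))) / M[r][r]
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]
```

For a pure camera translation of (+5, −3) pixels, the recovered matrix has h13 ≈ 5 and h23 ≈ −3 with an identity upper-left block.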
C. Camera Motion Compensation

The output of this step is the background aligned with respect to the current image frame, which prepares the background image for background subtraction. This is done by applying Ht to the background image as in (2). Fig. 3 shows a sample of a compensated background image.

Bcomp = Ht · Bi    (2)
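Applying Ht as in Eq. (2) means mapping every pixel coordinate through the projective transform. The sketch below forward-maps each source pixel to its nearest destination cell, which is enough to illustrate Eq. (2) on a small grid; production code would use inverse warping with interpolation (e.g. OpenCV's warpPerspective):

```python
def warp_point(H, x, y):
    """Apply homography H (3x3 nested list) to coordinate (x, y),
    dividing through by the projective scale w."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

def warp_image(H, img):
    """Forward-warp a small grayscale image (list of lists) through H,
    rounding each mapped coordinate to the nearest pixel. Pixels that map
    outside the frame are dropped; unfilled cells stay 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            u, v = warp_point(H, x, y)
            ui, vi = round(u), round(v)
            if 0 <= ui < w and 0 <= vi < h:
                out[vi][ui] = img[y][x]
    return out
```

A one-pixel horizontal translation, for instance, shifts every column of the toy image one step to the right, with the vacated column left at 0.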
Figure 4. Process flow for moving object extraction:
  Inputs: compensated background, Bcomp, and current image
  → Background subtraction
  → First image filtering using morphology operators
  → Local histogram processing
  → Second image filtering using morphology operators
  Output: motion map, M

Figure 5. Motion map from the background subtraction step: (a) current image; (b) resultant motion map, showing both connected and separated noise. Black pixels are background pixels, whereas white pixels are motion pixels.

Figure 6. Illustration of local histogram processing: (a) part of the motion map; (b) corresponding part of the current image.

The pseudo code for pixel classification based on local histogram comparison is shown in Fig. 7. The local histogram similarity measurement is based on the Bhattacharyya distance as in (3):

sim(A, B) = Σ_{i=0..n−1} √(Ai · Bi)    (3)

where A and B are histograms with n bins.

  Given: compensated background image Bcomp, current image It and its corresponding motion map Mt
  For all pixels (i, j) in Mt
      If Mt(i, j) is a motion pixel
          Compute the local intensity histogram histt(i, j) around pixel (i, j) in It
          Compute the local intensity histogram histBcomp(i, j) around pixel (i, j) in Bcomp
          Compute the histogram similarity sim(histt(i, j), histBcomp(i, j)) as in Eq. (3)
          If sim > threshold, set Mt(i, j) to background
          End if
      End if
  End for

Figure 7. Pseudo code for pixel classification using local histogram comparison

Sample results from each of the steps in extracting the moving object are given in Fig. 8 to 11.

IV. EXPERIMENTAL RESULTS

A. Datasets

Each dataset, an image sequence of an indoor scene, consists of a background image as a reference and frames which contain moving objects. When using a single background image, we set the camera motion to a low speed; a higher camera motion speed can be used with a panoramic background, which has multiple background frames. Fig. 8 shows the motion map of dataset 1 from each of the four steps in the proposed method. After the final stage, Fig. 8(d), only the moving objects remain and the noise is removed.

Figure 8. Motion map from: (a) background subtraction (step 1); (b) first image filtering (step 2); (c) local histogram processing (step 3); (d) second image filtering (step 4).
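The Fig. 7 re-classification, using the Bhattacharyya similarity of Eq. (3), can be sketched in pure Python. The bin count, window radius and similarity threshold below are illustrative assumptions, not values specified in the paper:

```python
import math

def local_hist(img, i, j, bins=8, radius=1):
    """Normalized intensity histogram of the (2*radius+1)^2 window around
    pixel (i, j), clamped at the image borders; intensities in 0..255.
    radius=1 gives the 3x3 window used in the paper."""
    h = [0.0] * bins
    for y in range(max(0, i - radius), min(len(img), i + radius + 1)):
        for x in range(max(0, j - radius), min(len(img[0]), j + radius + 1)):
            h[min(img[y][x] * bins // 256, bins - 1)] += 1.0
    total = sum(h)
    return [c / total for c in h]

def bhattacharyya(a, b):
    """Histogram similarity of Eq. (3): sum over bins of sqrt(A_i * B_i).
    Equals 1 for identical normalized histograms, 0 for disjoint ones."""
    return sum(math.sqrt(x * y) for x, y in zip(a, b))

def refine_motion_map(mmap, frame, bg, sim_thresh=0.9):
    """Fig. 7: re-classify each motion pixel as background when its local
    histogram in the current frame is too similar to the one in the
    compensated background (sim_thresh is a hypothetical value)."""
    for i in range(len(mmap)):
        for j in range(len(mmap[0])):
            if mmap[i][j] == 1:
                sim = bhattacharyya(local_hist(frame, i, j),
                                    local_hist(bg, i, j))
                if sim > sim_thresh:
                    mmap[i][j] = 0
    return mmap
```

A motion pixel whose neighbourhood looks the same in the current frame and in the compensated background (similarity near 1) is demoted to background; a genuinely moving pixel with a dissimilar neighbourhood survives.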
Fig. 9 shows the final result (after step 4) of extracting the moving object in dataset 2. The moving object has been successfully extracted from the background.
Figure 9. Motion map after the four steps of the proposed method in: (a) frame 1; (b) frame 2; (c) frame 3; (d) frame 4.

The system has also been tested using dataset 3, which has a dark scene. The moving object can be extracted very well in every frame; refer to Fig. 10.

Figure 10. Current frame and final motion map of sequential images in a dark scene.

Our proposed method can extract the moving object successfully. In some cases, however, there is difficulty in extracting a moving object which has a similar color to some part of the background, as the differences in pixel intensity between the moving object and the background are then small. The motion map for such a dataset can be seen in Fig. 11. From the results, the system still detects the moving object in every frame, since some part of the moving object has a different color from the background.
Figure 11. Current frame and final motion map of sequential images in which the moving object has a similar color to some part of the background.
B. Evaluation Method

The performance of the method is evaluated by calculating the accuracy of the results using Eq. (4). The accuracy of the motion map is measured against ground truth, which is obtained manually. In preparing the ground truth, the object's boundary pixels are marked as omitted pixels; these pixels are not considered during the accuracy calculation.

Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100%    (4)

in which:
TP (True Positive) – actual motion pixel, detected as a motion pixel
TN (True Negative) – actual background pixel, detected as a background pixel
FP (False Positive) – actual background pixel, detected as a motion pixel
FN (False Negative) – actual motion pixel, detected as a background pixel

C. Result Presentation

The performance of the moving object extraction method is presented in terms of accuracy, using Eq. (4), averaged over the frames in each sequence; refer to Table I.

TABLE I. ACCURACY (%) OF BACKGROUND SUBTRACTION (BS) AND THE INTEGRATION OF BACKGROUND SUBTRACTION AND LOCAL HISTOGRAM PROCESSING WITH IMAGE FILTERING (BS+LHP+IF)

Dataset      BS       BS+LHP+IF
Dataset 1    99.29     99.91
Dataset 2    86.30    100.00
Dataset 3    96.00     98.56
Dataset 4    90.35     94.01

V. CONCLUSION

This paper proposed a method for extracting a moving object using a moving (PTZ) camera when background information is available. The proposed approach combines background subtraction between the compensated background image and the current image with local histogram processing that further refines the result. From the results (Fig. 9, 10, 11), the moving object can be extracted in every frame, and by combining both processes the segmentation accuracy is better than applying either one alone. The advantages of this method are its good performance and the fact that the difference in processing time between typical background subtraction and the proposed method is less than 0.5 s, even though the method adds three more steps after background subtraction. Many applications need such a system, for example surveillance, counting people in a certain area, and road traffic monitoring. Future work for this project is to further improve the results, especially the delineation of motion/background pixels at the boundary of the moving object.

REFERENCES

[1] N. Thakoor, J. Gao, "Automatic extraction and localization of multiple moving objects with stereo camera in motion," IEEE International Conference on Systems, Man and Cybernetics (SMC), 2005.
[2] G. Zhang, et al., "Moving object extraction with a hand-held camera," IEEE 11th International Conference on Computer Vision, ICCV, 2007.
[3] Z. Chaohui, D. Xiaohui, X. Shuoyu, S. Zheng, L. Min, "An improved moving object detection algorithm based on frame difference and edge detection," 4th International Conference on Image and Graphics, ICIG, 2007.
[4] M. Kulkarni, "Histogram-based foreground object extraction for indoor and outdoor scenes," 7th Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2010.
[5] S. Yasuyuki, K. Kenichi, "Extracting moving objects from a moving camera video sequence," 10th Symposium on Sensing via Image Information, Japan, 2004, pp. 279-284.
[6] L. Juan, O. Gwun, "A comparison of SIFT, PCA-SIFT and SURF," International Journal of Image Processing (IJIP), vol. 3, 2009, pp. 143-152.
[7] http://opencv.willowgarage.com/documentation/cpp/feature_detection.html
[8] H. Bay, A. Ess, T. Tuytelaars, L. V. Gool, "SURF: Speeded Up Robust Features," Computer Vision and Image Understanding (CVIU), vol. 110, no. 3, 2008, pp. 346-359.
[9] R. C. Gonzalez, R. E. Woods, Digital Image Processing, 3rd ed., 2010, pp. 649-686.