2014 Fifth International Conference on Intelligent Systems, Modelling and Simulation
Extraction of Moving Objects Using Frame Differencing, Ghost and Shadow Removal

Syaimaa' Solehah Mohd Radzi¹, Shahrul Nizam Yaakob²
School of Computer and Communication Engineering, Universiti Malaysia Perlis, Perlis, Malaysia
¹[email protected], ²[email protected]

Zulaikha Kadim³, Hon Hock Woon⁴
Centre of Intelligent Imaging, Mimos Bhd., Bukit Jalil, Kuala Lumpur
{³zulaikha.kadim, ⁴hockwoon.hon}@mimos.my

Abstract—This paper proposes a technique for extracting moving objects based on temporal differencing, ghost removal and shadow removal using normalized cross correlation (NCC), while using a non-static Pan-Tilt-Zoom (PTZ) camera. To detect the moving object in the current image, the previous frame, f_{t-1}, is compensated with respect to the current image, f_t. The subtraction of pixel intensities is then done between the two aligned images. The use of temporal differencing results in the appearance of a 'ghost'. This paper proposes a technique to remove it by using an additional previous image, f_{t-2}. The resultant output is then cleaned up using a morphological operator, before shadow removal is done. Each pre-defined foreground pixel is verified as either a shadow pixel or a foreground pixel. The results show that the moving objects are extracted without shadow or other noises.

Keywords-PTZ camera; object extraction; temporal differencing; shadow removal.

I. INTRODUCTION

Extraction of moving objects is an important step in video surveillance, traffic monitoring, human tracking and other applications. A PTZ camera has pan, tilt and zoom control and rotates 360 degrees on its axis. Since the scene captured by a PTZ camera may differ between frames, camera motion estimation is needed. With a static camera, in contrast, the input image is always aligned to the reference image. Three common methods of motion segmentation are background subtraction, temporal differencing and optical flow. Background subtraction is the most popular method, especially for situations with a static background. It detects moving pixels by taking the difference in pixel intensity between the current (input) image and a background image. It is an effective and simple method only if the background image is perfectly aligned to the current image [1]. The concept of temporal differencing is similar to background subtraction in that it takes the intensity difference between two images [2], but it uses two or three consecutive frames of an image sequence to extract the moving objects. It is less effective if the object moves slowly, since it uses the previous frame as the reference image, and there may be holes in the detected object. There are some improved methods of temporal differencing [3]. Optical-flow-based methods use the characteristics of flow vectors of moving objects over time to detect moving regions in an image sequence; they can detect independently moving objects even in the presence of camera motion. However, most flow computation methods are computationally complex, very sensitive to noise, and cannot be applied to video streams in real time without specialized hardware [4]. When using a PTZ camera, background subtraction is not an effective method because it needs to capture many background frames, which takes a long time. In this paper we use temporal differencing, because only one or more previous frames are needed as the reference image. However, it also detects the moving object of the previous image; this object is called a 'ghost'. To handle it, pixel intensities are also subtracted between two previous images (f_{t-1} and f_{t-2}), and the resultant motion map is compared to the first motion map. The goals of this research are to extract moving objects in a dynamic scene and to overcome the problems of misalignment, shadows and other noise. The technique used to remove the ghost is explained in the Proposed Approach section. The motion segmentation process produces noise caused by misaligned images, illumination changes, shadows, ghosts, etc. Other than morphological operations, local histogram processing can be used to detect and remove noise [5][6]. Normalized cross correlation (NCC) has been used extensively in machine vision applications, but the traditional normalized correlation operation does not meet the speed requirements of time-critical applications [7]. Noise caused by shadows is detected using NCC. The structure of this paper is as follows. In the next section, previous research works related to this paper are discussed, followed by the description of our proposed approach. The experimental setup and results are then presented, before finally the conclusion is drawn.
2166-0662/14 $31.00 © 2014 IEEE DOI 10.1109/ISMS.2014.154
II. RELATED WORKS
Various techniques exist in the literature for moving object extraction. In [8], a method based on temporal differencing is used to address the problems of background subtraction. The authors propose a reliable foreground segmentation algorithm that combines temporal image analysis with a reference background image. The method adapts to changes in background illumination, and all pixels in the image, even those covered by foreground objects, are continuously updated in the background model. In [9], an adaptive segmentation algorithm is developed for colour video surveillance sequences in real time with a non-static background. The background is modelled at pixel level using multiple correlation coefficients. At runtime, segmentation is performed by checking colour intensity values at corresponding pixels P(x, y) in three frames using temporal differencing (with a frame gap of three). The segmentation starts from a seed in the form of 3×3 image blocks to avoid noise. Temporal differencing usually generates holes in motion objects; after subtraction, the holes are filled using image fusion, which uses spatial clustering as the criterion to link motion objects. The emphasis of this approach is on the robust detection of moving objects even under noise or environmental changes. Local histogram processing is sometimes used in edge extraction; in [10], it is applied to small non-overlapping blocks of the output of the first derivative of a narrow 2D Gaussian filter. The method starts by convolving the image with a narrow 2D Gaussian smoothing filter to minimise edge displacement and increase the resolution and effectiveness of detection. The local histograms of small non-overlapping blocks of the edge map are then processed to perform an additional noise rejection operation and to automatically determine the local thresholds.
Adherent noises such as water drops or mud blobs on the protecting glass surface of the lens disturb the view from the camera. In [11], the regions of adherent noise in the reference image are identified by examining the shapes and distances of regions in the subtracted image (produced by subtracting pixel intensities between the current and reference images); the two images are then merged to eliminate the noise regions. In [12], a method based on invariant colour models is used to identify and classify shadows in digital images. Image areas darker than their surroundings are first identified as shadow regions; then, using the invariant colour features, the shadow candidate pixels are classified as self-shadow points or cast-shadow points. The use of invariant colour features allows a low-complexity classification stage.
III. PROPOSED APPROACH
For any given current image frame, f_t, of dimension m by n pixels, a motion map of the same dimension will be generated, assuming that the camera intrinsic parameters are not known. The motion map is a binary map in which each pixel indicates whether its corresponding pixel in the current frame belongs to the background or to the moving foreground. From this motion map, a bounding box enclosing each set of connected motion pixels can be computed to indicate the location of the moving object in the current frame. Our proposed method of moving object extraction using a PTZ camera is an integration of temporal differencing, histogram-based foreground/background classification and shadow detection. Moving object extraction starts from the second frame, since the first image has no reference image. The process starts by estimating the camera motion between the previous image, f_{t-1}, and the current image, f_t. The computed transformation matrix is then used to compensate the previous frame, aligning it to the current image frame. Both images are then processed to finally extract the moving object. The overall proposed system is illustrated in Fig. 1. Details of each step are as follows.
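As a concrete illustration of the bounding-box step described above, the following minimal sketch (not from the paper) labels 4-connected motion blobs in a binary motion map and returns one box per blob. The function name and the list-of-lists map representation are illustrative choices:

```python
from collections import deque

def bounding_boxes(motion_map):
    """Find a bounding box (rmin, cmin, rmax, cmax) for each
    4-connected blob of motion pixels (value 1) in a binary
    motion map given as a list of lists."""
    rows, cols = len(motion_map), len(motion_map[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if motion_map[r][c] == 1 and not seen[r][c]:
                # Breadth-first search over this blob, tracking its extent.
                q = deque([(r, c)])
                seen[r][c] = True
                rmin = rmax = r
                cmin = cmax = c
                while q:
                    y, x = q.popleft()
                    rmin, rmax = min(rmin, y), max(rmax, y)
                    cmin, cmax = min(cmin, x), max(cmax, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and motion_map[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append((rmin, cmin, rmax, cmax))
    return boxes
```

Each box can then be drawn on the current frame to mark a detected moving object.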
[Figure 1 flowchart: input images f_{t-1} and f_t → Camera Motion Estimation → Camera Motion Compensation → Temporal Differencing → Ghost Detection and Removal → Shadow Detection and Removal → Output: motion map, M]
Figure 1. Overall proposed system
The remainder of this section explains each step of the proposed method, which combines temporal differencing, ghost removal and shadow removal.
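The overall flow of Fig. 1 can be sketched in miniature as follows. This is an illustrative skeleton, not the authors' implementation: it assumes the previous frames are already compensated to the current frame, uses a simple absolute-difference threshold, and omits morphology and shadow removal; all function names are invented for the sketch:

```python
def diff_map(a, b, thresh):
    """Binary motion map: 1 where |a - b| > thresh (lists of lists of ints)."""
    return [[1 if abs(pa - pb) > thresh else 0 for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]

def remove_ghost(map1, map2):
    """A motion pixel in map1 is a ghost if it is also a motion pixel in
    map2 (the differencing of the two previous frames); ghost pixels are
    re-classified as background (0)."""
    return [[0 if (p1 == 1 and p2 == 1) else p1
             for p1, p2 in zip(r1, r2)]
            for r1, r2 in zip(map1, map2)]

def extract_motion(f_t, f_tm1, f_tm2, thresh=20):
    """Pipeline skeleton: f_tm1 and f_tm2 are assumed to be already
    compensated (registered) to f_t. Camera motion estimation,
    morphological cleanup and shadow removal are omitted here."""
    map1 = diff_map(f_t, f_tm1, thresh)    # current vs previous frame
    map2 = diff_map(f_tm1, f_tm2, thresh)  # the two previous frames
    return remove_ghost(map1, map2)
```

With an object moving left to right, the old object position shows up in both maps (the ghost) and is suppressed, while the new position survives only in map 1 and is kept.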
A. Camera Motion Estimation
In this step, the camera motion between the previous and current images is computed. The output is a homography matrix which defines the transformation between the two images. The steps to estimate camera motion are illustrated in Fig. 2. First, keypoints in both the previous and current frames are extracted and their descriptors are computed. In this work the Speeded-Up Robust Features (SURF) descriptor [15] is used, as it is more robust against different image transformations than the Scale Invariant Feature Transform (SIFT) and is several times faster [13]. Next, the extracted keypoints in the previous image are matched with the keypoints in the current image. From the matched keypoints, the homography, H_t, can be computed.
Figure 3. Sample of compensated previous image; (a) current image; (b) previous image; (c) compensated previous image.
C. Temporal Differencing between Current Frame and Compensated Previous Frame
The output of this stage is the motion map indicating which pixels belong to the moving object and which to the background in the current frame. First, the compensated previous frame is compared to the current frame by subtracting pixel intensity values. The difference is compared to a threshold value: if the difference is greater than the threshold, the pixel is assumed to be a motion pixel; otherwise it is a background pixel. The output contains noise, one of the reasons being motion estimation error. Noise here means background pixels falsely identified as motion pixels; the higher the error, the more noise the output contains. To remove small, separated noise from the actual motion area and to fill small holes within the motion area, morphological closing is applied [16]. The closing operator is implemented using dilation and erosion. Dilation adds pixels to the edge of the motion area, so the motion area becomes thicker and small holes within it are filled. This is followed by erosion, which removes pixels from the edge of the motion area. By applying these morphology operators, small, separated noise can be removed; however, connected noise pixels cannot be removed by this process. One cause of connected noise is the presence of the object's shadow in the image. A method for detecting and removing noise pixels due to shadow is presented in a later section.
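The closing step described above (dilation followed by erosion) can be sketched for binary motion maps as follows. This is an illustrative pure-Python version with a fixed 3×3 structuring element, not the paper's implementation:

```python
def dilate(m):
    """Binary dilation with a 3x3 structuring element: a pixel becomes 1
    if any pixel in its (in-image) 3x3 neighbourhood is 1."""
    R, C = len(m), len(m[0])
    return [[1 if any(m[y][x]
                      for y in range(max(0, r - 1), min(R, r + 2))
                      for x in range(max(0, c - 1), min(C, c + 2))) else 0
             for c in range(C)]
            for r in range(R)]

def erode(m):
    """Binary erosion with a 3x3 structuring element: a pixel stays 1
    only if every pixel in its (in-image) 3x3 neighbourhood is 1."""
    R, C = len(m), len(m[0])
    return [[1 if all(m[y][x]
                      for y in range(max(0, r - 1), min(R, r + 2))
                      for x in range(max(0, c - 1), min(C, c + 2))) else 0
             for c in range(C)]
            for r in range(R)]

def closing(m):
    """Morphological closing = dilation followed by erosion;
    it fills small holes inside motion areas."""
    return erode(dilate(m))
```

Applied to a ring of motion pixels with a one-pixel hole in the middle, the closing returns the same blob with the hole filled.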
[Figure 2 flowchart: Input 1: previous image, f_{t-1}; Input 2: current image, f_t → Extract SURF Keypoints (each image) → Compute SURF Descriptors → Find Keypoint Matching → Compute Homography → Output: homography, H_t]
Figure 2. Steps in estimating camera motion
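The final step in Fig. 2, computing the homography from matched keypoints, can be illustrated with the standard Direct Linear Transform (DLT). The paper does not specify its estimator (a RANSAC-based routine such as OpenCV's is typical in practice), so this minimal least-squares sketch is an assumption:

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: estimate the 3x3 homography H mapping
    src[i] -> dst[i] from >= 4 point correspondences. In the paper's
    pipeline the correspondences would come from matched SURF keypoints;
    here they are given directly."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the null-space direction of A: the right singular
    # vector associated with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalise so that H[2, 2] == 1

def apply_homography(H, pt):
    """Map a single point through H with perspective division."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)
```

Given four non-degenerate correspondences generated by a known transformation, the routine recovers that transformation up to scale.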
B. Camera Motion Compensation
The output of this step is the compensated previous image with respect to the current image frame. The process is done by applying H_t to the previous image as in Eq. (1):

f'_{t-1} = H_t · f_{t-1}    (1)

where f'_{t-1} is the compensated (registered) previous image, f_{t-1} is the previous image and H_t is the homography between the previous and current images. Fig. 3 shows a sample of a compensated previous image.
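Eq. (1) can be illustrated with a small inverse-warping routine: each pixel of the registered output is looked up through the inverse homography in the previous frame. Nearest-neighbour sampling and zero padding for out-of-frame pixels are illustrative simplifications, not details taken from the paper:

```python
import numpy as np

def warp_image(img, H):
    """Register img (the previous frame) to the current frame by applying
    homography H: for every output pixel, look up the source pixel through
    the inverse homography (nearest-neighbour sampling; pixels mapping
    outside the frame are set to 0)."""
    Hinv = np.linalg.inv(H)
    rows, cols = img.shape
    out = np.zeros_like(img)
    for r in range(rows):
        for c in range(cols):
            x, y, w = Hinv @ np.array([c, r, 1.0])
            sc, sr = int(round(x / w)), int(round(y / w))
            if 0 <= sr < rows and 0 <= sc < cols:
                out[r, c] = img[sr, sc]
    return out
```

For a pure translation homography, the routine simply shifts the image content by the translation amount.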
D. Ghost Removal
One of the problems with frame differencing is the ghost effect: a ghost motion blob may appear when subtracting pixels between the current and previous images if there were moving objects in the previous frame. In this paper, we detect the presence of the ghost blob by performing frame differencing between the current frame and consecutive previous frames, fusing the resulting motion maps to determine the ghost blob in the current frame, and finally removing the ghost pixels. The programming flow chart is shown in Fig. 4. The inputs to this process are the previous frame, f_{t-2}, and motion map 1 obtained from the previous frame differencing step. Initially, camera motion is estimated between the current frame, f_t, and the previous frame, f_{t-2}. Then f_{t-2} is registered with respect to the current frame; the compensated frame is denoted
as f'_{t-2}. Next, frame differencing between the registered previous frames, f'_{t-1} and f'_{t-2}, is performed, and motion map 2 is obtained. To detect ghost pixels, every motion pixel in motion map 1 is compared against its corresponding pixel in motion map 2: if the corresponding pixel in motion map 2 is also a motion pixel, the pixel is deemed a ghost pixel. In generating the final motion map, all motion pixels identified as ghost pixels are re-classified as background pixels.

[Figure 4 flowchart: Start → Capture image → Store image f_t in array → if not enough previous frames, capture the next image; otherwise → Compensate f_{t-1} and f_{t-2} with respect to f_t → Subtraction between: (1) f_t and f'_{t-1} = MotionMap1; (2) f'_{t-1} and f'_{t-2} = MotionMap2 → For every pixel (i, j): if MotionMap1(i, j) = white and MotionMap2(i, j) = white, set MotionMap1(i, j) = black (removing ghost) → End]

Figure 4. Programming flow chart of ghost removal

Fig. 5(c) shows the subtraction between the current and previous images, which results in a ghost blob; Fig. 5(f) is the result of pixel subtraction between the two registered previous images, and Fig. 5(g) shows the final motion map.

Figure 5. The frame differencing between: (a) compensated previous image, f'_{t-1}, and (b) current image, f_t; (c) motion map 1; (d) compensated previous image, f'_{t-1}, and (e) compensated previous image, f'_{t-2}; (f) motion map 2; (g) final motion map obtained by combining the analysis of motion map 1 and motion map 2.

E. Shadow Detection
Some connected noise, as illustrated in Fig. 6(b), is caused by shadows. Shadows are formed when light is blocked by an object. Not all shadows are totally black, since several factors influence the intensity of shadow pixels [17], such as the opacity of the moving objects. In this paper, we assume that the intensity of a shadow pixel is directly proportional to the incident light; the colours of shadow pixels are then scaled (darker) versions of the corresponding pixels in the reference (previous) image [18].

Figure 6. Motion map from the frame differencing step; (a) current image; (b) resultant motion map, containing both connected and separated noise. Black pixels are background, whereas white pixels are motion pixels, including noise.

Normalized cross correlation (NCC) is useful for detecting shadow regions. It has been commonly used to evaluate the degree of similarity between two images. The main advantage of normalized cross correlation over plain cross correlation is that it is less sensitive to linear changes in the amplitude of illumination in the two compared images, and the detection threshold is much easier to set. This technique has been used in [18]. For a shadow pixel, it is assumed that the colour properties of the pixel in the current image are similar to those of the corresponding pixel in the reference image, while its intensity is slightly darker in the current image. Thus, NCC is calculated here to estimate how similar the colour properties of each motion pixel are in the current and previous frames, and each motion pixel is finally re-classified as either a shadow pixel or not. The NCC of a motion pixel at coordinate (i, j) can be calculated as in Eq. (2), where P(i, j) is the pixel intensity at coordinate (i, j) in the previous frame, C(i, j) is the pixel intensity in the current frame, and the neighbourhood of the pixel is a window of size (2N + 1) × (2N + 1), with -N ≤ n ≤ N and -N ≤ m ≤ N. NCC's computation time increases dramatically as the window size gets larger [7]; in this work, we fix the window size N as 4.
Figure 7. Result of temporal differencing and ghost removal. Top, from left: compensated previous image, f'_{t-1}; current image, f_t; motion map 1. Middle, from left: compensated previous image, f'_{t-1}; compensated previous image, f'_{t-2}; motion map 2. Bottom: final motion map.
NCC(i, j) = ER(i, j) / √(EP(i, j) · EC(i, j))    (2)

where, with the sums taken over -N ≤ n ≤ N and -N ≤ m ≤ N,

ER(i, j) = Σ_n Σ_m P(i + n, j + m) · C(i + n, j + m)    (3)

EP(i, j) = Σ_n Σ_m P(i + n, j + m)²    (4)

EC(i, j) = Σ_n Σ_m C(i + n, j + m)²    (5)

A motion pixel (i, j) is re-classified as a shadow pixel if

NCC(i, j) ≥ T_NCC and EC(i, j) < EP(i, j)    (6)
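Eqs. (2)-(6) translate directly into a per-pixel shadow test. The sketch below assumes the (2N + 1) × (2N + 1) window lies fully inside both frames, and the threshold value is illustrative:

```python
import numpy as np

def is_shadow(prev, cur, i, j, N=4, t_ncc=0.95):
    """Classify motion pixel (i, j) as shadow following Eq. (2)-(6):
    high NCC between the (2N+1)x(2N+1) windows in the previous and
    current frames, combined with lower energy in the current frame
    (i.e. the pixel is darker than its reference)."""
    P = prev[i - N:i + N + 1, j - N:j + N + 1].astype(float)
    C = cur[i - N:i + N + 1, j - N:j + N + 1].astype(float)
    er = np.sum(P * C)           # Eq. (3)
    ep = np.sum(P * P)           # Eq. (4)
    ec = np.sum(C * C)           # Eq. (5)
    ncc = er / np.sqrt(ep * ec)  # Eq. (2)
    return bool(ncc >= t_ncc and ec < ep)  # Eq. (6)
```

A current window that is a uniformly darkened copy of the previous one yields NCC ≈ 1 with lower energy and is flagged as shadow; a uniformly brightened copy also yields NCC ≈ 1 but fails the energy condition and is kept as a motion pixel.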
If the threshold, T_NCC, is too low, some motion pixels may be misclassified as shadows, while with a large T_NCC some actual shadows may not be detected; selecting T_NCC carefully is therefore important for removing the actual shadows. The second condition in Eq. (6) states that for a shadow pixel the energy in the previous image is higher than in the current image; in other words, a shadow pixel appears darker in the current image than in the previous image.

IV. EXPERIMENTAL RESULTS

Figs. 7 and 8 show the results of object extraction for two test sequences. In each figure, the bottom picture shows the motion map after removing the ghost; the ghost originating from the previous frame has been eliminated.

Figure 8. Result of temporal differencing and ghost removal. Top, from left: compensated previous image, f'_{t-1}; current image, f_t; motion map 1. Middle, from left: compensated previous image, f'_{t-1}; compensated previous image, f'_{t-2}; motion map 2. Bottom: final motion map.

Fig. 9 shows the results of shadow removal for each frame of the dataset 'shadow', with shadow indicated in red. The proposed algorithm successfully eliminates unwanted artifacts such as shadows.
Figure 9. Results of the proposed method; (a) current frame; (b) motion map, where white pixels are moving-object pixels, black pixels are background pixels and red pixels are motion pixels re-classified as shadow pixels.

V. CONCLUSION

This paper proposed a method for extracting moving objects using temporal differencing, ghost removal and shadow removal with a non-static PTZ camera. The proposed approach begins with finding the difference between the current image and the compensated previous image. The output is a motion map that indicates whether each pixel belongs to the background or to a moving object. However, by performing frame differencing between successive frames, the motion map may suffer from the ghost effect. We propose to also subtract the previous image at t-2 from the previous image at t-1; the comparison between the two resulting motion maps eliminates the ghost. The result is further refined using morphological operators. The final step removes noise caused by shadows using normalized cross correlation; the shadow removal technique is based on the research work in [18]. The contributions of this paper are that the method can be used in real time with high computation speed and that it performs well in detecting the moving object in every frame. Compared with background subtraction, which requires many background images when a PTZ camera is used and is more suitable for a static camera, temporal differencing lets the system start immediately from the current frame, so it can be used with a PTZ or hand-held camera. Applications include surveillance systems in housing areas, people tracking and road traffic monitoring. Future work for this project is to further improve the shadow detection to obtain finer shapes of the moving objects.

REFERENCES

[1] S. J. McKenna et al., "Tracking Groups of People," Computer Vision and Image Understanding, vol. 80, no. 1, pp. 42-56, 2000.
[2] S. Vahora, C. Narendea and P. Nilesh, "A Robust Method for Moving Object Detection Using Modified Statistical Mean Method," International Journal of Advanced Information Technology (IJAIT), vol. 2, no. 1, p. 65, 2012.
[3] D. P. Bertsekas, A. Nedich and V. S. Borkar, "Improved Temporal Difference Methods with Linear Function Approximation," in Learning and Approximate Dynamic Programming, Wiley-IEEE Press, 2004, pp. 231-235.
[4] J. Barron, D. J. Fleet and S. S. Beauchemin, "Performance of Optical Flow Techniques," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1994.
[5] S. Khodambashi, "An Impulse Noise Fading Technique Based on Local Histogram Processing," in IEEE International Symposium on Signal Processing and Information Technology, ISSPIT, Ajman, December 14-17, 2009, pp. 95-100.
[6] S. Solehah, S. N. Yaakob, Z. Kadim and H. H. Woon, "Moving Object Extraction in PTZ using an Integration of Background Subtraction and Local Histogram Processing," in IEEE Symposium on Computer Applications and Industrial Electronics, ISCAIE, Kota Kinabalu, Malaysia, December 3-4, 2012, pp. 167-172.
[7] D. M. Tsai and C. T. Lin, "Fast Normalized Cross Correlation for Defect Detection," Pattern Recognition Letters, vol. 24, no. 15, pp. 2625-2631, November 2003.
[8] P. Spagnolo, T. D'Orazio, M. Leo and A. Distante, "Moving Object Segmentation by Background Subtraction and Temporal Analysis," Image and Vision Computing, vol. 24, no. 5, pp. 411-423, May 2006.
[9] S. Murali and R. Girisha, "Segmentation of Motion Objects from Surveillance Video Sequences using Temporal Differencing Combined with Multiple Correlation," in Sixth IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2009, pp. 473-477.
[10] M. Khalil, "Edge Detection Using Adaptive Local Histogram Analysis," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2006, Toulouse, May 14-19, 2006, vol. 2.
[11] A. Yamashita, T. Harada, T. Kaneko and K. T. Miura, "Removal of Adherent Noises from Images of Dynamic Scenes by Using a Pan-Tilt Camera," in Proceedings of International Conference on Intelligent Robots and Systems, IROS 2004, vol. 1, pp. 437-442.
[12] E. Salvador, A. Cavallaro and T. Ebrahimi, "Shadow Identification and Classification Using Invariant Color Models," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2001, Salt Lake City, UT, May 7-11, 2001, vol. 3, pp. 1545-1548.
[13] L. Juan and O. Gwun, "A Comparison of SIFT, PCA-SIFT and SURF," International Journal of Image Processing (IJIP), vol. 3, no. 4, p. 143, 2009.
[14] http://opencv.willowgarage.com/documentation/cpp/feature_detection.html [Accessed January 2012].
[15] H. Bay, A. Ess, T. Tuytelaars and L. Van Gool, "SURF: Speeded Up Robust Features," Computer Vision and Image Understanding (CVIU), vol. 110, no. 3, pp. 346-359, 2008.
[16] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd ed., 2010, pp. 649-686.
[17] http://www.exploratorium.edu/snacks/colored_shadows/ [Accessed May 2012].
[18] J. C. S. Jacques, C. R. Jung and S. R. Musse, "Background Subtraction and Shadow Detection in Grayscale Video Sequences," in 18th Brazilian Symposium on Computer Graphics and Image Processing, SIBGRAPI 2005, Oct 9-12, 2005, pp. 189-196.