Foreground Detection using Background Subtraction with Histogram
Index Terms— Background Subtraction, Foreground Detection, Histogram
I. INTRODUCTION: Motion plays a very important role in any foreground detection algorithm. The precision of motion detection directly affects the performance measures and the subjective quality of the detected foreground. Motion detection approaches differ for static and dynamic backgrounds. The algorithm presented in this paper is intended for static backgrounds: any detected change in a pixel is considered part of the foreground object. Based on the temporal information implied in the difference between frames, two approaches are recommended: background (-frame) subtraction and techniques based on temporally adjacent frames. Background (-frame) subtraction uses the first frame as the reference image/frame; if there is a significant change (in pixels) in a subsequent frame, the changed pixels are considered part of the foreground object. This is a very simple and useful solution, and two situations can arise. In the ideal situation there is no foreground object in the reference image, so the foreground result is good enough, as shown in Figure 1. In the real-world (general) situation, however, the reference image may itself contain a foreground object. When the current image shows a foreground object in a new position, both the new and the old foreground objects are detected; the spurious detection of the foreground object from the reference image is known as ghosting.
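As a minimal sketch of background (-frame) subtraction against a fixed reference frame, the following Python example uses tiny grayscale frames stored as nested lists. The function name `subtract_background`, the pixel values and the threshold of 30 are illustrative assumptions, not values taken from this paper. It contrasts the ideal situation (empty reference) with the general situation, where an object present in the reference frame reappears in the mask as a ghost.

```python
def subtract_background(reference, current, threshold):
    """Return a binary mask: 1 where |current - reference| > threshold."""
    return [
        [1 if abs(c - r) > threshold else 0 for r, c in zip(ref_row, cur_row)]
        for ref_row, cur_row in zip(reference, current)
    ]


# Ideal situation: the reference frame contains background only.
empty_reference = [[10, 10, 10, 10],
                   [10, 10, 10, 10]]

# General situation: the reference frame already contains an object in column 0.
occupied_reference = [[200, 10, 10, 10],
                      [200, 10, 10, 10]]

# Current frame: the object is now in column 2.
current = [[10, 10, 200, 10],
           [10, 10, 200, 10]]

ideal_mask = subtract_background(empty_reference, current, threshold=30)
general_mask = subtract_background(occupied_reference, current, threshold=30)

print(ideal_mask)    # only the object's current position is marked
print(general_mask)  # the old position (column 0) is marked too: ghosting
```

Plain lists keep the sketch dependency-free; an array library would normally be used for real frames.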
[Figure 1 panels: reference frame, current frame and background subtraction result, shown for the ideal and the general case. White mask is that of the current frame.]
Abstract: In the background subtraction method, one of the core issues is how to set the threshold value precisely at run time, which can ultimately overcome several shortcomings of this approach to foreground detection. The proposed algorithm uses motion, the key feature of any foreground detection algorithm. Because the threshold value cannot be obtained from the original motion histogram, a smoothed motion histogram is used in a systematic way to obtain it. The main focus of the proposed algorithm is a better estimate of the threshold, so that a dynamic value can be obtained from the histogram at run time. If the proposed algorithm is used intelligently in terms of motion magnitude and motion direction, it can distinguish accurately between background and foreground, and between camera motion alone and camera motion combined with object motion.
Muhammad Nawaz, Prof John Cosmas, Awais Adnan, Muhammad Inam Ul Haq, Eman Alazawi
Yellow and blue foreground masks are those of the current and reference images, respectively.
Figure 1: Background Subtraction
Time-differencing, or the technique based on temporally adjacent frames, considers a pixel to be moving if and only if its intensity has changed significantly between the previous and the current frame:

$$|I_t(x, y) - I_{t-1}(x, y)| > T \qquad (1)$$

In Equation 1, $(x, y)$ is the position of a pixel that belongs to a moving object, $I_t$ is the current frame at time $t$, $I_{t-1}$ is the previous frame at time $t-1$, and $T$ is the threshold value.
This is undoubtedly an easy approach, but it only works if the object speed and the frame rate are known in advance. Otherwise the technique leads to two types of problems [1]: foreground aperture and ghosting. The solution to these problems is known as the double-difference image [2]; the algorithm is shown in pictorial form in Figure 2.
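The following Python sketch illustrates the thresholded frame difference of Equation 1 and the double-difference image, which combines two successive thresholded differences with a logical AND. The one-row frames, the function names and the threshold of 50 are illustrative assumptions; the example is only meant to show why the single difference marks the old object position while the double difference keeps only the current one.

```python
def frame_difference(frame_a, frame_b, threshold):
    """Binary mask of pixels whose intensity differs by more than threshold."""
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def double_difference(prev_frame, cur_frame, next_frame, threshold):
    """AND of the thresholded differences (t-1, t) and (t, t+1)."""
    diff_prev = frame_difference(cur_frame, prev_frame, threshold)
    diff_next = frame_difference(next_frame, cur_frame, threshold)
    return [
        [p & n for p, n in zip(row_p, row_n)]
        for row_p, row_n in zip(diff_prev, diff_next)
    ]


# A one-pixel-wide bright object moves one pixel to the right per frame.
prev_frame = [[0, 200, 0, 0, 0]]
cur_frame = [[0, 0, 200, 0, 0]]
next_frame = [[0, 0, 0, 200, 0]]

single = frame_difference(cur_frame, prev_frame, threshold=50)
double = double_difference(prev_frame, cur_frame, next_frame, threshold=50)

print(single)  # [[0, 1, 1, 0, 0]]: old position (ghost) and new position
print(double)  # [[0, 0, 1, 0, 0]]: only the position at time t
```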
Figure 2: Double-Difference Image Algorithm
This approach thresholds the difference between the frames at times t-1 and t and the difference between the frames at times t and t+1, and then combines the two results with a logical AND operator. However, it has the drawback that it cannot find the precise position of an object in real time. Accurate motion detection also becomes a problem if there is not enough texture. Another proposed algorithm is the hybrid algorithm [1] for motion detection. It is based on three-frame differencing, which takes the difference between the frames at times t and t-1 and the difference between the frames at times t and t-2 in order to determine the regions of reasonable motion and to overcome the ghosting problem. Adaptive background (-frame) subtraction [3] was applied to overcome the foreground aperture problem. This background update procedure has three main problems [7]: it fails when the foreground object begins or ends its motion, it is sensitive to luminance variation, and it has difficulty with variable-depth shots, as it is mainly used for outdoor shooting with low depth-of-field images.

To avoid the need for prior background learning, robust pixel foreground classification was introduced [4]. This algorithm claims that robust pixel foreground classification is possible without prior background learning. For proper pixel classification, background subtraction and frame-by-frame differencing are used jointly, as given in Algorithm 1; this is known as the joint difference algorithm [4].

Algorithm 1: Joint difference algorithm [4]
if |I_t - I_{t-1}| > T AND |I_t - B_{t-1}| > T then
    Foreground Pixel;
else if |I_t - I_{t-1}| < T AND |I_t - B_{t-1}| > T then
    Collect pixels in blobs;
    if # ... # ... then
        Foreground Pixel; // foreground aperture problem solution
    else
        Background Pixel; // background object suddenly starts moving at time t
    end if
else if |I_t - I_{t-1}| > T AND |I_t - B_{t-1}| < T then
    Background Pixel; // ghosting problem solution
else // |I_t - I_{t-1}| < T AND |I_t - B_{t-1}| < T
    Background Pixel;
end if

Finally, the background model is selectively updated using Equation (2), as suggested in [5]:

$$B_t = (1 - \alpha) B_{t-1} + \alpha F_t \qquad (2)$$

In Equation (2), the value of $\alpha$ depends on the pixel classification, $B_{t-1}$ is the background at time $t-1$ and $F_t$ is the foreground at time $t$.

The good performance of this algorithm has been confirmed in [7]. However, it comes with two disadvantages: first, the method fails completely when the foreground object stops or moves slowly; second, a proper threshold value must be set [7-8]. The threshold problem can be addressed with clustering-based, entropy-based and object-attribute-based methods; two further families are spatial and local methods. The proposed algorithm addresses it with a histogram shape-based method that uses the peaks, valleys and curvatures of the smoothed histogram. The reasons for selecting the histogram shape-based method are its higher accuracy and shorter execution time. The difficulty with the original, non-normalized histogram is depicted in Figure 3, which shows the original histograms of the reference image and the current frame: both have so many peaks that choosing a threshold from them is impractical.
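Returning to the joint difference scheme of Algorithm 1 and the selective update of Equation (2), the per-pixel logic can be sketched roughly as below. This is an illustration under stated assumptions rather than the method of [4] or [5]: the pairing of the two threshold tests with each branch is inferred from the branch comments, a single shared threshold is assumed, the blob-level test is left out (such pixels are only labelled `blob_candidate`), and the update rates `alpha_bg` and `alpha_fg` are hypothetical values. The `observed` argument plays the role of the term weighted by alpha in Equation (2).

```python
def classify_pixel(frame_t, frame_prev, background, threshold):
    """Label one pixel from its frame difference and background difference."""
    frame_changed = abs(frame_t - frame_prev) > threshold
    differs_from_background = abs(frame_t - background) > threshold
    if frame_changed and differs_from_background:
        return "foreground"
    if not frame_changed and differs_from_background:
        # Needs the blob-level test of Algorithm 1 (not reproduced here).
        return "blob_candidate"
    if frame_changed and not differs_from_background:
        return "ghost"  # treated as background
    return "static"     # treated as background


def update_background(background, observed, label, alpha_bg=0.1, alpha_fg=0.0):
    """Selective running average; alpha depends on the pixel classification."""
    alpha = alpha_bg if label in ("ghost", "static") else alpha_fg
    return (1 - alpha) * background + alpha * observed


labels = [
    classify_pixel(200, 10, 10, threshold=30),   # changed, unlike background
    classify_pixel(200, 200, 10, threshold=30),  # unchanged, unlike background
    classify_pixel(10, 200, 10, threshold=30),   # changed, matches background
    classify_pixel(12, 10, 10, threshold=30),    # unchanged, matches background
]
print(labels)

updated_static = update_background(10.0, 20.0, "static", alpha_bg=0.5)
updated_foreground = update_background(10.0, 200.0, "foreground")
print(updated_static, updated_foreground)
```

With these assumed rates, foreground pixels leave the background model untouched while background pixels are blended in.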
[Figure 3: Original Histogram. Panels: the reference image and the current frame, each with its respective non-normalized histogram (x-axis: Pixel Value / Pixel Intensity; y-axis: Number of Pixels / Pixel Frequency).]

II. PROPOSED ALGORITHM AND EXPERIMENTAL RESULTS
1. The video is analysed frame by frame.
2. The macro block/vector size is set to 4x4 = 16; a smaller block increases the execution time, while a larger block decreases the accuracy of motion detection.
3. The search distance is set to 5. It sets the sensitivity and the possible values of motion: for example, if it is 2, the maximum motion will be around 2, and if it is 10, motion vectors will be calculated over 10 values. During testing, values of 5, 10 and 15 were tried; the results were almost the same, but 5 is faster in terms of execution time than the other two. Furthermore, a search distance of 5 was also found suitable for 32 bins.
4. Motion vectors are calculated in terms of magnitude and direction; however, given the scope of this paper, only the magnitude is considered.
5. The motion information of the previous 5 frames is collected. A large window would ignore short motions: there are normally 12 to 25 frames per second, so a window of 5 frames covers roughly 0.2 to 0.4 seconds of motion. A small window, on the other hand, brings little benefit, since an average over 2 or 3 frames carries little information, and a single previous frame does not contain enough information to be used effectively. A middle value is therefore appropriate. Different compression methods also normally use sets of 9 frames, and 5 is the middle of that range. For these reasons the average over the previous 5 frames is used to obtain sufficient motion information.
6. The motion vector is converted into 32 values for the x-axis of the histogram (i.e., 0 to 32), which represent the motion vector magnitude. For decision making, the smoothed values of the histogram are considered, as shown in Figure 4.
7. The overall histogram of the proposed algorithm has the motion vector magnitude (from 0 to 32) on the x-axis and, on the y-axis, the number of 4x4 blocks having motion in that range (32 bins in total); it is used to decide the threshold value, as shown in Figure 4.
8. Threshold value: the x-axis of the histogram is divided into two equal portions, the first half from 0 to 16 and the second from above 16 to 32, so that the major portion of the motion in both halves is covered. This is expressed mathematically in Equations 3 and 4, where $x$ is the value on the x-axis:

$$h_1: \; 0 \le x \le 16 \qquad (3)$$

$$h_2: \; 16 < x \le 32 \qquad (4)$$
Figure 4: Histogram
From the histogram, the two largest peaks are selected and the valley between them is calculated. If the first peak is in the first half and lies in the first bin, where motion is zero, it is considered background, and the remainder, if any, represents foreground, as can be seen in Figure 5, which shows the first frame. It is worth mentioning that at this point the camera is still, so the first bin corresponds to the background part of the frame. Conversely, if the camera is in motion, the first peak will not be in the first bin: a dominant first bin means that most of the blocks have zero motion, which is not possible if the camera is moving, as can be seen in Figure 4.
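The construction of the smoothed motion histogram and the still-camera check described here can be sketched as follows. This is only an illustration under several assumptions that the paper does not specify: exhaustive block matching with a sum-of-absolute-differences cost, a linear mapping of magnitudes onto the 32 bins, averaging of per-frame histograms over the window, and a three-bin moving average as the smoothing filter. All function names and the toy frames are hypothetical.

```python
import math

BLOCK = 4    # macro block size, 4x4 pixels
SEARCH = 5   # search distance in pixels
WINDOW = 5   # number of previous frames averaged
BINS = 32    # histogram bins on the x-axis
MAX_MAG = math.hypot(SEARCH, SEARCH)  # largest magnitude the search can return


def sad(prev, cur, py, px, cy, cx):
    """Sum of absolute differences between a block of prev and a block of cur."""
    return sum(abs(cur[cy + i][cx + j] - prev[py + i][px + j])
               for i in range(BLOCK) for j in range(BLOCK))


def block_magnitudes(prev, cur):
    """Exhaustive block matching: one motion magnitude per 4x4 block of cur."""
    height, width = len(cur), len(cur[0])
    magnitudes = []
    for cy in range(0, height - BLOCK + 1, BLOCK):
        for cx in range(0, width - BLOCK + 1, BLOCK):
            best = None  # (matching cost, magnitude); ties prefer smaller motion
            for dy in range(-SEARCH, SEARCH + 1):
                for dx in range(-SEARCH, SEARCH + 1):
                    py, px = cy + dy, cx + dx
                    if 0 <= py <= height - BLOCK and 0 <= px <= width - BLOCK:
                        cand = (sad(prev, cur, py, px, cy, cx), math.hypot(dy, dx))
                        if best is None or cand < best:
                            best = cand
            magnitudes.append(best[1])
    return magnitudes


def magnitude_histogram(magnitudes):
    """Count blocks per magnitude bin (the linear mapping is an assumption)."""
    hist = [0] * BINS
    for m in magnitudes:
        hist[min(BINS - 1, int(m / MAX_MAG * BINS))] += 1
    return hist


def average_histograms(history):
    """Average the histograms of the last WINDOW frames, bin by bin."""
    recent = history[-WINDOW:]
    return [sum(column) / len(recent) for column in zip(*recent)]


def smooth(hist, radius=1):
    """Moving-average smoothing (the smoothing filter is an assumption)."""
    smoothed = []
    for i in range(len(hist)):
        window = hist[max(0, i - radius):i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def camera_is_still(hist):
    """True when the largest peak lies in bin 0 (most blocks have zero motion)."""
    return max(range(len(hist)), key=lambda i: hist[i]) == 0


# Toy 8x12 frames: a textured 4x4 patch moves two pixels to the right.
patch = [[100 + 10 * i + j for j in range(4)] for i in range(4)]


def make_frame(col):
    frame = [[0] * 12 for _ in range(8)]
    for i in range(4):
        frame[4 + i][col:col + 4] = patch[i]
    return frame


prev_frame, cur_frame = make_frame(2), make_frame(4)
magnitudes = block_magnitudes(prev_frame, cur_frame)
hist = magnitude_histogram(magnitudes)
smoothed = smooth(average_histograms([hist]))
still = camera_is_still(smoothed)
print(magnitudes, still)
```

In the toy frames the block that contains the patch gets a magnitude of 2, while most blocks fall into bin 0, so the check reports a still camera.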
Figure 5: Histogram of the first frame
Figure 6 shows the foreground results of the proposed algorithm on various video sequences, along with the respective original frames; the black mask marks the background, which is static at that particular instant of time.
[Figure 6 panels: pairs of original frames and foreground masks for frames No. 1, 3, 7, 20, 39, 67, 126, 183, 207, 213, 214, 229, 232, 339, 354, 359, 393, 406, 892 and 985.]
The reason for selecting the largest peak from each half is to obtain a better valley and to cover the maximum motion area. The second peak, which is searched for in the range from 255 to 1 on the y-axis, represents another major movement. Here, not only the magnitude of the motion but also the number of moving blocks is considered, as this gives better results. The valley, i.e. the minimum between the two largest peaks, is then searched for and taken as the final threshold value: values below the threshold are background and values above it are foreground, as given in Equations 5 and 6. The valley is chosen as the threshold point for the following reasons. Because it is the minimum value between the two peaks (possibly zero), it does not represent any cluster. Because there is a peak on each side, and a peak means that many blocks share that motion, there are two major motion segments, one on each side, so the valley is the best estimate of the boundary between these two major values. The valley therefore divides the feature vector into two parts with at least one peak on each side: values less than or equal to the threshold belong to the background part of the respective frame, and values above it belong to the foreground part, as expressed in Equations 5 and 6.
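A minimal sketch of this peak-and-valley threshold selection is given below. It assumes that the smoothed histogram is a list of 32 bin values, that the two halves are bins 0 to 15 and 16 to 31, and that ties are resolved in favour of the lower bin; the function names and the sample histogram are hypothetical and not taken from the experiments.

```python
def largest_peak(hist, start, stop):
    """Index of the largest bin in hist[start:stop] (first one on ties)."""
    return max(range(start, stop), key=lambda i: hist[i])


def valley_threshold(hist):
    """Return (first-half peak, second-half peak, valley bin between them)."""
    half = len(hist) // 2
    peak_low = largest_peak(hist, 0, half)
    peak_high = largest_peak(hist, half, len(hist))
    valley = min(range(peak_low, peak_high + 1), key=lambda i: hist[i])
    return peak_low, peak_high, valley


def classify(bin_index, threshold):
    """Bins up to and including the threshold are background, the rest foreground."""
    return "foreground" if bin_index > threshold else "background"


# Hypothetical smoothed histogram: number of 4x4 blocks per motion-magnitude bin.
smoothed_hist = [40, 18, 7, 3, 2, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
                 7, 8, 10, 11, 12, 11, 9, 6, 4, 2, 1, 1, 0, 0, 0, 0]

peak_low, peak_high, threshold = valley_threshold(smoothed_hist)
labels = [classify(b, threshold) for b in (0, 3, 5, 9, 20)]
print(peak_low, peak_high, threshold)
print(labels)
```

For this sample the peaks are at bins 0 and 20 and the valley is at bin 5, so blocks in bins 0 to 5 are labelled background and blocks in higher bins foreground.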
Referring to Figure 3, peak selection proceeds as follows: first the maximum (i.e., the largest peak) is found, and then the second largest peak is selected in such a way that the two peaks fall in separate portions (one peak in the first half and the other in the second half, or vice versa, for better motion estimation). The minimum value between the two peaks is then found to obtain the valley. This is the threshold point of the proposed algorithm.

[Figure 6, continued: original frames and foreground masks for frames No. 519, 523, 528 and 535.]

ACKNOWLEDGMENT
We are grateful to the Institute of Management Sciences Peshawar (http://imsciences.edu.pk/), which financially supported this project through the Higher Education Commission Islamabad, Pakistan (http://hec.gov.pk/). I am also thankful to Abdulkadir Audu, PhD scholar, who helped me in conducting this research.

REFERENCES
1. Collins, R., Lipton, A., Kanade, T., Fujiyoshi, H., Duggins, D., Tsin, Y., Tolliver, D., Enomoto, N., Hasegawa, O., Burt, P., Wixson, L.: A system for video surveillance and monitoring. Tech. rep., Carnegie Mellon University, Pittsburgh, PA (2000)
2. Kameda, Y., Minoh, M.: A human motion estimation method using 3-successive video frames. In: International Conference on Virtual Systems and Multimedia (VSMM), pp. 135–140. Gifu, Japan (1996)
3. Kanade, T., Collins, R., Lipton, A., Burt, P., Wixson, L.: Advances in cooperative multi-sensor video surveillance. In: DARPA Image Understanding Workshop, vol. I, pp. 3–24. Morgan Kaufmann (1998)
4. Migliore, D., Matteucci, M., Naccari, M.: A revaluation of frame difference in fast and robust motion detection. In: 4th ACM International Workshop on Video Surveillance and Sensor Networks (VSSN), pp. 215–218. Santa Barbara, California (2006)
5. Wren, C., Azarbayejani, A., Darrell, T., Pentland, A.: Pfinder: Real-time tracking of the human body. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 19(7), 780–785 (1997)
6. Hati, K.K., Sa, P.K., Majhi, B.: LOBS: Local background subtracter for video surveillance (2012)
[Figure 6, continued: original frames and foreground masks for frames No. 625, 631, 640, 668, 738, 798, 899 and 902.]
Figure 6: Foreground results

III. CONCLUSION
In this paper, we presented a simple technique for obtaining the threshold value of the background subtraction method from a smoothed motion histogram, which precisely separates the foreground (moving area) from the background (static area). The proposed algorithm was tested on eight video sequences of various kinds and produced visually good results. The experimental results make clear that the proposed algorithm works very well where the camera and background are fixed and only the foreground object(s) are in motion; however, when the camera is moving or the motion of the foreground object(s) is very slow, the proposed algorithm may produce erroneous results.
7. Martínez-Martín, E.: Robust motion detection in real-life scenarios (2012)
8. Sun, S.W., Wang, Y.C.F., Huang, F., Liao, H.Y.M.: Moving foreground object detection via robust SIFT trajectories. Journal of Visual Communication and Image Representation (2012)