Unusual Event Detection in Low Resolution Video for Enhancing ATM Security

Sudhir Goswami1, Jyoti Goswami2, Nagresh Kumar3
1, 3 Computer Science Department, MIET, Meerut; 2 Electronics Department, NITTTR, Chandigarh

Abstract— In real-world applications, tracking a target in low resolution video is challenging because discriminative detail in the visual appearance of the moving object is lost. Existing methods mostly enhance the LR (low resolution) video with super resolution techniques, but these methods have a high computational cost, which increases further when events must also be detected. In this paper we present an algorithm that detects unusual events without such a conversion. It is well suited to enhancing the security of ATMs, where conventional low resolution cameras are generally used because of their low cost. The proposed algorithm uses only a morphological closing operation with a disk-like structuring element in the preprocessing steps to cope with low resolution video. It then uses the rolling average background subtraction technique to detect foreground objects against a dynamic background. The algorithm recognizes uncommon events such as overcrowding or a fight in low resolution video simply by using a statistical property, the standard deviation of the moving objects. It is fast because it processes low resolution frames, and it does not use any classifier, which avoids the need to train the system initially.

Keywords— Object tracking, video surveillance, unusual event detection, background subtraction, ATM security.

I. INTRODUCTION

In the past few decades, significant effort has gone into moving object detection and tracking in order to make applications such as video surveillance, robotics, authentication systems, media production, and biological research reliable, robust, and efficient, as in [1]. Many challenges still hinder these applications, including illumination change, dynamic background, camouflage, occlusion, and shadow, as in [2]. These obstacles become harder to handle when object tracking is performed on low resolution video. In low resolution video it is very difficult to locate the object of interest accurately because most of the discriminative details, such as visual features and primitives, have been lost. The result is inaccurate object tracking, which in turn leads to unreliable event detection. Low resolution video does have benefits: it requires less storage, transmission time, and processing time, as in [3].

Most conventional tracking approaches rely on high resolution (HR) video to extract exact contour features, as in [4], and shape features, as in [5], of the target. These approaches have a higher computational cost because they work on high resolution frames. Some approaches in the literature take low resolution video as input but then enhance it to high resolution with a super resolution technique, which is not cost effective. In the literature on abnormal event detection, most methods, as in [6], [7] and [8], use classifiers to recognize events and do not take low resolution video as input. These classifiers require learning time and careful attention to the training dataset. Some approaches, as in [9], require an initial manual setup in the automated event detection system and have a high computational cost. The literature therefore shows the need for an algorithm that handles uncommon event detection in low resolution video to assist a fully automated surveillance system.

This paper presents an algorithm that detects unusual events in low resolution video. A typical application of the proposed approach is to enhance ATM security without replacing the conventional low resolution camera. It uses the rolling average background subtraction technique to segment foreground objects from a scene with a dynamic background, and it preserves object features to an extent by applying morphological operations with a suitable structuring element. It does not need any classifier or training dataset. It uses only a statistical property, the standard deviation of the centroids of the blobs, to recognize the occurrence of abnormal events.

The paper is organized as follows: Section 2 briefly introduces the state of the art in video surveillance systems. Section 3 discusses the proposed approach for detecting unusual events. Experiments and results are described in Section 4. Conclusions and future work are given in Section 5.

II. STATE-OF-THE-ART in VIDEO SURVEILLANCE SYSTEM

In general, any object tracking system includes four main building blocks to automate the surveillance system, as in [10]:

   

• Moving object detection
• Object tracking
• Event recognition
• Object identification

A. Moving Object Detection

Change detection in image sequences is gaining popularity because of its large number of applications in several disciplines. Video surveillance is one of the important applications that detect changes in the scene. Several schemes are used to detect such changes. These approaches are conventionally categorized into two classes:

• Temporal differencing
• Background modeling and subtraction

The first method is the simplest and has a low computational cost, but its performance is quite poor in real-life surveillance applications. The second approach has proven successful in several surveillance algorithms; it uses a dynamic or static background for effective detection of the foreground object. Figure 1 shows an overview of the background modeling and subtraction system, as in [1].

[Figure 1. Overview of background subtraction system: Video Acquisition → Frame Conversion → Pre Processing → Background Modeling → Background Subtraction → Post Processing → Foreground Extraction]
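The two classes of change detection named above can be contrasted with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' MATLAB implementation: frames are plain nested lists of grayscale values, the function names and the toy frames are hypothetical, and the background model is the running average that the paper adopts later in equation (1).

```python
def temporal_difference(prev_frame, curr_frame, threshold):
    """Flag pixels whose value changed between two consecutive frames."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def update_background(background, frame, alpha):
    """Running-average model: B_t = alpha * u_t + (1 - alpha) * B_{t-1}."""
    return [
        [alpha * u + (1 - alpha) * b for b, u in zip(bg_row, row)]
        for bg_row, row in zip(background, frame)
    ]


def subtract_background(background, frame, threshold):
    """Flag pixels that differ from the background model by more than threshold."""
    return [
        [1 if abs(u - b) > threshold else 0 for b, u in zip(bg_row, row)]
        for bg_row, row in zip(background, frame)
    ]


# Hypothetical 1x6 grayscale frames: a two-pixel object (value 200) moves one
# pixel to the right over an empty scene (value 10).
empty = [[10.0, 10.0, 10.0, 10.0, 10.0, 10.0]]
frame_a = [[10.0, 200.0, 200.0, 10.0, 10.0, 10.0]]
frame_b = [[10.0, 10.0, 200.0, 200.0, 10.0, 10.0]]

diff_mask = temporal_difference(frame_a, frame_b, threshold=50)
bg_mask = subtract_background(empty, frame_b, threshold=50)
new_background = update_background(empty, frame_b, alpha=0.05)
```

In this toy case the frame difference marks only the vacated pixel and the newly covered pixel, while subtraction against the background model marks the whole object, which is one way to see why the second class tends to give more usable foreground masks.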

1) Video Acquisition: The video is acquired with any video capturing device, such as a Handycam, mobile camera, USB camera, or CCTV camera.

2) Frame Conversion: After capture, the video is converted into frames of a suitable type so that further processing can be done conveniently.

3) Preprocessing: Some preprocessing is applied to the frames of the video to reduce noise. Common preprocessing methods, as in [11], include Smooth, Dilate, Erode, Median, Open, and Close.

4) Background Modeling: After preprocessing, background modeling is used to create an ideal background (static or dynamic) according to environmental changes. This is an important step of the system and may sometimes include image subtraction operations. It is the defining characteristic of any background subtraction system. According to the literature, background modeling techniques are categorized as recursive or non-recursive, as in [12].

5) Background Subtraction: This is the main step of the background subtraction system. Any significant changes in an image region relative to the background model are identified, and the pixels constituting the changing regions are marked for further processing. A connected component labeling algorithm is usually applied to obtain connected regions corresponding to the objects.

6) Post Processing: Post processing is done to improve the results. Many post processing techniques can be used after background modeling and subtraction [4]. Their objective is to improve the foreground mask.

7) Foreground Extraction: This is the final step in the process, which extracts the moving object from the frame [9]. The result of this step helps in judging the efficiency of the background subtraction system.

B. Object Tracking

Obtaining correct tracking information for a moving foreground object is not an easy task in problems such as event modeling and activity recognition, as in [14], [15], and [16]. Many different types of algorithm have been used for this purpose. Most of these algorithms fall into four groups, as in [24]: model based, region based, contour based, and feature based.

C. Event Recognition

Event recognition is the ultimate purpose of a fully automated surveillance system. It is not easy to define the type of motion that is meaningful in a surveillance context. Many studies address different types of events, as in [17]. In event recognition, objects are detected by using background subtraction and their boundaries are then extracted to produce a skeleton. This skeleton provides important motion cues, such as body posture. The motion activities of segmented skeletons or blobs can be used in event detection and recognition, for example walking or running, fight or theft, and overcrowding.

D. Object Identification

Identifying the moving object that enters the scene is another important part of a surveillance system. Recent studies focus on person identification and are based on biometrics such as face and gait, as in [15].

III. PROPOSED APPROACH

Our proposed technique is intended only for detecting unusual events, such as a fight or overcrowding, in the low resolution video typically used in ATMs. For clarity, this section is divided into the following sub-sections:

• Rolling Background Subtraction Technique
• Close Morphological Operation (o)
• Thresholding and Standard Deviation (σ)
• Pseudo Code of Proposed Approach
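Connected component labeling, named in the background subtraction step above and reused by the proposed approach to measure blobs, can be illustrated with a minimal Python sketch. The function name `label_blobs`, the dictionary layout, and the toy mask are assumptions made for illustration; the paper does not specify its labeling routine beyond the choice of 4- or 8-connectivity.

```python
from collections import deque


def label_blobs(mask, connectivity=8):
    """Find blobs in a binary mask (list of rows of 0/1).

    Each blob is returned as a dict with its pixel area, its bounding box
    (top, left, bottom, right), and the centroid of that bounding box.
    """
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]

    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for dr, dc in offsets:
                    nr, nc = pr + dr, pc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            top = min(p[0] for p in pixels)
            bottom = max(p[0] for p in pixels)
            left = min(p[1] for p in pixels)
            right = max(p[1] for p in pixels)
            blobs.append({
                "area": len(pixels),
                "bbox": (top, left, bottom, right),
                "centroid": ((top + bottom) / 2, (left + right) / 2),
            })
    return blobs


# Hypothetical 4x6 mask: two 2x2 squares that touch only diagonally.
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
blobs_8 = label_blobs(mask, connectivity=8)
blobs_4 = label_blobs(mask, connectivity=4)
```

With 8-connectivity the diagonal contact merges the two squares into a single blob, while 4-connectivity keeps them apart, so the connectivity choice directly affects the blob count used for person counting.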

A. Rolling Background Subtraction Technique

We use the rolling average background subtraction technique, which dynamically updates the background model and performs temporal differencing. The equation used for this technique, as in [18], is:

$$B_t = \alpha\, u_t + (1 - \alpha)\, B_{t-1} \qquad (1)$$

Here $\alpha \in [0, 1]$ is a learning rate parameter that controls how quickly the background model incorporates new information and how quickly it forgets older information.

B. Close Morphological Operation

To fill small gaps inside the moving object, we use the morphological closing operation. It also helps reduce the noise that remains in the moving object. Closing is composed of two sub-operations: dilation followed by erosion, as in [19]. In dilation, pixels that touch object pixels are changed to object pixels; dilation adds pixels to the boundary of the object and closes isolated background pixels. In erosion, each object pixel that touches a background pixel is changed into a background pixel; erosion removes isolated foreground pixels.
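The dilation-then-erosion composition described above can be sketched in plain Python as follows. This is only an illustrative sketch: the helper names, the radius of 1, and the border rule (neighbours outside the frame are ignored) are assumptions, and the paper's own implementation relied on MATLAB.

```python
def disk(radius):
    """Offsets (dr, dc) that lie inside a disk of the given radius."""
    return [
        (dr, dc)
        for dr in range(-radius, radius + 1)
        for dc in range(-radius, radius + 1)
        if dr * dr + dc * dc <= radius * radius
    ]


def dilate(mask, offsets):
    """A pixel becomes 1 if any pixel under the structuring element is 1."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] == 1:
                    out[r][c] = 1
                    break
    return out


def erode(mask, offsets):
    """A pixel stays 1 only if every in-frame pixel under the element is 1.

    ASSUMPTION: neighbours that fall outside the frame are ignored.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[1] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] == 0:
                    out[r][c] = 0
                    break
    return out


def close_mask(mask, radius=1):
    """Morphological closing: dilation followed by erosion with the same disk."""
    offsets = disk(radius)
    return erode(dilate(mask, offsets), offsets)


# Hypothetical 7x7 mask: a 3x3 blob with a one-pixel hole in its centre.
mask = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        mask[r][c] = 1
mask[3][3] = 0
closed = close_mask(mask, radius=1)
```

On this toy mask the hole at the centre is filled while the outer boundary of the blob returns to its original position, which is the boundary-preserving behaviour the section relies on.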

The ability of the morphological operation to remove noise and fill small gaps makes it well suited to our objective, since we are interested in generating masks that preserve the object boundary. This preserved boundary helps the connected component algorithm, which is used later to calculate the area of each blob. The composite of dilation followed by erosion (the closing operation) can be expressed as:

$$F_t = F_t \circ D \qquad (2)$$

Here 'o' is the morphological closing operation and 'D' is a disk-like structuring element of a chosen radius. There are many structuring elements; the choice of shape and size depends on the type of information one wants to retrieve, as in [20].

C. Thresholding and Standard Deviation

Thresholding is the simplest method of image segmentation, as in [21]. Every pixel in each frame is classified as either background (0) or foreground (1) using a simple thresholding function:

$$F_t = \begin{cases} 1 & \text{if } |u_t - B_t| > T \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

Here 'T' is a threshold on the difference between a pixel in the current frame and the background model, $u_t$ is the pixel value in the current frame, and $B_t$ is the pixel value in the background model. There are many thresholding methods in the literature, grouped according to the information manipulated by the algorithm, as in [22]. In our proposed technique we use the clustering-based approach.

Standard deviation (σ) shows how much dispersion exists from the average value, as in [23]. A low standard deviation indicates that the data points tend to be very close to the mean, and a high standard deviation indicates that the data points are spread out over a large range of values. The standard deviation is calculated by the following formula:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2} \qquad (4)$$

Here N is the number of samples in the population, μ is the mean value, $x_i$ is a sample value, and σ is the standard deviation. A useful property of the standard deviation is that, unlike the variance, it is expressed in the same units as the data. In our proposed scheme we calculate the standard deviation of the population of centroids of the bounding boxes of the blobs in 'n' consecutive frames. If this standard deviation is above a threshold value 'x' times in a row, it signifies the presence of an unusual event.

D. Pseudo Code of Proposed Approach

In our proposed scheme, we first segment the moving object from the frame using the rolling average background subtraction technique. We then refine the result with the morphological closing operation followed by a connected component labeling algorithm. The connected component algorithm joins the different parts of the same blob using either 4-connectivity or 8-connectivity. The areas of these blobs are calculated and filtered by a threshold value to avoid further processing of useless frames. In these useless frames the blob moves very little, for example when a person stands at the ATM machine and only the hands move. After this we calculate the standard deviation of the blobs in the selected frames and compare it with another threshold value to detect an unusual event. The algorithm is robust enough to deal with situations in which a single person moves rapidly in the ATM, such as picking up a fallen ATM card from the floor.

The algorithmic steps of the proposed scheme are given below. Here σ represents the standard deviation, and T1 and T2 represent the threshold values used for segmented_area and the standard deviation (σ) respectively. segmented_area, num_blobs, n, and x all represent integer values.

1. Segment the moving objects from the current frame by using the rolling average background subtraction technique.
2. Apply the close morphological operation with a disk-like structuring element on the segmented areas to remove noise and other inaccuracies.
3. If segmented_area (for each blob) >= T1 then
4.     num_blobs = num_blobs + 1 // Person counting
5.     If num_blobs > 1 then // overcrowding situation
6.         Find the bounding boxes and centroids of the segmented blobs.
7.         Calculate the standard deviation σ of the centroids in 'n' consecutive frames.
8.         If σ >= T2 'x' times in continuity then
9.             Unusual event occurred
           otherwise
10.            Event is normal
    End if of steps 8, 5 and 3 respectively.

In our experiments we took 'x' equal to 3. This value depends on the duration of the fight that occurs in the ATM. From various experiments we deduce that this value is sufficient to detect even a fight of short duration. After detecting an unusual event, the surveillance system can be automated so that it locks the door of that particular ATM and sends an alarm message to the security personnel in the common observation room, so that the necessary action can be taken.

IV. EXPERIMENTS and RESULTS

The proposed methodology is demonstrated using MATLAB 7.6 (R2008a) on an AMD Dual-Core E-450 APU (1.65 GHz) with 2 GB RAM and Windows 7 Ultimate. We recorded low resolution videos with a mobile camera in '3gp' format and converted them to 'avi' at a resolution of 176x144. The sample1 and sample2 videos are 315 and 160 frames long respectively. We have plotted the variation of the standard deviation with respect to frames for the sample1 and sample2 videos in Figures 2 and 3 respectively. High spikes in the graphs signify large movement of blobs, while small peaks signify less movement of blobs in the scene.
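The standard-deviation signal plotted against frames can be sketched in Python as follows. The paper does not state how the two-dimensional centroids are reduced to a single σ, so the combination used in `centroid_spread` (root of the summed per-axis variances) is an assumption, as are the function names and the sample centroids.

```python
import math


def population_std(values):
    """Population standard deviation, following equation (4)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def centroid_spread(centroids):
    """Reduce a list of (x, y) centroids to one dispersion value.

    ASSUMPTION: per-axis population deviations are combined as
    sqrt(sigma_x^2 + sigma_y^2); the paper leaves this step unspecified.
    """
    sigma_x = population_std([c[0] for c in centroids])
    sigma_y = population_std([c[1] for c in centroids])
    return math.sqrt(sigma_x ** 2 + sigma_y ** 2)


# Hypothetical centroids collected over n = 4 consecutive frames.
calm = [(50, 60), (51, 60), (50, 61), (51, 61)]
agitated = [(20, 30), (90, 100), (25, 110), (95, 35)]
calm_sigma = centroid_spread(calm)
agitated_sigma = centroid_spread(agitated)
```

Centroids that barely move give a small value, while centroids that jump across the frame give a large one, which is the contrast between small peaks and high spikes described for Figures 2 and 3.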

Figure 3. Variation of standard deviation in sample2 video

Results of our proposed technique on the sample1, sample2, sample3, and sample4 videos are summarized in Table 1. In the sample1, sample2, and sample3 videos, overcrowding and theft with fighting occurred, and the system detected this automatically. In the sample4 video only overcrowding occurs. No false detection occurred in any of the sample videos. We also tested this algorithm on other sample videos, with the same results. When implementing this algorithm, the threshold values, the learning rate parameter, and the shape and size of the structuring element must be chosen carefully.

TABLE 1. SUMMARY OF RESULTS

Video name | No. of frames | Source    | Correct detection | False detection
Sample1    | 315           | Self-made | yes               | nil
Sample2    | 160           | Self-made | yes               | nil
Sample3    | 967           | YouTube   | yes               | nil
Sample4    | 408           | Self-made | yes               | nil

Some important snapshots of the sample videos are shown in Figure 4, arranged column-wise from top to bottom in chronological order of their appearance in the videos. The results show that as soon as a second person enters the ATM, the system detects it (second row), and if the two persons fight with each other, this is also detected (last row of each sample).

Figure 2. Variation of standard deviation in sample1 video

Here the X-axis shows the selected frames, that is, the frames in which blob movement exceeds the area threshold; this is why Figures 2 and 3 depict fewer frames (230 and 110) than the actual counts (315 and 160). This filtering avoids processing useless frames. The line parallel to the X-axis represents the threshold value used to detect the presence of an abnormal event. The graphs show that a standard deviation that stays above the threshold line in continuity signifies the presence of an unusual event. Rectangular boxes in the graphs mark the occurrence of unusual events in the sample videos.
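The continuity rule read off these graphs can be sketched as a small Python function. The name `detect_unusual`, the sample σ values, and the threshold of 8.0 are hypothetical; only x = 3 comes from the paper.

```python
def detect_unusual(sigmas, t2, x=3):
    """Return the indices at which sigma has been >= t2 for x values in a row."""
    alarms = []
    run = 0
    for index, sigma in enumerate(sigmas):
        # Extend the current run, or reset it when sigma drops below T2.
        run = run + 1 if sigma >= t2 else 0
        if run >= x:
            alarms.append(index)
    return alarms


# Hypothetical per-frame sigma values for the selected frames, with T2 = 8.0.
sigmas = [2.0, 9.5, 3.1, 8.7, 9.9, 12.4, 11.0, 4.2]
alarms = detect_unusual(sigmas, t2=8.0, x=3)
```

The single spike at index 1 does not raise an alarm because it is not sustained, whereas the run starting at index 3 does once it reaches three consecutive values; requiring continuity is what lets brief rapid movements pass without a detection.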

[Figure 4. Simulation results of proposed approach on sample videos (panels: Sample1, Sample2, Sample3, Sample4)]

V. CONCLUSION and FUTURE WORK

In this paper we have proposed an algorithm that detects unusual events, such as a fight or an overcrowding situation, within the ATMs of different banks. The proposed algorithm could help enhance ATM security. The results show that the algorithm is efficiently applicable to low resolution video with only some preprocessing. There is no need for computationally expensive schemes that enhance low resolution videos with super resolution techniques. In future work, this approach can be extended to detect more uncommon events in ATMs, such as an attempt to steal the ATM, damage to the ATM screen, and theft within the ATM. The scheme could be further adapted to situations where one ATM room contains two or more ATM machines. The proposed algorithm is a basic scheme for detecting unusual events that may be further modified to deal with challenges such as camouflage and sleeping-person problems by using more efficient background subtraction techniques.

REFERENCES

[1] K. Srinivasan, K. Pokumaran, G. Sainarayan, "Improved Background Subtraction Techniques for Security in Video Application", in Anti-counterfeiting, Security, and Identification in Communication, 2009, pp. 114-117.
[2] Sugandi, B., Hyoungseop Kim, Joo Kooi Tan, Ishikawa, "Tracking low resolution objects by metric preservation", in Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1329-1336.
[3] Nan Jiang, Heng Su, Wenyu Liu, Ying Wu, "Tracking low resolution objects by metric preservation", in Computer Vision and Pattern Recognition (CVPR), 2011, pp. 1329-1336.
[4] Y. Chen, Y. Rui, and T. Huang, "Multicue hmm-ukf for realtime contour tracking", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, pp. 1525-1529.
[5] D. Cremers, "Dynamical statistical shape priors for level set-based tracking", in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, pp. 1262-1273.
[6] Kamijo, S., Ikeuchi, K., Sakauchi, M., "Traffic monitoring and accident detection at intersections", in IEEE Transactions on Intelligent Transportation Systems, 2000, pp. 108-118.
[7] Tian Wang, Snoussi, H., "Histograms of Optical Flow Orientation for Visual Abnormal Events Detection", in IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance (AVSS), 2012, pp. 13-18.
[8] Lili Cui, Kehuang Li, Jiapin Chen, Zhenbo Li, "Abnormal event detection in traffic video surveillance based on local features", in Image and Signal Processing (CISP), 2011, pp. 362-366.
[9] Adam A., Haifa, Rivlin, E., Shimshoni, I., Reinitz, D., "Robust Real-Time Unusual Event Detection using Multiple Fixed-Location Monitors", in IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, pp. 555-560.
[10] Burkey Birant Orten, "Moving Object Identification and Event Recognition in Video Surveillance Systems", MS Thesis in Electrical and Electronics department in METU, 2005.
[11] [Online]. Available: http://wiki.eigenvector.com/index.php?title=Image_Pre-Processing_Methods
[12] Donovan H. Parks and Sidney S. Fels, "Evaluation of background subtraction Algorithm with Post-processing", in IEEE Fifth International Conference on Advanced Video and Signal Based Surveillance, 2008, pp. 192-199.
[13] Alper Yilmaz, Omar Javed and Mubarak Shah, "Object Tracking: A Survey", ACM Computing Surveys, 2008, volume 38, article 13.
[14] C.R. Wren, A. Azarbayejani, T. Darrell, and A. Pentland, "Pfinder: Real-Time Tracking of the Human Body", in IEEE Trans. on Pattern Analysis and Machine Intelligence, July 1997, Vol. 19, pp. 780-785.
[15] Hu, W., Tan T., Wang L., Maybank S., "A Survey on Visual Surveillance of Object Motion and Behaviours", IEEE Transactions on Systems, Man, and Cybernetics, August 2004, Vol. 34, no. 3.
[16] N. Paragios and R. Deriche, "Geodesic active contours and level sets for the detection and tracking of moving objects", in IEEE Trans. Pattern Anal. Machine Intell., 2000, pp. 266-280.
[17] Fujiyoshi, H., Lipton, A.J., "Real-time human motion analysis by Image skeletonization", in Applications of Computer Vision, 1998, pp. 15-21.
[18] M. Zane and T. Jules R, "Background Subtraction Survey for Highway Surveillance", in Proceedings of PRASA, 2009.
[19] Sugandi, B., "A Block Matching Technique for Object Tracking Based on Peripheral Increment Sign Correlation Image", International Conference on Computer and Communication Engineering, 2008, pp. 113-117.
[20] [Online]. Available: http://en.wikipedia.org/wiki/Structuring_element
[21] [Online]. Available: http://en.wikipedia.org/wiki/Thresholding_(image_processing)
[22] Mehmet Sezgin, Bulent Sankur, "Survey over image thresholding techniques and quantitative performance evaluation", in Journal of Electronic Imaging, 2004, pp. 146-165.
[23] [Online]. Available: http://en.wikipedia.org/wiki/standard_deviation
[24] Haritaoglu, I., D. Harwood and L.S. Davis, "W4: A Real-Time System for Detecting and Tracking People in 2 ½ D", in 5th European Conference on Computer Vision, 1998.
