Automated Real-Time Video Surveillance Algorithms for SoC Implementation: A Survey

Ehab Salahat, Advisor: Hani Saleh, Co-Advisors: Baker Mohammad, Mahmoud Al-Qutayri, Andrzej Sluzek, and Mohammad Ismail
Department of Electrical Engineering, Khalifa University of Science, Technology and Research, Abu Dhabi
E-mails: [email protected], [email protected], [email protected], [email protected], [email protected] and [email protected]

Abstract—Numerous techniques and algorithms have been developed and implemented, primarily in software, for object tracking, detection, and recognition. A few attempts have been made to implement some of these algorithms in hardware. However, those attempts have not yielded optimal results in terms of accuracy, power, and memory requirements. The aim of this paper is to explore and investigate a number of possible algorithms for real-time video surveillance, revealing their underlying theories, relationships, shortcomings, advantages, and disadvantages, and pointing out their unsolved problems of practical interest in a principled way. This would be of tremendous value to engineers and researchers trying to decide which of the many algorithms in the literature is most suitable for a specific application and a particular real-time System-on-Chip (SoC) implementation.

Index Terms—Real-Time Video Surveillance; Maximally Stable Extremal Regions; Scale-Invariant Feature Transform; Speeded Up Robust Features; Background Subtraction; FPGA.

I. INTRODUCTION

Visual surveillance of unknown and dynamically changing environments is one of the most challenging applications of machine vision. This active research area invariably requires high-performance computation resources due to the volume of data to be processed. Video surveillance has a wide range of applications in both public and private environments, such as homeland security, crime prevention, traffic control, accident prediction and detection, patient monitoring, UAV-based surveillance, and airport surveillance systems. There is an increasing interest in video surveillance due to the growing availability of cheap sensors and processors, and also a growing need for public safety and security [1]. In mobile and autonomous applications, energy efficiency and accuracy of the system are therefore crucial requirements. For speedy actions, the operations must be performed at real-time speed. Advanced vision modules for field systems are thus often based on energy-efficient embedded video processors supporting surveillance, tracking, and other processing needs of various applications. Inevitably, next generations of visual surveillance systems will incorporate more intelligent algorithms that can automatically detect and identify objects from dynamically updated visual databases (including detection of unspecified objects previously seen in the database images). Such systems would effectively be performing visual data matching and retrieval. Researchers therefore need to develop intelligent systems that efficiently extract information from large-scale real-time data. Hardware-based (or hardware-supported) implementation of key point detection and description has recently been attracting the interest of the research community (e.g., see [2-4]). In this work, we investigate possible algorithms for object detection and recognition, and compare these algorithms for automated real-time video surveillance SoC implementation based on many performance metrics. This comparison is then used to establish the focus of our research in this field. The remainder of this paper is structured as follows: in Section II, the investigated performance metrics and challenges are presented. In Section III, the different candidate algorithms are overviewed, highlighting their merits and demerits. The research focus is presented in Section IV, and the paper's findings are summarized in Section V.

II. CHALLENGES AND PERFORMANCE METRICS

Performance is impacted by many variables, making exhaustive testing of all use cases virtually impossible. A clear understanding of these variables and their impact is crucial to designing a successful and meaningful test for the system. Key factors influencing the performance of a real-time object detection/recognition system include, but are not limited to, the points discussed below [5].

978-1-4799-2452-3/13/$31.00 ©2013 IEEE

A. Camera and Constant Environment Parameters
The first challenge mainly concerns the physical installation of the video source: the field of view of the camera is finite and limited by scene structures [6]. Other camera parameters include the quality and properties of the generated video (color, grey-scale, low-light, infrared, resolution, pixel depth, and frame rate).

B. Variable Environment Parameters
These include illumination levels, reflections, weather conditions (sun, cloud, rain, snow, fog, and wind), seasonal changes, nuisance targets, lights, and shadows [5].

C. Processing Environment Parameters
Factors such as limited memory, processing power, and speed in embedded processing platforms like Digital Signal Processors (DSPs) and Field-Programmable Gate Arrays (FPGAs) force algorithmic changes that can impact the performance of both the individual components and the system as a whole [5].


D. Communication Parameters
These mainly include bandwidth, synchronization, camera coordination, compression noise and artifacts, potentially dropped frames, and transmission errors.


E. Application Parameters
Application parameters include the targets of interest (humans, vehicles, shopping carts, works of art, etc.), the tolerable miss-detection and false-alarm rates and their desired trade-off (e.g., whether a high detection rate or a low false-alarm rate is more important), the type of application (e.g., security, business), the maximum allowable latency, and learning and self-adaptation [5].
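The miss-detection/false-alarm trade-off mentioned above can be made concrete with a small sketch. The function name and the per-frame counts below are hypothetical, purely for illustration:

```python
# Sketch: computing the detection rate and false-alarm fraction that a
# surveillance application would be tuned against (illustrative counts).

def detection_metrics(true_positives, false_positives, false_negatives):
    """Return (detection_rate, false_alarm_fraction) from raw counts."""
    real_targets = true_positives + false_negatives   # all real targets
    alarms_raised = true_positives + false_positives  # all alarms raised
    detection_rate = true_positives / real_targets if real_targets else 0.0
    false_alarm_fraction = false_positives / alarms_raised if alarms_raised else 0.0
    return detection_rate, false_alarm_fraction

# A security application may prefer operating point A (few misses),
# while a business-analytics one may prefer point B (few false alarms).
point_a = detection_metrics(true_positives=95, false_positives=20, false_negatives=5)
point_b = detection_metrics(true_positives=80, false_positives=2, false_negatives=20)
print(point_a)  # high detection rate, more false alarms
print(point_b)  # lower detection rate, few false alarms
```

The two operating points illustrate why the desired trade-off must be fixed per application before a detector can be evaluated.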

IV. RESEARCH FOCUS

The proposed vision chip is needed because available designs do not target affine-invariant key point detectors and descriptors on a single chip. We aim to design a dedicated SoC, using an ASIC flow in an advanced technology node, that implements a complete mobile visual data processing system and employs state-of-the-art memory-efficient and power-saving designs. The applications that need such surveillance systems include access control in security-sensitive locations (e.g., military bases and governmental facilities), crowd-flux statistics, congestion analysis, traffic flow management, and anomaly detection and alarming to identify abnormal behavior. The proposed SoC will replace the passive human operator that all the preceding applications currently require, and is expected to achieve high accuracy without human interaction with the system.

III. CANDIDATE ALGORITHMS

As indicated in the previous section, many challenges and performance metrics affect surveillance systems, and choosing the optimal algorithm can enhance performance and help resolve these challenges. Many object detection algorithms seem to be excellent candidates, e.g., Difference-of-Gaussians (DoG), Maximally Stable Extremal Regions (MSER), the Fully Affine Invariant Feature Detector (FIAF), the Scale-Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), and background subtraction [6-8], differing in their capabilities and requirements. Some potential algorithms for real-time video surveillance SoC designs are briefly introduced below.

V. CONCLUSION

Real-time video surveillance is an active research area with a wide spectrum of promising applications. In this short paper, we presented a brief overview of some potential algorithms for object detection and recognition, and compared their pros and cons. Motivated by the limitations of current system designs, we aim to design a mobile, power-efficient, real-time SoC for video surveillance that resolves these limitations.

A. Background Subtraction
Background subtraction is widely used for detecting moving objects from static cameras. The background is estimated and subtracted from the input frame; thresholding the difference yields the foreground, i.e., the objects. Different techniques can be used to estimate the background: the simplest takes the previous frame as the background, while another possibility is to take the mean or median of the last N frames. The algorithm is adaptive to dynamic background changes, easy to implement, and fast enough for real-time implementation, as in [6-7]. Its drawbacks are its dependency on object speed and frame rate, its large memory requirements, and, most importantly, the fact that the threshold used is neither global nor time-invariant.
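The median-filter variant described above can be sketched in a few lines. The function name, frame sizes, and threshold value are our illustrative choices, not prescriptions from [6-7]:

```python
import numpy as np

# Sketch of background subtraction: estimate the background as the per-pixel
# median of the last N frames, then threshold the absolute difference with
# the current frame to obtain a foreground mask. Frames here are synthetic.

def foreground_mask(frames, current, threshold=25):
    """Median-of-N background model followed by absolute-difference thresholding."""
    background = np.median(np.stack(frames), axis=0)
    return np.abs(current.astype(float) - background) > threshold

# Synthetic 8x8 grey-scale history: a static background of intensity 50.
history = [np.full((8, 8), 50, dtype=np.uint8) for _ in range(5)]

# Current frame: a bright 2x2 "object" appears in the top-left corner.
frame = np.full((8, 8), 50, dtype=np.uint8)
frame[0:2, 0:2] = 200

mask = foreground_mask(history, frame)
print(mask.sum())  # 4 foreground pixels -> the 2x2 object
```

Note that the fixed threshold is exactly the weakness noted above: a value that separates object from background in one scene or at one time of day generally fails in another.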

REFERENCES
[1] X. Wang, “Intelligent multi-camera video surveillance: A review,” Pattern Recognition Letters, vol. 34, no. 1, 2013.
[2] E. S. Kim and H.-J. Lee, “A Practical Hardware Design for the Keypoint Detection in the SIFT Algorithm with a Reduced Memory Requirement,” IEEE Int. Symp. on Circuits and Systems (ISCAS), pp. 770-773, May 2012.
[3] V. Bonato, E. Marques, and G. A. Constantinides, “A Parallel Hardware Architecture for Scale and Rotation Invariant Feature Detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 12, pp. 1703-1712, Dec. 2008.
[4] Q. Zhang, Y. Chen, Y. Zhang, and Y. Xu, “SIFT implementation and optimization for multi-core systems,” IEEE Int. Symp. on Parallel and Distributed Processing (IPDPS), pp. 1-8, April 2008.
[5] P. L. Venetianer and H. L. Deng, “Performance evaluation of an intelligent video surveillance system—A case study,” Comput. Vis. Image Understanding, vol. 114, no. 11, pp. 1292-1302, 2010.
[6] A. McIvor, “Background subtraction techniques,” in Proc. of Image and Vision Computing, New Zealand, November 2000.
[7] M. Piccardi, “Background subtraction techniques: a review,” in Proc. IEEE Int. Conf. on Systems, Man and Cybernetics, vol. 4, The Hague, The Netherlands, October 2004.
[8] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust Wide Baseline Stereo from Maximally Stable Extremal Regions,” Proc. 13th British Machine Vision Conf., pp. 384-393, 2002.
[9] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 27, no. 10, pp. 1615-1630, Oct. 2005.
[10] F. Kristensen and W. J. MacLean, “Real-Time Extraction of Maximally Stable Extremal Regions on an FPGA,” IEEE Int. Symp. on Circuits and Systems (ISCAS), pp. 165-168, May 2007.
[11] H. Bay, T. Tuytelaars, and L. Van Gool, “SURF: Speeded Up Robust Features,” in Proc. ECCV, pp. 404-417, 2006.

B. Maximally Stable Extremal Regions
The MSER algorithm is an interest region detector originally used in wide-baseline stereo matching [8]. MSER operates on the input image directly, without any smoothing, which results in the detection of both fine and coarse structures. It is shown in [9] that MSER performs well compared to other local detectors, and its main advantage is that it is the fastest affine-invariant region detector. To the best of our knowledge, the only drawback of MSER is that its performance degrades on blurred images, which can be mitigated by smart installation of the camera topology [8-10].

C. Speeded Up Robust Features
SURF is a scale- and rotation-invariant interest point detector and descriptor [11]. The algorithm extracts salient points from the image and computes descriptors of their surroundings that are invariant to scale, rotation, and illumination changes. However, SURF detection and extraction are computationally demanding and therefore cannot be used in systems with limited computational power [11].
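The stability criterion behind MSER can be illustrated with a minimal, self-contained toy sketch. This is not the component-tree algorithm of [8]: the function names, the single-seed simplification, and the 6x6 test image are ours, purely to show what "maximally stable" means:

```python
from collections import deque

# Toy sketch of the MSER stability criterion: grow the dark region around a
# seed pixel as the threshold t sweeps upward, and call the region maximally
# stable at the t where its relative area change q(t) is smallest.

def region_size(img, seed, t):
    """Size of the 4-connected component of pixels <= t containing `seed`."""
    h, w = len(img), len(img[0])
    sy, sx = seed
    if img[sy][sx] > t:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and (ny, nx) not in seen and img[ny][nx] <= t:
                seen.add((ny, nx))
                queue.append((ny, nx))
    return len(seen)

def most_stable_threshold(img, seed, delta=1):
    """Threshold minimizing q(t) = (|Q_{t+d}| - |Q_{t-d}|) / |Q_t|."""
    best_t, best_q = None, float("inf")
    for t in range(delta, 256 - delta):
        size = region_size(img, seed, t)
        if size == 0:
            continue
        q = (region_size(img, seed, t + delta) - region_size(img, seed, t - delta)) / size
        if q < best_q:
            best_t, best_q = t, q
    return best_t

# A dark 3x3 blob (intensity 10) on a bright background (200): the region
# stops growing just above the blob intensity and stays stable from there.
img = [[200] * 6 for _ in range(6)]
for y in range(2, 5):
    for x in range(2, 5):
        img[y][x] = 10
print(most_stable_threshold(img, seed=(3, 3)))  # -> 11
```

A real implementation tracks all components simultaneously with union-find over sorted pixels, which is what makes MSER so fast in practice.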
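Much of SURF's speed comes from integral images, which make the box filters approximating Gaussian second derivatives cost O(1) at any scale. A minimal sketch of that building block follows; the helper names and sample values are ours, not from [11]:

```python
# Sketch of the integral-image trick underlying SURF: any box sum over the
# image is computed from just four table lookups, independent of box size.

def integral_image(img):
    """ii[y][x] = sum of img over rows < y and cols < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 3, 3))  # 45: sum of the whole image
print(box_sum(ii, 1, 1, 3, 3))  # 5 + 6 + 8 + 9 = 28
```

Because every Hessian box response costs four lookups regardless of filter size, SURF scans scale space without ever resampling the image, which is also why the technique maps well onto hardware.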

