Personal Access Control System Using Moving Object Detection and Face Recognition
Vesna Zeljković1, Du Zhang1, Ventzeslav Valev2*, Zhongyu Zhang1, Shengjie Zhu1, Junjie Li1
1 School of Engineering & Computing Sciences, New York Institute of Technology, Nanjing Campus, USA, [email protected]
2 IAPR Fellow, School of Computing, University of North Florida, Florida, USA

Abstract: A real time automated personal access control system is proposed that detects moving objects and then localizes, extracts and recognizes their faces in real image sequences. The described method addresses two problems in personal access control that have received increasing attention over the years: moving object detection and face recognition. It is tested on video of a personal access controlled area. The efficiency of the described system is illustrated on four real world interior video sequences recorded in an indoor/outdoor mixed environment with slight illumination changes.
Keywords: Moving Object Detection, Moving Object Tracking, Face Recognition, Image Processing.

I INTRODUCTION
Real time detection of moving objects from a video sequence [1, 2] has become an important research topic in computer vision and video processing, and the number of visual surveillance systems has greatly increased in recent years. In response to the growing demand for computer vision tools in the latest generations of consumer electronic devices equipped with smart cameras, these systems have developed into intelligent systems that automatically detect, track, and recognize objects in video. Interest in such systems is high because of their wide range of applications: traffic control, surveillance, robot vision, securing indoor and outdoor sites, and more complex tasks such as human face recognition.

Face recognition technology has received a great deal of attention over the decades in image analysis and computer vision. It has been studied by scientists from various areas of the psychophysical and computer sciences. Psychologists and neuroscientists analyze human perception, while engineers study the computational aspects of machine recognition of human faces. Even though face recognition is a natural human ability, developing a mathematical algorithm that performs it is one of the most challenging tasks in computer vision.

Various face recognition techniques have been developed, including image-based, video-based, appearance-based, model-based, 2D and 3D algorithms. Many methods have likewise been proposed for moving object detection and tracking: methods based on edge, color and texture information; methods that model both background and foreground with spatial-temporal reference data; nonparametric algorithms; and techniques for fixed background segmentation based on a changing decision threshold. Because objects in blurry and foggy videos have unpredictable characteristics, automatic moving object detection, tracking and face recognition remain very challenging in video surveillance applications.

The rest of the paper is organized as follows. The second section reviews the most recent studies on moving object detection and face recognition. The third section describes the mathematical formulation of the proposed moving object detection and face recognition methods. In the fourth section the proposed techniques for moving object detection, face extraction and face recognition are applied to real life video sequences and the obtained results are presented. Concluding remarks are given in the final section.
*Associate member of the Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria, Email: [email protected]
978-1-4799-5313-4/14/$31.00 ©2014 IEEE

II REVIEW OF MOVING OBJECT DETECTION AND FACE RECOGNITION METHODS
A real-time implementation of an optimized spatio-temporal nonparametric moving object detection method is presented in [3]. The kernel bandwidths required to model the background are estimated dynamically, the background model is updated selectively, and the central and graphics processing units of the device cooperate, with extensive use of the texture mapping and filtering units of the latter and a novel method for fast evaluation of Gaussian functions.

A motion detection approach based on the Cerebellar Model Articulation Controller (CMAC) artificial neural network is presented in [4] to detect moving objects in high and low bit-rate video streams. It consists of a probabilistic background generation module, which produces a probabilistic background model through unsupervised learning over variable bit-rate video streams, and a moving object detection module, which is based on the CMAC network and detects moving objects through a block selection procedure and an object detection procedure.

A three-term low-rank matrix decomposition approach is proposed in [5], in which the turbulence sequence is decomposed into the background, the turbulence, and the object. The difficult problem of simultaneous turbulence mitigation and moving object detection is reduced to a minimization of the nuclear norm, the Frobenius norm, and the ℓ1 norm, based on two observations: 1) the turbulence causes dense, Gaussian noise and can therefore be captured by the Frobenius norm, while the moving objects are sparse and can be captured by the ℓ1 norm; 2) since the object's motion is linear and intrinsically different from the Gaussian-like turbulence, a Gaussian-based turbulence model can be employed to enforce an additional constraint on the search space of the minimization.

The problem of detecting moving objects against a moving background in aerial video is addressed in [6]. The background motion is modeled with a Gaussian mixture model, and the optical flow between every two adjacent frames is computed to obtain the motion information for each pixel. The idea in [7] is to capture a series of video pictures at regular intervals, which are used to describe the vector information of the region. The segmented frames are converted into color or grayscale images for better performance. An algorithm for moving object detection based on log Gabor filter and dominant eigen map approaches is described in [8], where the moving object is detected and tracked by connected component analysis and centroid manipulation.
A method for moving object detection under a moving camera, which combines alignment and moving object detection in a rank minimization framework, is proposed in [9]. In [10], region shrinking is applied to the region with a high density of motion pixels in a binary difference image in order to detect and locate moving objects. When an image contains more than one object, a power transformation is applied to enhance objects at different positions. The feature rectangle of each object is derived and used in further tracking. The design of a background image subtraction method and its Field Programmable Gate Array (FPGA) implementation for moving object detection in surveillance video with high resolution frames of 720×480 pixels is described in [11].

A moving object detection and retrieval model that integrates the spatial and temporal information in video sequences and uses the integral density method to quickly identify the motion regions in an unsupervised way is proposed in [12]. Key information locations in video frames are obtained as maxima and minima of the difference of Gaussian function, and a motion map of adjacent frames is obtained from the diversity of the outcomes of a simultaneous partition and class parameter estimation framework. The motion map filters key information locations into key motion locations, where the existence of moving objects is implied. Besides showing the motion zones, the motion map also indicates the motion direction, which guides the integral density approach to locate the motion regions quickly and accurately. In [13] an algorithm is presented that derives fuzzy rules to merge the detected bounding boxes into a unique cluster bounding box covering a single object; the relationship of a pair of boxes is defined by their geometrical affinity, motion cohesion, and appearance similarity.

A review of different face recognition techniques is given in [14]. It demonstrates that the performance of the applied image processing technique depends strongly on the pre-processing steps used, and that the equal error rates of the eigenface and Fisherface methods can be reduced. The authors of [15] study the influence of demographics on the recognition accuracy of six face recognition algorithms: three commercial, two non-trainable, and one trainable. Experimental results demonstrate that the matching accuracy for race/ethnicity and age cohorts can be improved by training exclusively on that cohort. This leads to a dynamic face matcher selection scenario, in which multiple face recognition algorithms, each trained on a different demographic cohort, are available for selection by a biometric system operator based on the demographic information extracted from a probe image. An alternative to dynamic face matcher selection is to train face recognition algorithms on datasets that are evenly distributed across demographics.
A solution for illumination invariant face recognition in indoor applications is presented in [16]. It combines an active near infrared imaging system that produces face images of good condition regardless of the visible light in the environment; statistical learning algorithms that extract the most discriminative features from a large pool of invariant local binary pattern features, on which a highly accurate face matching engine is built; and a system that achieves accurate and fast face recognition in practice. An adaptive approach to illumination invariant face recognition is presented in [17]. It uses image quality to adaptively select fusion parameters for wavelet-based multi-stream face recognition and applies global and regional illumination normalization. The multi-resolution property of wavelet transforms is used to extract facial feature descriptors at different scales and frequencies, since high-frequency wavelet sub-bands are shown to provide illumination invariant face descriptors.

A comparative study is provided in [18] of conventional face recognition methods, such as Principal Component Analysis (PCA) related to eigenfaces and Radial-Basis Function (RBF), and of novel kernel methods, such as Kernel Principal Component Analysis (KPCA) and Support Vector Machine (SVM), that are suitable for use in a multimodal interface interacting with a Hybrid Broadcast Broadband Television (HBB-TV) user. The influence of noise and partial occlusion on face recognition accuracy is evaluated, with special focus on occlusions of eyes and eyebrows. An automatic face recognition system is designed in [19] that uses frontal images represented with gray level, Local Binary Pattern (LBP), Local Ternary Pattern (LTP), and two dimensional Gabor filter features. It consists of an alignment process, which includes face detection, eye detection, and mapping of the eye center coordinates to a standard face template, followed by classification of the aligned faces, all performed fully automatically.
A theory for constructing linear subspace approximations to face recognition algorithms is presented in [20]. The adequacy of the linear model, specified as a linear subspace spanned by nonorthogonal vectors, is demonstrated empirically using six different face recognition algorithms, spanning template-based and feature-based approaches, with complete separation of the training and test sets. A face recognition algorithm robust to large-scale changes in facial pose and lighting conditions is described in [21]; it applies a 2D-to-3D face model and a self-principal component analysis method based on bit-plane feature fusion.
III ALGORITHM FOR PERSONAL ACCESS CONTROL SYSTEM
The proposed personal access control system is a face recognition system with a wide spectrum of possible applications in video coding, video conferencing, crowd surveillance, human-computer interfaces, and controlled entrance to secured buildings and areas. The proposed system is shown in Figure 1.

Figure 1. Proposed personal access control system

The input video is captured by a fixed camera placed on the ceiling of the video surveilled area, facing the entrance. The camera output is connected to a CPU, which runs the proposed system and processes the recorded video.
The personal access control system consists of three major blocks, which perform moving object detection in the input video, face extraction from the detected person, and finally face identification, which gives the required attendance survey.

We propose a moving object detection algorithm that is resistant to slight illumination changes. Only every tenth frame is processed, which enables real time implementation of the proposed system. Each processed frame is subtracted from the background frame, which is usually the first frame of the recorded video and does not contain any moving objects. The elapsed time between subtracted frames is chosen empirically; this parameter depends on the maximum speed of the moving objects, their distance from the camera, and the frame rate. The mean value is calculated for every subtraction frame and its derivative is analyzed in time, see Figure 3. The derivative indicates the rate of change in the frame's content. If the derivative value is lower than a predetermined threshold, it can be concluded that a moving object that is leaving the scene is detected in the analyzed frame. Several frames before the one in which the moving object is identified are stored in order to facilitate the face recognition stage.

The face extraction phase is realized by the face detection system described in [22]. It uses illumination insensitive features obtained from the local successive mean quantization transform (SMQT) and achieves rapid detection with the split up sparse network of Winnows (SNoW) classifier, a learning architecture specifically tailored for learning in the presence of a very large number of features that can also be used as a general purpose multi-class classifier. The local SMQT features provide illumination and sensor insensitive operation in object recognition, the split up SNoW speeds up the original classifier, and the features and classifier are combined for the task of frontal face detection.
After the face is extracted from the detected moving object, face recognition is performed by applying principal component analysis (PCA) as described in [23-25].
The face training set comprises M images of m x n resolution, where every face image is represented as an s-dimensional vector, s = m*n. Principal component analysis finds a new t-dimensional subspace whose basis vectors correspond to the directions of maximum variance in the original face image space. This new subspace, called the face space, has a much lower dimension than the initial face image space, i.e. t is much smaller than s. All images of known faces are projected onto the face space to find the sets of weights that describe the contribution of each basis vector, and an unknown face image is projected onto the face space in the same way to obtain its set of weights. The unknown face image is identified by comparing its set of weights to the sets of weights of known faces. If the image elements are considered as random variables, the principal component analysis basis vectors are defined as eigenvectors of the scatter matrix S_T, which is defined as:

$$S_T = \sum_{i=1}^{M} (x_i - \mu)(x_i - \mu)^T$$

where µ is the mean of all face images in the training set and x_i is the i-th image with its columns concatenated into a vector. The projection matrix W_PCA is composed of the t eigenvectors corresponding to the t largest eigenvalues, thus creating a t-dimensional face space. Since these eigenvectors, which represent the principal component analysis basis vectors, look like ghostly faces, they were conveniently named eigenfaces. The output of the face recognition algorithm is the identity of the detected person, provided that her/his image is contained in the face database.

Figure 2. Mean value of the subtracted images observed in time for videos: a) Video A; b) Video B; c) Video C and d) Video D.
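As an illustration of the principal component analysis steps described above, the following minimal Python sketch builds the scatter matrix, extracts the leading eigenvectors and projects face vectors onto the face space. It is a sketch under stated assumptions and not the implementation used in this paper: the function names, the power iteration with deflation used to obtain the eigenvectors, and the toy two-pixel "images" are all hypothetical choices made to keep the example free of external libraries.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mean_vector(images: List[Vector]) -> Vector:
    """Mean face mu: element-wise average of all training vectors."""
    count = len(images)
    return [sum(column) / count for column in zip(*images)]


def scatter_matrix(images: List[Vector]) -> Matrix:
    """S_T = sum_i (x_i - mu)(x_i - mu)^T for s-dimensional image vectors."""
    mu = mean_vector(images)
    s = len(mu)
    scatter = [[0.0] * s for _ in range(s)]
    for x in images:
        d = [a - b for a, b in zip(x, mu)]
        for r in range(s):
            for c in range(s):
                scatter[r][c] += d[r] * d[c]
    return scatter


def mat_vec(matrix: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def top_eigenvectors(scatter: Matrix, t: int, iterations: int = 200) -> List[Vector]:
    """Approximate the t leading eigenvectors by power iteration with deflation
    (an illustrative choice; any symmetric eigen-solver would do)."""
    s = len(scatter)
    work = [row[:] for row in scatter]
    basis: List[Vector] = []
    for k in range(t):
        v = [1.0 / (i + k + 1) for i in range(s)]  # arbitrary deterministic start
        for _ in range(iterations):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                return basis
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, mat_vec(work, v)))
        basis.append(v)
        # Deflation: remove the component just found before the next pass.
        for r in range(s):
            for c in range(s):
                work[r][c] -= eigenvalue * v[r] * v[c]
    return basis


def project(x: Vector, mu: Vector, basis: List[Vector]) -> Vector:
    """Weights of image x in the face space: W_PCA^T (x - mu)."""
    d = [a - b for a, b in zip(x, mu)]
    return [sum(a * b for a, b in zip(v, d)) for v in basis]


# Toy training set: three 2-pixel "images" lying on one line.
training = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]
mu = mean_vector(training)
w_pca = top_eigenvectors(scatter_matrix(training), t=1)
weights = [project(x, mu, w_pca) for x in training]
print(weights)
```

For real face images s is large (e.g. 3600 for 60x60 faces), so a dedicated eigen-solver would normally replace the power iteration; the structure of the computation stays the same.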
IV SIMULATION RESULTS
The simulations were performed using Visual C++ and MATLAB. The proposed personal access control system is tested on four video sequences recorded by a fixed camera placed on the ceiling, facing the entrance, in an indoor/outdoor mixed environment with slight illumination changes. In the moving object detection phase, every tenth frame is subtracted from the background frame, which is the first frame of the recorded video and does not contain any moving objects. The mean value is calculated for every subtraction frame, see Figure 2. The derivative of the mean value is then calculated and analyzed in time, see Figure 3. The derivative indicates the rate of change in the frame's content. If the derivative value is lower than a predetermined threshold, it can be concluded that a moving object that is leaving the scene is detected in the analyzed frame.
Figure 3. Derivative of the mean value of the subtracted images observed in time for videos: a) Video A; b) Video B; c) Video C and d) Video D.

Counting the peaks in Figure 3 that fall below the preset threshold, it can be concluded that 6 persons who entered the video surveilled area are detected in Videos A and C, 3 persons in Video B, and 4 persons in Video D. The moving object detection algorithm is correct in 100% of cases for all tested videos, as presented in Table 1. The output of the moving object detection phase, i.e. the detected moving objects, is presented in Figure 4 for all four videos. The frame in which the moving object is detected is stored, together with several frames before it, in order to facilitate the face recognition stage, see Figure 5. The face extraction rate is given in Table 1 for four consecutive frames. The percent of correct face extraction is above 91%, which is sufficient for the face recognition algorithm to reach its highest rate. The output of the face extraction phase is presented in Figure 5 for five consecutive frames for all four tested videos.
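The detection rule used above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: frames are plain nested lists of grayscale intensities, the subtraction frame is taken as the absolute pixel difference, the derivative is a first difference between consecutive processed frames, and the function names and the threshold value are hypothetical. Each contiguous run of derivative values below the threshold is counted as one detected person.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # grayscale image as rows of pixel intensities


def mean_abs_difference(frame: Frame, background: Frame) -> float:
    """Mean of |frame - background| over all pixels (absolute value is an assumption)."""
    total = 0.0
    count = 0
    for row_f, row_b in zip(frame, background):
        for p, q in zip(row_f, row_b):
            total += abs(p - q)
            count += 1
    return total / count


def detect_departures(frames: Sequence[Frame], step: int = 10,
                      threshold: float = -5.0) -> List[int]:
    """Return frame indices where the derivative of the mean difference
    first drops below `threshold`, one index per contiguous run."""
    background = frames[0]  # first frame, assumed free of moving objects
    sampled = list(range(0, len(frames), step))  # every `step`-th frame
    means = [mean_abs_difference(frames[i], background) for i in sampled]
    # Discrete derivative between consecutive processed frames.
    derivative = [b - a for a, b in zip(means, means[1:])]
    events: List[int] = []
    in_run = False
    for k, d in enumerate(derivative):
        below = d < threshold
        if below and not in_run:
            events.append(sampled[k + 1])
        in_run = below
    return events


def flat_frame(value: float, size: int = 4) -> List[List[float]]:
    """Synthetic frame with a constant intensity (for the demo only)."""
    return [[value] * size for _ in range(size)]


# Synthetic clip: empty scene, a person present in frames 20-39, empty again.
clip = [flat_frame(100.0 if 20 <= i < 40 else 0.0) for i in range(60)]
events = detect_departures(clip, step=10, threshold=-5.0)
print(len(events), "person(s) detected at frame(s)", events)
```

In this sketch the number of detected persons is simply `len(events)`; the frames preceding each returned index are the ones that would be handed to the face extraction stage.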
Figure 4. The output of the moving object detection phase for video: a) Video A; b) Video B; c) Video C and d) Video D.
Figure 5. The output of the face extraction phase for video: a) Video A; b) Video B; c) Video C and d) Video D.

Experiments described in [23] show that recognition performance decreases dramatically when the resolution of the detected face image differs from that of the face images stored in the database. The percent of correct classification averaged over various face image sizes reported in [23] is 64%. This is understandable because, under size changes, the correlation from one image to another is largely lost, unlike under varying illumination conditions. This calls for a multi-scale approach, in which faces of a particular size are compared only to one another. One way of realizing this approach is to store face images of every individual at several different sizes in the database.
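One possible way to organize such a multi-scale database is sketched below in Python. The nearest-neighbour resampling, the function names and the dictionary layout are assumptions made for illustration only; the default sizes are the face image sizes evaluated in Table 2.

```python
from typing import Dict, List, Sequence

Image = List[List[float]]  # square grayscale face image as rows of intensities


def resize_nearest(image: Image, size: int) -> Image:
    """Resample a grayscale image to size x size by nearest-neighbour lookup."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


def build_multiscale_gallery(
    faces: Dict[str, Image],
    sizes: Sequence[int] = (45, 50, 55, 60, 65, 70),
) -> Dict[int, Dict[str, Image]]:
    """Map each face size to a {person: resized face} gallery."""
    return {
        size: {name: resize_nearest(face, size) for name, face in faces.items()}
        for size in sizes
    }


def gallery_for(probe: Image, galleries: Dict[int, Dict[str, Image]]) -> Dict[str, Image]:
    """Pick the gallery whose face size is closest to the probe's height."""
    best = min(galleries, key=lambda size: abs(size - len(probe)))
    return galleries[best]
```

With this layout an extracted face is compared only against stored faces of the nearest size, which is the multi-scale idea described above.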
Given this indication from the literature, we tested the PCA face recognition algorithm on various face image sizes, using two different distance metrics, city block and Euclidean. The obtained results are listed in Table 2. It can be observed from Table 2 that the Euclidean distance and the 60x60 image resolution give the highest face recognition rate.

Table 1 Summary of the obtained results of the classification (percent of correct classification)

Phase                      Video A   Video B   Video C   Video D
Moving object detection    100       100       100       100
Face extraction            96.67     100       91.67     100
Face recognition           100       100       100       100
Table 2 Summary of the obtained results of face recognition under various face image sizes (correct rate of recognition by distance measurement)

              Video A                 Video B                 Video C                 Video D
Image size    City Block  Euclidean   City Block  Euclidean   City Block  Euclidean   City Block  Euclidean
45x45         83.33       83.33       66.67       66.67       100         100         100         100
50x50         83.33       83.33       66.67       100         50          83.33       75          100
55x55         83.33       83.33       66.67       100         83.33       83.33       100         100
60x60         100         100         66.67       100         100         100         100         100
65x65         83.33       83.33       66.67       100         83.33       83.33       100         100
70x70         83.33       83.33       66.67       100         83.33       83.33       100         100
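The comparison of the two distance measures in Table 2 can be illustrated with a nearest-neighbour matcher over face-space weights. The Python sketch below is an assumption about how such a matcher could look, not the code used for the experiments; the function names and the toy weight vectors are hypothetical. The toy gallery is chosen so that the two metrics select different identities, which shows why the choice of metric can change the recognition rate.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
Metric = Callable[[Vector, Vector], float]


def city_block(a: Vector, b: Vector) -> float:
    """City block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(probe: Vector, gallery: Dict[str, Vector], metric: Metric) -> str:
    """Return the gallery identity whose weight vector is closest to the probe."""
    return min(gallery, key=lambda name: metric(probe, gallery[name]))


# Toy face-space weights (hypothetical values).
gallery = {"person_a": [3.0, 3.0], "person_b": [0.0, 5.0]}
probe = [0.0, 0.0]

print(identify(probe, gallery, city_block))  # person_b: L1 distance 5 < 6
print(identify(probe, gallery, euclidean))   # person_a: L2 distance 4.24 < 5
```

In practice the probe and gallery vectors would be the PCA weights of faces of the same image size, and a rejection threshold on the smallest distance could be added for persons who are not in the database.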
Table 1 summarizes the results of the proposed automated personal access control system for all three major phases: moving object detection, face extraction and face recognition. The moving object detection phase and the face recognition phase, with the parameters chosen by analyzing Table 2, give 100% successful classification. The face extraction stage scores above 91%, which is sufficient to achieve the highest final result of 100%.
V CONCLUSION
A real time personal access control system based on moving object detection and face recognition is proposed. It is illustrated on video of a personal access controlled area. It consists of three major stages: moving object detection, face localization and extraction, and face recognition in real image sequences, which gives the required attendance survey. The novel moving object detection algorithm, resistant to slight illumination changes, processes every tenth frame and subtracts it from the background. The derivative of the mean value is calculated for every subtraction frame; it indicates the rate of change in the frame's content and successfully detects the person who entered the video surveilled space. The face extraction phase is realized by a face detection system that uses illumination insensitive features obtained from the local successive mean quantization transform and achieves rapid detection with the split up sparse network of Winnows classifier. After the face is extracted from the detected moving object, face recognition is performed by applying principal component analysis. The efficiency of the described system is illustrated on four real world interior video sequences recorded in an indoor/outdoor mixed environment with slight illumination changes. We obtain a very high recognition rate, which justifies our future research in this direction and the effort to make the proposed system competitive.

REFERENCES
[1] V. Zeljkovic, “Video Surveillance Techniques and Technologies”, IGI Global, Hershey PA, USA, 2013, ISBN: 978-1-4666-4896-8, http://www.igi-global.com/book/video-surveillance-techniquestechnologies/78939.
[2] V. Zeljkovic, “Illumination Independent Moving Object Detection in Image Sequences”, LAP Lambert Academic Publishing GmbH & Co. KG, 2010, ISBN: 978-3-8433-5943-6, http://www.bookdepository.co.uk/Illumination-Independent-MovingObject-Detection-Image-Sequences-VesnaZeljkovic/9783843359436.
[3] Berjon D., Cuevas C., Moran F., Garcia N., “GPU-Based Implementation of an Optimized Nonparametric Background Modeling for Real-Time Moving Object Detection”, IEEE Transactions on Consumer Electronics, Vol. 59, Issue 2, 2013, pp. 361-369.
[4] Shih-Chia Huang, Bo-Hao Chen, “Highly Accurate Moving Object Detection in Variable Bit Rate Video-Based Traffic Monitoring Systems”, IEEE Transactions on Neural Networks and Learning Systems, Vol. 24, Issue 12, 2013, pp. 1920-1931.
[5] Oreifej O., Xin Li, Shah M., “Simultaneous Video Stabilization and Moving Object Detection in Turbulence”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, Issue 2, 2013, pp. 450-462.
[6] Yunfei Wang, Zhaoxiang Zhang, Yunhong Wang, “Moving Object Detection in Aerial Video”, 11th International Conference on Machine Learning and Applications, Vol. 2, 2012, pp. 446-450.
[7] Rajagur D., Manimuthu S.D., Rajkamal A., Malik H.M., “Moving Object Detection Using Drawpad”, International Conference on Advances in Engineering, Science and Management, 2012, pp. 39-42.
[8] Krishna M.T.G., RaviShankar M., Babu R., “Log-DEM: Log Gabor Filter and Dominant Eigen Map Approaches for Moving Object Detection”, 12th International Conference on Intelligent Systems Design and Applications, 2012, pp. 568-573.
[9] Sang-Woo Noh, Tae-Hyun Oh, In So Kweon, “Moving Object Detection under Moving Camera by Rank Minimization”, 9th International Conference on Ubiquitous Robots and Ambient Intelligence, 2012, pp. 586-587.
[10] Zhihui Li, Haibo Liu, Di Sun, “Moving Object Detection and Locating Based on Region Shrinking Algorithm”, International Conference on Mechatronics and Automation, 2012, pp. 2515-2518.
[11] Lopez-Bravo A., Diaz-Carmona J., Ramirez-Agundis A., Padilla-Medina A., Prado-Olivarez J., “FPGA-Based Video System for Real Time Moving Object Detection”, International Conference on Electronics, Communications and Computing, 2013, pp. 92-97.
[12] Dianting Liu, Mei-Ling Shyu, “Effective Moving Object Detection and Retrieval via Integrating Spatial-Temporal Multimedia Information”, IEEE International Symposium on Multimedia, 2012, pp. 364-371.
[13] Jilin Tu, Del Amo A., Yi Xu, Li Guan, Mingching Chang, Sebastian T., “A Fuzzy Bounding Box Merging Technique for Moving Object Detection”, Annual Meeting of the North American Fuzzy Information Processing Society, 2012, pp. 1-6.
[14] Teja G.P., Ravi S., “Face Recognition Using Subspaces Techniques”, International Conference on Recent Trends In Information Technology, 2012, pp. 103-107.
[15] Klare B.F., Burge M.J., Klontz J.C., Vorder Bruegge R.W., Jain A.K., “Face Recognition Performance: Role of Demographic Information”, IEEE Transactions on Information Forensics and Security, Vol. 7, Issue 6, 2012, pp. 1789-1801.
[16] Li S.Z., Ru Feng Chu, Shengcai Liao, Lun Zhang, “Illumination Invariant Face Recognition Using Near-Infrared Images”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 29, Issue 4, 2007, pp. 627-639.
[17] Sellahewa H., Jassim S.A., “Image-Quality-Based Adaptive Face Recognition”, IEEE Transactions on Instrumentation and Measurement, Vol. 59, Issue 4, 2010, pp. 805-813.
[18] Jozer B., Matej F., Lubos O., Milos O., Jarmila P., “Face Recognition under Partial Occlusion and Noise”, IEEE EUROCON, 2013, pp. 2072-2079.
[19] Yavuz H.S., Cevikalp H., Edizkan R., “Automatic Face Recognition from Frontal Images”, 21st IEEE Signal Processing and Communications Applications Conference, 2013, pp. 1-4.
[20] Mohanty P., Sarkar S., Kasturi R., Phillips P.J., “Subspace Approximation of Face Recognition Algorithms: An Empirical Study”, IEEE Transactions on Information Forensics and Security, Vol. 3, Issue 4, 2008, pp. 734-748.
[21] Yi Dai, Guoqiang Xiao, Kaijin Qiu, “Efficient Face Recognition with Variant Pose and Illumination in Video”, 4th IEEE International Conference on Computer Science & Education, 2009, pp. 18-22.
[22] Nilsson M., Nordberg J., Claesson I., “Face Detection using Local SMQT Features and Split up Snow Classifier”, IEEE International Conference on Acoustics, Speech and Signal Processing, 2007, Vol. 2, pp. 589-592.
[23] M. Turk and A. Pentland, “Eigenfaces for recognition”, Journal of Cognitive Neuroscience, 1991, Vol. 3, No. 1, pp. 71-86.
[24] Turk M.A., Pentland A.P., “Face Recognition Using Eigenfaces”, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1991, pp. 586-591.
[25] Delac K., Grgic M., Grgic S., “Independent Comparative Study of PCA, ICA, and LDA on the FERET Data Set”, International Journal of Imaging Systems and Technology, Vol. 15, Issue 5, 2006, pp. 252-260.