
IJCST Vol. 6, Issue 3, July - Sept 2015

ISSN : 0976-8491 (Online) | ISSN : 2229-4333 (Print)

Survey on Human Activity Recognition Techniques For Video Surveillance

1Utkal Sinha, 2Himanish Shekhar Das, 3Mayank Shekhar

1M.Tech, Computer Science and Engineering, NIT Rourkela, Rourkela, India
2M.Tech, Computer Science and Engineering, NIT Durgapur, Durgapur, India
3M.Tech, Computer Science and Engineering, Tezpur Central University, Tezpur, India

Abstract
Activity recognition aims to recognize the actions and goals of one or more agents from a series of observations of the agents' actions and the environmental conditions. Vision-based activity recognition, which tracks and interprets the behavior of agents in videos captured by various cameras, is an important and challenging problem. The primary technique employed is computer vision. Vision-based activity recognition has found many applications, such as human-computer interaction, user interface design, robot learning, and surveillance. This report describes the literature surveyed for developing a human activity recognition algorithm and a novel approach to it.

Keywords
Computer Vision, Human Computer Interaction, Human Activity Recognition, Visual Surveillance.

I. Introduction
Understanding human activity using computer vision is a complex and challenging problem in both machine learning and pattern recognition. As cited in [1], based on 2006 statistics, the New York City Transit System [2] is the busiest metro system in the U.S., with a total of 468 stations, 1.49 billion riders a year, and 4.9 million riders a day. According to 2007 statistics, the Moscow Metro [3] is the busiest metro in Europe, with 176 stations, 2.52 billion riders annually, and 9.55 million riders daily. These transit systems are spread across hundreds of kilometers and require thousands of employees for daily operations. Deploying visual surveillance to cover a system of this magnitude requires thousands of cameras, which makes human-based surveillance infeasible for practical purposes. Even as the volume of video data increases, existing digital video-surveillance systems only capture, store, and distribute video, leaving the task of threat detection exclusively to human operators. According to psychophysical research [4], humans are severely limited in their ability to monitor simultaneous signals.
The need for computerized video search and surveillance is therefore inevitable, which has led to the area of activity recognition using computer vision.

II. Literature Survey
Numerous attempts have been made in this field to automate video surveillance, but each approach has its own pros and cons. Table 1, from [1], shows the spectrum of such approaches.

Table 1: Modeling Framework, Learning Algorithms and Techniques
Modified Altruistic Vector Quantization algorithm
Markov Model Based
Explicit State Duration ESD-HMM
Semi-supervised HMM with Bayesian adaptation
Gaussian mixture model (+/-) HMM
Continuous-state HMM
Coupled HMM
Infinite Hidden Markov Model (iHMM)
Multi-observation HMM (MOHMM)
Markov Random Field (MRF)
Mixture of Probabilistic Principal Component Analyzers (MPPCA)
Bayesian Model (and classifier)
Dynamic Bayesian Network (DBN)
Clustering (including hierarchical)
Poisson model
Graph-cuts
Normalized cut
Fuzzy k-means
k-medoids
Stochastic Fuzzy c-means
Leader-and-follower
Co-occurrence statistics and bipartite co-clustering
Undirected graphical model
Dynamic Oriented Graph (DOG)
Support Vector Machines (SVM)
Multiple Support Vector Machines (SVM)
One-class SVM
Neural Network
Fuzzy Distance based function
Similarity function
Probabilistic Latent Semantic Analysis (pLSA)
Two stage hierarchical pLSA
Finite State Machines for sequence grammars
Fuzzy Associative Memory (FAM)
Principal Components Analysis (PCA)
Self-Organizing Map (SOM)
Social Force Model
Multi-Class Delta Latent Dirichlet Allocation (MCLDA)
Markov Clustering Topic Model (MCTM)
Hierarchical Dirichlet Process (HDP)
Non-model based

On the basis of prior knowledge and human involvement in the learning process, research in human activity recognition can be categorized as supervised, unsupervised, and semi-supervised.

A. Supervised Learning
In this type of learning, a number of models of normal or abnormal behavior are built from labeled training samples [5-8]. Video samples that do not fit any model are classified as abnormal. This approach is limited to events that are well defined and requires sufficient training data. Real-world video, however, mostly contains events that are not well defined; such events are rare, so sufficient training samples are not available.

B. Unsupervised Learning
In this type of learning, models of behavior are learned from unlabeled video data, without manually annotated training samples. Patterns that occur frequently are treated as normal, and samples that deviate from them are classified as abnormal.

C. Semi-supervised Learning
In this type of learning, models are built from a small number of labeled training samples together with a larger amount of unlabeled data, which reduces the labeling effort required by purely supervised approaches.

Human group behavior detection has attracted increasing research interest. From a security perspective, the automatic detection of group activities is very important. Some of the issues in group event detection mentioned in [20] are as follows:
1. Group event detection with a varying number of group members.
2. Group event detection with a hierarchical activity structure.
3. Clustering with an asymmetric distance metric.

III. Latest Research Work in Human Behavior Recognition
Table 2 shows a brief summary of the latest research work in the field of human behavior recognition.

Table 2

TITLE: Robust Video Surveillance for Fall Detection Based on Human Shape Deformation
DESCRIPTION: Developed countries need new healthcare systems to ensure the safety of elderly people at home. This research work introduced a novel approach to human fall detection that analyzes human shape deformation in a video sequence. On detecting any abnormal activity, the system sends an alarm signal to an outside resource via a cell phone or the internet.
LIMITATIONS: The performance of such a system degrades when the subject's movement after the fall is small.
REFERENCE: [21]

TITLE: Visual Event Recognition in Videos by Learning from Web Data
DESCRIPTION: This paper introduces a visual event recognition framework that learns from the large number of videos available on the web (for example, on YouTube). The authors first developed a new approach for measuring the distance between any two video clips, called Aligned Space-Time Pyramid Matching (ASTPM). They then developed a new transfer learning method called Adaptive Multiple Kernel Learning (A-MKL).
LIMITATIONS: Transfer learning (also called domain adaptation or cross-domain learning) is an emerging research area in computer vision. The proposed system recognizes videos in the auxiliary domain, but this requires more labeled samples, which increases complexity.
REFERENCE: [22]

TITLE: Modeling Temporal Interactions with Interval Temporal Bayesian Networks for Complex Activity Recognition
DESCRIPTION: Real-world complex activities typically consist of multiple primitive events that happen either in parallel or sequentially over a span of time. Detecting and understanding such activities requires recognizing both the individual events and their spatiotemporal dependencies over different time intervals. To achieve this goal to a certain extent, a novel approach called the Interval Temporal Bayesian Network (ITBN) has been developed in this research work.
LIMITATIONS: This approach cannot automatically identify the underlying topics.
REFERENCE: [23]

TITLE: On-Line Video Event Detection by Constraint Flow
DESCRIPTION: This paper presents a new algorithm to describe and detect composite video events online based on combinatorial optimization. First, a constraint flow is generated automatically by a parsing algorithm; then the composite event detection algorithm is applied to find the best interpretation of the video with respect to the constraint flow.
LIMITATIONS: The proposed on-line event detection algorithm reduces the search space dramatically by using dynamic programming. However, the search time is still significant, which makes this system less suitable for real-time applications.
REFERENCE: [24]
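The entry for [23] in Table 2 describes complex activities as primitive events that occur in parallel or sequentially over time intervals. As a rough illustration only, and not the ITBN model itself, the sketch below classifies the temporal relation between two event intervals using the thirteen relations of Allen's interval algebra, a standard vocabulary for such dependencies. The `Interval` class, the `temporal_relation` function, and the example events are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A primitive event occupying a time span with start < end (hypothetical type)."""
    name: str
    start: float
    end: float


def temporal_relation(a: Interval, b: Interval) -> str:
    """Return the Allen-style relation of interval a with respect to interval b."""
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    if a.start < b.start < a.end < b.end:
        return "overlaps"
    # The remaining cases mirror the first ones: swap the arguments and invert.
    inverse = {"before": "after", "meets": "met-by", "overlaps": "overlapped-by"}
    return inverse[temporal_relation(b, a)]


# Example events (made up for illustration).
walk = Interval("walk to door", 0.0, 4.0)
open_door = Interval("open door", 4.0, 6.0)
talk = Interval("talk on phone", 1.0, 5.0)

print(temporal_relation(walk, open_door))  # sequential events
print(temporal_relation(walk, talk))       # partly parallel events
```

A recognizer built on this idea would attach such a relation to each pair of detected primitive events and reason over the resulting set; how those relations are combined probabilistically is specific to each method and is not shown here.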



IV. Conclusion
Although numerous attempts have been made and various novel approaches proposed in this area, designing an algorithm that works with minimum error in an unconstrained environment is still an open area of research.

References
[1] D. B. S. Joshua Candamo, Matthew Shreve, R. Kasturi, “Understanding transit scenes: A survey on human behavior-recognition algorithms,” in IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 1, March 2010, pp. 206-224.
[2] “Official website for Metropolitan Transportation Authority.” [Online]. Available: http://www.mta.info/
[3] “Official website for Moscow Metro.” [Online]. Available: http://www.mosmetro.ru
[4] D. G. N. Sulman, T. Sanocki and R. Kasturi, “How effective is human video surveillance performance?” in Proc. Int. Conf. Pattern, vol. 10, no. 1, 2008, p. 13.
[5] K. L. K. a. X. Y. Chen, G. Liang, “Abnormal behavior detection by multi-SVM-based Bayesian network,” in Proc. Int. Conf. Inf. Acquisition, vol. 9, no. 11, July 2007, pp. I-341 to I-344.
[6] M. I. O. Boiman, “Detecting irregularities in images and in video,” in Proc. 10th IEEE Int. Conf. Comput. Vision, vol. 1, 2005, pp. 462-469.
[7] T. J. L. v. G. F. Nater, H. Grabner, “Tracker trees for unusual event detection,” in Proc. IEEE 12th Int. Conf. Comput. Vision Workshops, 2009, pp. 1113-1120.
[8] F. Lv and R. Nevatia, “Single view human action recognition using key pose matching and Viterbi path searching,” in Proc. IEEE Conference Computer Vision Pattern Recognition, 2007, pp. 1-8.
[9] S. Y. Y. Zhou and T. S. Huang, “Detecting anomaly in videos from trajectory similarity analysis,” in Proc. IEEE Int. Conf. Multimedia Expo., 2007, pp. 1087-1090.
[10] S. B. A. Choudhary, M. Pal and S. Chaudhury, “Unusual activity analysis using video epitomes and pLSA,” in Proc. 6th Indian Conf. Computer Vision, Graphics Image Process, 2008, pp. 390-397.
[11] J. Y. V. W. Z. Q. Y. D. H. Hu, X. Zhang, “Abnormal activity recognition based on HDP-HMM models,” in Proc. IJCAI, 2009, pp. 1715-1720.
[12] J. Varadarajan and J. Odobez, “Topic models for scene analysis and abnormality detection,” in Proc. IEEE 12th Int. Conf. Comput. Vision Workshops, October 2009, pp. 1338-1345.
[13] Y. G. X. Zhang, H. Liu and D. H. Hu, “Detecting abnormal events via hierarchical Dirichlet processes,” in Proc. 13th Pacific-Asia Conf. Knowledge Discovery Data Mining, April 2009, pp. 278-289.
[14] S. G. T. Xiang, “Video behavior profiling for anomaly detection,” in IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 5, May 2008, pp. 893-908.
[15] S. G. T. Xiang, “Incremental and adaptive abnormal behaviour detection,” in Comput. Vis. Image Understanding, vol. 111, 2008, pp. 59-73.
[16] W. E. L. G. X. Wang, X. Ma, “Unsupervised activity perception in crowded and complicated scenes using hierarchical Bayesian models,” in IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 3, March 2009, pp. 539-555.
[17] S. B. I. M. D. Zhang, D. Gatica-Perez, “Semisupervised adapted HMMs for unusual event detection,” in Proc. IEEE Comput. Soc. Conf. Computer Vis. Pattern Recognition, vol. 1, 2005, pp. 611-618.
[18] F. A. H. M. Jager, C. Knoll, “Weakly supervised learning of a classifier for unusual event detection,” in IEEE Trans. Image Processing, vol. 17, no. 9, September 2008, pp. 1700-1708.
[19] B. B. X. Zou, “Anomalous activity classification in the distributed camera network,” in Proc. 15th IEEE Int. Conf. Image Processing, 2008, pp. 781-784.
[20] R. P. Z. Z. Weiyao Lin, Ming-Ting Sun, “Group event detection with a varying number of group members for video surveillance,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 8, August 2010.
[21] A. S.-A. J. R. Caroline Rougier, Jean Meunier, “Robust video surveillance for fall detection based on human shape deformation,” in IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 5, May 2011, pp. 611-622.
[22] I. W.-H. T. J. L. Lixin Duan, Dong Xu, “Visual event recognition in videos by learning from web data,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, September 2012, pp. 1667-1680.
[23] E. S.-N. L. Z. W. Q. J. Yongmian Zhang, Yifan Zhang, “Modeling temporal interactions with interval temporal Bayesian networks for complex activity recognition,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 10, October 2013, pp. 2468-2483.
[24] J. H. H. Suha Kwak, Bohyung Han, “On-line video event detection by constraint flow,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 6, June 2014, pp. 1174-1186.

Utkal Sinha is a postgraduate scholar (M.Tech) at the National Institute of Technology Rourkela. He completed his B.Tech in computer science and engineering at the National Institute of Technology Silchar. His areas of interest include cloud computing, image processing, computer vision, and machine learning.

Himanish Shekhar Das received his M.Tech degree from the National Institute of Technology Durgapur. He completed his B.Tech in computer science and engineering at the National Institute of Technology Silchar. His areas of interest include wireless sensor networks and image processing.




Mayank Shekhar received his M.Tech degree from Tezpur Central University, Tezpur. He completed his graduation in computer science and engineering at the National Institute of Technology Silchar. His areas of interest include distributed and cloud computing, wireless sensor networks, and image processing.


International Journal of Computer Science And Technology

www.ijcst.com