2014 International Conference on Identification, Information and Knowledge in the Internet of Things
Online Multiperson Tracking and Counting with Cloud Computing
Weishan Zhang (1), Wenshan Wang (2), Pengcheng Duan (1), Xin Liu (1), Qinghua Lu (1)
(1) Department of Software Engineering, China University of Petroleum, Qingdao, China. 266580
(2) School of Computer Science, Fudan University, Shanghai, China. 200433
{zhangws}@upc.edu.cn {wangws2014}@gmail.com {chaseandblack}@gmail.com
Abstract
Intelligent video surveillance is a challenging problem because real scenes are complicated. Based on empirical and experimental explorations, we propose a multiperson tracking-by-detection framework that counts pedestrians at run time. The framework is integrated with a stream-based cloud computing paradigm to improve tracking performance. We evaluated our approach, which shows improved time performance compared with classical approaches.
1
Introduction
Data-intensive applications face the problem that datasets grow at an exponential rate. Intelligent surveillance based on large-scale video data is a typical case, as more and more cameras are installed in public places such as communities, roads and supermarkets. The practical demand for applications such as pedestrian counting and tracking is also growing. Early work concentrated on crowd counting with the ROI (region of interest) counting method, such as [2], [3], [7], [8]. In addition to crowd counting, multiperson and single-person tracking [1], [6] have also been explored.
However, three main problems need to be addressed. First, the detection rate must be considered because colors, lighting and viewing angles vary widely in complicated scenes. Missed detections (not all persons are detected) and false positives (a non-person object is detected) are the errors that occur most often. Second, high-performance multiperson tracking over a long term must cope with scene changes. Finally, the time performance of tracking algorithms is the problem that we focus on here. The emergence of big data and cloud computing techniques provides an alternative way to process data-intensive applications such as people counting and tracking.
In this paper we take inspiration from the TLD (Tracking-Learning-Detection) [4] method. However, TLD can only track a single object. In our work, we design a multiperson tracking-by-detection framework to address this "single tracking" issue. The framework is characterized as follows. HOG with SVM is used as the pedestrian detector, and a Kalman filter [5] is used as the tracker. We adopted these algorithms on the basis of empirical and experimental explorations; the experiments show a good detection rate and acceptable tracking stability. We also design a Nearest Neighbor classifier for data association and implement a special counter component in the framework that counts pedestrians online according to our scenes. Furthermore, high time performance is achieved by deploying these algorithms on a real-time cloud computing platform called Storm (see footnote 1).
The remainder of this paper is organized as follows. Section 2 presents an overview of the work flow of the tracking algorithms. Section 3 discusses design and implementation in detail. Section 4 evaluates the system and gives an experimental demonstration of its performance. Section 5 reviews related work. Section 6 gives conclusions and an outlook on future research.
978-1-4799-8003-1/14 $31.00 © 2014 IEEE DOI 10.1109/IIKI.2014.22
2
An overview of the proposed approach
To exploit the processing power of cloud computing for a fast and robust approach, we choose an in-memory cloud platform capable of real-time computing. In our case, Apache Storm (see footnote 2) is used. We propose an online multiperson tracking-by-detection framework, built on Storm, to count pedestrians accurately from video data.
1 https://storm.apache.org/
2 http://storm.incubator.apache.org
2.1
Overview of the Work Flow
We design a special counter component that displays the number of pedestrians and their tags online. The work flow of the tracking framework is shown in Figure 1. The framework is composed of three components: Detector, Tracker and Data association. For each frame, the detector looks for objects that were observed and learned in the past. The tracker identifies an object by estimating the inter-frame shift of that object. Which detector should guide the associated tracker is decided through experiments, as shown in Figure 2.
The algorithms are deployed on a Storm cluster to accelerate the computation for each frame. Furthermore, experiments suggest that background subtraction should be performed before each frame is put into the Storm cluster. This step is important because real-time processing must be maintained when the Storm cluster faces an influx of massive video data.
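The background subtraction step mentioned above is not specified further in the text. As a minimal sketch, assuming simple frame differencing against a fixed reference frame (this choice, the names `foreground_mask` and `subtract_background`, and the threshold of 25 are all assumptions made for illustration), the idea can be shown in plain Python on small grayscale frames:

```python
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of 0-255 intensities


def foreground_mask(background: Frame, frame: Frame, threshold: int = 25) -> List[List[bool]]:
    """Return True where a pixel differs from the reference frame by more than threshold."""
    return [
        [abs(pixel - reference) > threshold
         for pixel, reference in zip(frame_row, background_row)]
        for frame_row, background_row in zip(frame, background)
    ]


def subtract_background(background: Frame, frame: Frame, threshold: int = 25) -> Frame:
    """Keep foreground pixels unchanged and set background pixels to 0."""
    mask = foreground_mask(background, frame, threshold)
    return [
        [pixel if is_foreground else 0
         for pixel, is_foreground in zip(frame_row, mask_row)]
        for frame_row, mask_row in zip(frame, mask)
    ]


# Hypothetical 4x4 scene: uniform background plus one bright 2x2 blob.
BACKGROUND: Frame = [[10] * 4 for _ in range(4)]
FRAME: Frame = [row[:] for row in BACKGROUND]
for r in (1, 2):
    for c in (1, 2):
        FRAME[r][c] = 200

if __name__ == "__main__":
    for row in subtract_background(BACKGROUND, FRAME):
        print(row)
```

Only the pixels that changed survive the subtraction, so later stages have less image content to examine. How the framework actually uses the result is not stated in the text.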
Figure 1. The work flow of people tracking and counting
Figure 2. Tracking and counting
2.2
Detector
In the framework, HOG (Histogram of Oriented Gradients) is used as the feature for pedestrian detection. The feature is formed by computing histograms of gradient directions over local areas of the image. Feeding this feature into an SVM classifier has been widely used in image recognition since 2005, especially for pedestrian detection. However, operations such as extracting large numbers of HOG feature vectors and training SVMs on a large image database are too time-consuming for online tracking.
2.3
Tracker
The Kalman filter [5] is used to track pedestrians that were observed in the past. It was proposed by R. E. Kalman in 1960 and is widely used in engineering. The Kalman filter (see footnote 3) is based on linear dynamic systems discretized in the time domain and modelled on a Markov chain built on linear operators perturbed by errors that may include Gaussian noise. There are two important assumptions: (1) the dynamic system is linear; (2) the noise is Gaussian. Let the system state at time $k$ be $X_k$; it can be expressed in terms of the previous state $X_{k-1}$:
$$X_k = F_k X_{k-1} + B_k u_k + w_k \tag{1}$$
where $F_k$ is the state transition model, $B_k$ is the control-input model, $u_k$ is the control vector and $w_k$ is the process noise. The observation $z_k$ of the state is
$$z_k = H_k X_k + v_k \tag{2}$$
where $H_k$ is the observation model and $v_k$ is the observation noise. To use the Kalman filter to predict and track pedestrians, we assign one Kalman filter to each target pedestrian. The implementation details are as follows.
$$F = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad H = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}$$
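As a minimal sketch of how one such per-pedestrian filter could run, the plain-Python code below applies the standard Kalman predict and update equations with the transition matrix F given above. Several points are assumptions rather than details taken from the text: the state is read as (x, y, vx, vy), that is, the position and per-frame velocity of a rectangle's reference point; there is no control input; the observation matrix is used in 2×4 form (the transpose of the 4×2 layout printed for H) so that z = H X is well-formed; and the noise covariances `q` and `r` and all function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Constant-velocity transition model F from the text; state assumed to be (x, y, vx, vy).
F: Matrix = [[1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0],
             [0, 0, 0, 1]]
# Observation model in 2x4 form (assumption): only the position (x, y) is measured.
H: Matrix = [[1, 0, 0, 0],
             [0, 1, 0, 0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(column) for column in zip(*a)]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def sub(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def scaled_identity(n: int, scale: float) -> Matrix:
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]


def inverse_2x2(m: Matrix) -> Matrix:
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def predict(x: Matrix, p: Matrix, q: Matrix) -> Tuple[Matrix, Matrix]:
    """Equation (1) without control input: propagate state x (4x1) and covariance p (4x4)."""
    x_pred = matmul(F, x)
    p_pred = add(matmul(matmul(F, p), transpose(F)), q)
    return x_pred, p_pred


def update(x_pred: Matrix, p_pred: Matrix, z: Matrix, r: Matrix) -> Tuple[Matrix, Matrix]:
    """Correct the prediction with a measured position z (2x1), following equation (2)."""
    innovation = sub(z, matmul(H, x_pred))
    s = add(matmul(matmul(H, p_pred), transpose(H)), r)          # innovation covariance, 2x2
    gain = matmul(matmul(p_pred, transpose(H)), inverse_2x2(s))  # Kalman gain, 4x2
    x_new = add(x_pred, matmul(gain, innovation))
    p_new = matmul(sub(scaled_identity(4, 1.0), matmul(gain, H)), p_pred)
    return x_new, p_new


if __name__ == "__main__":
    state = [[10.0], [20.0], [2.0], [1.0]]
    cov = scaled_identity(4, 1.0)
    state, cov = predict(state, cov, scaled_identity(4, 0.01))
    state, cov = update(state, cov, [[15.0], [21.0]], scaled_identity(2, 1.0))
    print(state)
```

In a multiperson setting, one `(state, covariance)` pair would be kept per tracked pedestrian: `predict` runs once per frame, and `update` runs when a detection has been matched to that pedestrian.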
2.4
Data Association
Based on empirical and experimental explorations, we choose the overlapping area of each pedestrian's rectangles as the main matching feature. It achieves good results because a pedestrian's motion between consecutive frames is limited. The area can be calculated according to the following formula.
$$\begin{cases} D(x) = |X_1 - X_2| \\ D(y) = |Y_1 - Y_2| \\ S = [W - D(x)] \times [H - D(y)] \end{cases} \tag{3}$$
Here $X_1$, $X_2$, $Y_1$, $Y_2$ are the horizontal and vertical coordinates of the upper-left corner of the rectangle in two consecutive frames, $S$ is the overlapping area, $W$ is the rectangle's width and $H$ is the rectangle's height. Each output rectangle of the detector is matched against every rectangle of the tracked pedestrians by calculating their overlapping areas with (3), and we choose the largest one. If a maximum overlapping area exists, the target is still within the scene and is updated based on its characteristic values. If not, the detection is new and is saved as a new target.
3 http://en.wikipedia.org/wiki/Kalman_filter
2.5
Counter
We design this component to count the number of pedestrians who are captured by the cameras. We establish the pedestrian's motion model according to the difference between the center point of the pedestrian's rectangle and a border point, both measured along the X-axis. The model is given below.
$$\begin{cases} x = X_1 - X_2 \\ D(x) = \begin{cases} +1, & x < 0 \\ -1, & x > 0 \end{cases} \end{cases} \tag{4}$$
Here $X_1$ and $X_2$ are the abscissa of the pedestrian rectangle's center point and the abscissa of the border, respectively, and $D(x)$ is the object's motion direction. From these two parameters we can judge whether a person is moving in or out. The tag for each person can be displayed online.
3
Design and Implementation
3.1
Deployment on Storm
Apache Storm is an open-source distributed real-time cloud computing platform. Storm makes it easy to reliably process unbounded streams of data. Storm supports Java, whereas the algorithms in the proposed framework are implemented in C++ with OpenCV, an open-source machine vision library. When running the topology, we call the native code through JNI (see footnote 4). RabbitMQ is a message queue that buffers real-time video frames. Spouts receive video frames from RabbitMQ and send them into the rest of the topology. We set up two types of "Bolt" to run the algorithms for pedestrian tracking and pedestrian counting, respectively, and we evaluate the time performance for each frame.
4 http://en.wikipedia.org/wiki/JNI
4
Evaluations
To evaluate the time performance, the framework is deployed both on a Storm cluster and on a standalone node, and the two deployments are compared. The deployment of the Storm cluster is shown in Figure 3, where one node is the master and the other two are workers. Table 1 shows the environment parameters of the standalone version. Table 2 shows the environment parameters of the cluster version.
Figure 3. The deployment of evaluation environment
Table 1. single node configurations
Items | Parameter
OS | Ubuntu Linux version 2.6
CPU | Intel core i5
RAM | 4.00GB
SCSI | 500.00GB
Table 2. Storm cluster configurations
Items | Master | Worker1 | Worker2
OS | Ubuntu Linux V2.6 | Ubuntu Linux V2.6 | Ubuntu Linux V2.6
CPU | Intel core i5 | Intel core i5 | Intel core i5
RAM | 4.00GB | 4.00GB | 4.00GB
SCSI | 500.00GB | 500.00GB | 500.00GB
4.1
Performance
The performance is evaluated by the computation time for each video frame. We measure the processing time of 100 consecutive frames, numbered from 1 to 100. The time performance of the standalone version and the cluster version is compared in Table 3. We can conclude from Table 3 that time performance is considerably improved when the framework is deployed on Storm: the average per-frame time falls from 555.26 ms on the single node to 167.22 ms on the cluster. In other words, computational efficiency is greatly improved when the serial computation is transformed into parallel computation.
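The size of the improvement can be made explicit from the two averages reported in Table 3 (555.26 ms per frame on the single node, 167.22 ms on the Storm cluster). The short Python sketch below derives the speedup factor, the relative time saving and the implied frame rates. The function names are illustrative, and the frame rates assume that frames are processed one after another at the average cost.

```python
# Average per-frame processing times taken from Table 3.
SINGLE_NODE_MS = 555.26
STORM_CLUSTER_MS = 167.22


def speedup(baseline_ms: float, improved_ms: float) -> float:
    """How many times faster the improved deployment is per frame."""
    return baseline_ms / improved_ms


def time_saving(baseline_ms: float, improved_ms: float) -> float:
    """Fraction of the baseline per-frame time that is saved."""
    return 1.0 - improved_ms / baseline_ms


def frames_per_second(per_frame_ms: float) -> float:
    """Throughput if frames are processed sequentially at the given cost."""
    return 1000.0 / per_frame_ms


if __name__ == "__main__":
    print(f"speedup:     {speedup(SINGLE_NODE_MS, STORM_CLUSTER_MS):.2f}x")
    print(f"time saving: {time_saving(SINGLE_NODE_MS, STORM_CLUSTER_MS):.1%}")
    print(f"single node: {frames_per_second(SINGLE_NODE_MS):.2f} frames/s")
    print(f"cluster:     {frames_per_second(STORM_CLUSTER_MS):.2f} frames/s")
```

With these averages the cluster is roughly 3.3 times faster per frame, which corresponds to a saving of about 70% of the processing time.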
Table 3. Consumption time between single node and cluster nodes
Frame Number | Single Node (ms) | Storm Cluster (ms)
1 | 465 | 196
2 | 534 | 206
3 | 570 | 190
4 | 675 | 189
5 | 593 | 154
6 | 562 | 154
7 | 521 | 152
8 | 577 | 148
9 | 693 | 149
10 | 579 | 151
... | ... | ...
96 | 572 | 153
97 | 572 | 161
98 | 560 | 146
99 | 560 | 150
100 | 535 | 158
Average | 555.26 | 167.22
5
Related work
Previous work mainly concentrated on crowd counting, whereas we adopt a tracking-by-detection framework in this paper. In contrast to crowd counting, a tracking-by-detection framework has to handle the relationship between detectors and trackers. Z. Kalal [4] proposes a novel tracking framework whose idea is similar to tracking-by-detection. It has three components: Detector, Tracker and Learning. His method achieves high tracking performance when tracking a single object for a long time. Bastian Leibe [6] considers how to couple object detection and space-time trajectory estimation to improve robustness when tracking multiple objects.
6
Conclusions and future work
Intelligent video surveillance is an important and challenging topic. In this paper, we design a novel online multiperson tracking framework that counts pedestrians on an in-memory cloud computing platform. It detects pedestrians with HOG and SVM and tracks them using Kalman filters and data association. These algorithms are computation-intensive and require powerful processing capabilities, which motivates us to use cloud computing. Our evaluations show that the proposed approach performs well in some real scenes and clearly improves computational efficiency.
In the future, our work will focus on the following two aspects. First, we will continue to train the SVM classifier on HOG features to achieve higher detection accuracy; the Kalman filter will be replaced by a particle filter to make tracking more robust, and a greedy algorithm will be adopted in the data association component. Second, we plan to parallelize the algorithms further to achieve higher performance.
Acknowledgements
The research in this paper is jointly supported by the National Natural Science Foundation of China (Grant No. 61309024 and 61402533), and the "Key Technologies Development Plan of Qingdao Technical Economic Development Area".
References
[1] Michael D Breitenstein, Fabian Reichlin, Bastian Leibe, Esther Koller-Meier, and Luc Van Gool. Online multiperson tracking-by-detection from a single, uncalibrated camera. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(9):1820–1833, 2011.
[2] Antoni B Chan and Nuno Vasconcelos. Bayesian poisson regression for crowd counting. In Computer Vision, 2009 IEEE 12th International Conference on, pages 545–551. IEEE, 2009.
[3] Antoni B Chan and Nuno Vasconcelos. Counting people with low-level features and bayesian regression. Image Processing, IEEE Transactions on, 21(4):2160–2177, 2012.
[4] Zdenek Kalal, Krystian Mikolajczyk, and Jiri Matas. Tracking-learning-detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 34(7):1409–1422, 2012.
[5] Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45, 1960.
[6] Bastian Leibe, Konrad Schindler, and Luc Van Gool. Coupled detection and trajectory estimation for multi-object tracking. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8. IEEE, 2007.
[7] Zheng Ma and Antoni B Chan. Crossing the line: Crowd counting by integer programming with local features. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 2539–2546. IEEE, 2013.
[8] David Ryan, Simon Denman, Clinton Fookes, and Sridha Sridharan. Crowd counting using multiple local features. In Digital Image Computing: Techniques and Applications, 2009. DICTA'09., pages 81–88. IEEE, 2009.