A Pedestrian Detection and Tracking System Based on Video Processing Technology

Yuanyuan Chen1,2, Shuqin Guo2, Biaobiao Zhang1, K.-L. Du1
1. Enjoyor Labs, Enjoyor Inc., Hangzhou, China
2. College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
Abstract—Pedestrian detection and tracking are widely applied in intelligent video surveillance, intelligent transportation, and automotive autonomous-driving or driving-assistance systems. We select OpenCV as the development tool for implementing pedestrian detection, tracking, counting, and risk warning in a video segment. We introduce a low-dimensional soft-output SVM pedestrian classifier to achieve precise pedestrian detection. Experiments indicate that the system has high recognition accuracy and can operate in real time.

Keywords—pedestrian detection; pedestrian tracking; pedestrian counting; risk warning
I. INTRODUCTION
Pedestrian detection and tracking are basic functions in intelligent video surveillance and intelligent transportation [1]. Their performance has a great impact on pedestrian counting, the capture of pedestrians running red lights, and other behavior analysis. Specifically, the system automatically detects targets of interest in an image sequence and continuously locates each target in subsequent frames. Currently, the technology is widely used in banks, military sites, transportation, supermarkets, warehouses, and other locations with high security requirements.

II. SYSTEM COMPOSITION
The system is developed on the Visual Studio 2010 platform. We select OpenCV as the development tool and improve several functions for implementing pedestrian detection, tracking, counting, and risk warning in a video segment. Fig. 1 gives the system block diagram, comprising five modules. The user interface is used to load and play a video and to display the system functions. The foreground objects are obtained by differencing with the background. Pedestrians are screened from the foreground objects by using pedestrian physical characteristics as well as a low-dimensional soft-output SVM pedestrian classifier. Finally, pedestrians of interest are tracked and their trajectories are plotted.
Figure 1. System block diagram.
III. WORKING PRINCIPLE OF THE SYSTEM
A. Pedestrian detection

Pedestrian detection is to detect pedestrians in each frame of a video and to sequentially store them in a container. For video from a fixed camera, pedestrian detection can in general be approached by the optical flow method, the interframe difference method, or the background subtraction method. The background subtraction method [2] is simple and easy to implement, and is used in this paper. Fig. 2 gives a block diagram of the pedestrian detection algorithm, comprising two modules, namely a main thread module and a support thread module. The features of the two threads are given in Fig. 2.

1) Extracting foreground objects

The background subtraction method requires background modeling. The Gaussian mixture model [3] is one of the most successful background modeling methods. It represents the characteristics of each pixel in the image using Gaussian probability density functions. Let G_t be the background image at time t. Create a Gaussian mixture model for each pixel in the background image at time t:

P(G_t) = Σ_{m=1}^{k} ω_{m,t} f(G_t, μ_{m,t}, σ²_{m,t}) ,   (1)
where ω_{m,t}, μ_{m,t}, σ²_{m,t} are, respectively, the weighting coefficient, the mean, and the variance, and f(G_t, μ_{m,t}, σ²_{m,t}) is the distribution function of the m-th Gaussian component at time t. As time passes, the background image slowly changes, so the Gaussian mixture model must be constantly updated:

ω_{m,t+1} = (1 − α) ω_{m,t} + α M_{m,t} , μ_{m,t+1} = β μ_{m,t} + (1 − β) I(x, y, t) ,   (2)

where α is the weight update parameter of the background model, M_{m,t} is 1 for the matched component and 0 otherwise, β is the mean update parameter of the background model, and I(x, y, t) is the gray value of image I at pixel (x, y). For the Gaussian mixture model given by (1), P(G_t) has two parameters per component: the mean μ and the variance σ². For an image I(x, y, t), match each pixel I(x, y) to the corresponding background model P(G_t); the pixel is matched if it satisfies
exp( −(I(x, y) − μ(x, y))² / (2σ²(x, y)) ) > T ,   (3)
Figure 2. The block diagram of pedestrian detection.
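The per-pixel test of (3) and the mean update of (2) can be sketched as follows. This is a minimal single-Gaussian simplification, not the full mixture; T follows the stated range, while the β value is an arbitrary illustration.

```python
import math

T = 0.7      # per the paper, 0.7 <= T <= 0.75
BETA = 0.95  # mean update parameter (illustrative value)

def is_background(pixel, mu, sigma2, T=T):
    """Per-pixel test of Eq. (3): high Gaussian likelihood -> background."""
    return math.exp(-(pixel - mu) ** 2 / (2.0 * sigma2)) > T

def update_mean(mu, pixel, beta=BETA):
    """Mean update of Eq. (2): mu <- beta*mu + (1-beta)*I(x,y,t)."""
    return beta * mu + (1.0 - beta) * pixel

# A pixel near the background mean stays background; a distant one
# (e.g., a pedestrian crossing that pixel) becomes foreground.
print(is_background(100.0, mu=102.0, sigma2=25.0))   # True
print(is_background(180.0, mu=102.0, sigma2=25.0))   # False
print(round(update_mean(102.0, 100.0), 2))           # 101.9
```

In the full model of (1)-(2), this test is applied against each of the k components and the matched component's parameters are updated.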
where T is a preset threshold, 0.7 ≤ T ≤ 0.75. If (3) is satisfied, (x, y) is judged a background point; otherwise, it is judged a foreground point.

2) Screening foreground objects

Screening foreground objects is to identify pedestrians among the foreground objects. The traditional method generally selects the most obvious shape features, such as aspect ratio and size, for detection. This method can quickly remove objects that deviate greatly from pedestrian shape, such as vehicles. However, this shape-based method has low accuracy. Thus, we introduce a classifier to improve the detection accuracy. Let the size of a bounding rectangle be S = W × H. Due to the varying distance between an object and the camera, object size varies widely. Thus, we divide the video frame into three regions along an axis: [0, x_0), [x_0, x_1), [x_1, x_2). Judge the condition
0.001 < δ_S < 0.01, if 0 ≤ x < x_0 ;
0.002 < δ_S < 0.05, if x_0 ≤ x < x_1 ;   (4)
0.004 < δ_S < 0.1, if x_1 ≤ x < x_2 .

Objects that fail (4) are too large or too small and are removed. Further, if the aspect ratio of the object bounding rectangle satisfies

0.2 < scale = width / height < 0.8 ,   (5)

we determine the object to be a pedestrian candidate. As pedestrians tend to be tall and narrow, they can be well separated by this ratio. We further use classification to implement pedestrian detection. We treat each object as a separate region and perform normalization and feature extraction. Then we classify the processed objects one by one. These objects usually occupy small areas in each frame. Thus, the feature dimension extracted from each image is greatly reduced, leading to a substantially
decreased complexity for classification. We present a low-dimensional soft-output SVM pedestrian classifier. We first extract HOG features and then select a support vector machine (SVM) with Gaussian kernel as the classifier. The method is implemented as follows.
(a) Select positive and negative samples for training the classifier. We select 800 positive samples and 500 negative samples from the INRIA pedestrian database, which are already normalized to 64 × 128 [4].
(b) Reduce the feature dimension of the sample set. If a sample is positive, trim the pedestrian area and normalize it to 32 × 64; otherwise, directly normalize it to 32 × 64.
(c) For each sample, extract HOG features [5]. A sample is divided into 16 × 16 pixel blocks; each block is divided into 8 × 8 pixel cells, and the block stride is 8 pixels. The original HOG feature dimension is 7 × 15 × 36 = 3780. After reducing the feature dimension, the HOG feature dimension is 3 × 7 × 36 = 756. In summary, the computational complexity is reduced by a factor of 5.
(d) We choose the efficient LIBSVM classifier, and use the soft output [0, 1] instead of {−1, 1} [6].
(e) Train the LIBSVM classifier with the HOG features as input to get a low-dimensional soft-output pedestrian classifier. Meanwhile, sequentially save in a queue those foreground objects, and the frame numbers of the images they appear in, that lack a clear classification.
(f) Extract the 756-dimensional HOG features for each object. Input the HOG features of each object to the trained pedestrian classifier; the output indicates whether the object is a pedestrian.

3) Correcting the wrong output of the main thread

In the main thread module, the foreground objects without a clear classification have been saved sequentially in a queue. We process these objects in the support thread module as follows. First, when the count of foreground images stored in the queue reaches 10, the support thread starts to work.
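The aspect-ratio screen (5) and the dimensionality arithmetic of step (c) can be checked with a short sketch. The helper names are ours; the block/cell/stride values follow the standard HOG layout implied by the 7 × 15 and 3 × 7 block counts.

```python
def hog_dim(win_w, win_h, block=16, cell=8, stride=8, bins=9):
    """HOG feature count for a detection window: each block holds
    (block/cell)^2 cells x `bins` orientation bins = 36 values."""
    nx = (win_w - block) // stride + 1   # blocks across
    ny = (win_h - block) // stride + 1   # blocks down
    per_block = (block // cell) ** 2 * bins
    return nx * ny * per_block

full = hog_dim(64, 128)  # 7 * 15 * 36
low = hog_dim(32, 64)    # 3 * 7 * 36
print(full, low, full // low)  # 3780 756 5

def is_pedestrian_shape(width, height):
    """Aspect-ratio screen of Eq. (5)."""
    return 0.2 < width / height < 0.8

print(is_pedestrian_shape(32, 64))    # tall and narrow -> True
print(is_pedestrian_shape(120, 60))   # wide, e.g., a vehicle -> False
```

This confirms the factor-of-5 reduction in feature dimension claimed in step (c).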
The system reads these images in FIFO order and normalizes them to 64 × 128. Then, it extracts the 3780-dimensional HOG features for each image and feeds them to the trained pedestrian classifier. This result is compared with that of the low-dimensional soft-output SVM classifier. If they are the same, the next foreground object in the queue is processed; otherwise, this result replaces the previous result.

B. Pedestrian tracking

Pedestrian tracking is to automatically monitor the spatial and temporal changes of each pedestrian in a video sequence [7]. We select the CamShift algorithm, which implements tracking by color characteristics and can effectively handle object deformation. OpenCV provides a semi-automatic, single-object CamShift algorithm. In practical applications, we expect to track one or several objects, obtain their trajectories, and analyze the trajectories. Thus, the system implements tracking of multiple objects based on the semi-automatic, single-object CamShift algorithm [8]. The object trajectories are plotted and stored in a specified folder. The
working principle of multi-object tracking is given as follows.
(a) Use the mouse to select a region of interest. Set the tag trackObject[i] to indicate whether the i-th object is selected: 0 if unselected, −1 if selected, and 1 at the end of the trace. Store all selected areas in an array by their labels.
(b) Get the initial objects and the H-component histogram. Call the setMouseCallback() function to obtain the coordinates of the objects of interest. Call the calcHist() function to calculate the H-component histogram of each object region. Initialize the size and position of the search window, and define the mass-center coordinates y_0.
(c) Use the histogram to calculate the back projection of the input image. This is implemented by calling calcBackProject().
(d) Run the MeanShift tracking algorithm to search the new window area of the object image. Define the zero-order moment M_00 and the first-order moments M_10, M_01 of the search window:

M_00 = Σ_x Σ_y I(x, y) , M_10 = Σ_x Σ_y x I(x, y) , M_01 = Σ_x Σ_y y I(x, y) ,   (6)

Figure 3. Illustration of pedestrian counting.

Figure 4. Main interface.
where (x, y) denotes a pixel in the search window and I(x, y) is the value of pixel (x, y) in the back-projection image. The mass center of the search window is obtained by

( x̄ , ȳ ) = ( M_10 / M_00 , M_01 / M_00 ) .   (7)

(e) Move the center of the search window to the mass center y_1, and denote the previous mass center by y_0. Let d = ||y_1 − y_0||, ε the error threshold, and N the maximum number of iterations. If d < ε or k > N, end the iteration and return the new target position y_1; otherwise set y_0 ← y_1, k = k + 1, and go to (d).
(f) Save the mass center obtained in each frame to a vector container. The mass centers of every fifth frame are connected into a trace. Thus, we obtain the object trajectories.
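The moments of (6), the mass center of (7), and the iteration of step (e) can be sketched as follows. This is a toy sketch with a fixed search window; real CamShift recomputes the window contents from the histogram back projection at every step.

```python
def centroid(window):
    """Moments (6) and mass center (7) of a back-projection window,
    given as a dict {(x, y): projection value}."""
    m00 = sum(window.values())
    m10 = sum(x * v for (x, y), v in window.items())
    m01 = sum(y * v for (x, y), v in window.items())
    return m10 / m00, m01 / m00

def mean_shift(window_of, y0, eps=0.5, n_max=20):
    """Step (e): move the window center to the mass center until the
    shift d falls below eps or n_max iterations are reached.
    `window_of(center)` returns the projection values in the window."""
    for _ in range(n_max):
        y1 = centroid(window_of(y0))
        d = ((y1[0] - y0[0]) ** 2 + (y1[1] - y0[1]) ** 2) ** 0.5
        y0 = y1
        if d < eps:
            break
    return y0

# Toy back projection: a bright 2x2 blob around (10, 8).
blob = {(10, 8): 1.0, (11, 8): 1.0, (10, 9): 1.0, (11, 9): 1.0}
window_of = lambda center: blob  # fixed toy window for illustration
print(mean_shift(window_of, (0.0, 0.0)))  # (10.5, 8.5)
```

The iteration converges to the blob's mass center (10.5, 8.5) in two steps.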
C. Pedestrian counting and risk warning

Pedestrian counting is an application of pedestrian detection and tracking [9]. We select a video of size M × N. Each frame is divided into four regions along the x-axis, as shown in Fig. 3. The left and right regions are for direction flagging, and the two in the middle are counting areas. When a pedestrian is detected, we first determine the entering direction. If the pedestrian enters from the left, the left identifier is set to 1 and the right identifier to 0. Then, when the pedestrian enters the left counting region, the count is incremented by 1. If entering from the right, the procedure is similar. Risk warning is another application of pedestrian detection and tracking. In video surveillance of places such as parks and shopping malls, some regions are prohibited to pedestrians. The system realizes automatic detection of such abnormal behavior and issues an alarm. The
Figure 5. The secondary interface
method is as follows. First, set the forbidden region, such as the green areas in a park or the cashier area in a shopping mall. Then, determine whether a pedestrian is within the preset region. If so, call the alarm function to execute the alarm. Finally, set the effect of the alarm.

IV. SOFTWARE IMPLEMENTATION OF THE SYSTEM
The program runs on a PC with an Intel i3 CPU, 2 GB DDR3 memory, and Visual Studio 2010 as the development tool.

A. User interface

The user interface provides an interactive communication platform between user and computer. It allows us to change system parameters and to display system functions. We create a dialog-box interface, as shown in Fig. 4. The interface consists of a
video loading area and a video processing area. The video loading area is used to select, play, and pause a video source. The video processing area contains two options: Set and Run/Stop. Clicking the Set button pops up a "Detection" dialog box, as shown in Fig. 5, where one can select the processing methods for moving-target detection, pedestrian detection, and tracking. Clicking the Run/Stop button starts or stops the processing results in the display. In addition, when an exception is triggered in a preset area of the display, the display text Warning automatically raises an alarm.

B. Software implementation

Many image processing functions in the OpenCV library are used in the system development. First, load and close a video. A video is composed of a continuous sequence of images; each image can be read and displayed by setting an appropriate frame number. DrawToHDC() can be called to realize this function. A video can be closed by setting a label. Then, run the pedestrian detection and tracking modules.
• Background modeling. Build the Gaussian mixture background model for each pixel.
• Foreground extraction. Call absdiff() and threshold() to implement the differential operation and binarization, yielding a binary foreground image. Then, the morphological filtering functions erode(), dilate(), and morphologyEx() can be used to optimize the foreground image.
• Contour extraction. Get each foreground contour with findContours().
• Pedestrian recognition. a) Calculate the size and aspect ratio of the foreground objects. b) Load the trained SVM classifier; resize() can be called to normalize an input foreground image to 32 × 64. c) Mark each identified pedestrian with rectangle().
• Pedestrian tracking. Select the objects of interest and call setMouseCallback() to get their coordinates. Then, call calcHist(), normalize(), and calcBackProject() to get the object histogram and back projection. Finally, implement pedestrian tracking with the CamShift() function.
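The erode-then-dilate cleanup of the foreground (a morphological opening) can be illustrated with a toy pure-Python sketch; this mimics the effect of the OpenCV erode()/dilate() calls named above rather than invoking them.

```python
def erode(mask, w, h):
    """3x3 binary erosion: a pixel survives only if its whole
    3x3 neighborhood is set."""
    return {(x, y) for x in range(w) for y in range(h)
            if all((x + dx, y + dy) in mask
                   for dx in (-1, 0, 1) for dy in (-1, 0, 1))}

def dilate(mask, w, h):
    """3x3 binary dilation: set every in-bounds neighbor of a set pixel."""
    out = set()
    for (x, y) in mask:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if 0 <= x + dx < w and 0 <= y + dy < h:
                    out.add((x + dx, y + dy))
    return out

# A 5x5 solid object plus two isolated noise pixels.
obj = {(x, y) for x in range(2, 7) for y in range(2, 7)}
noise = {(0, 9), (9, 0)}
mask = obj | noise

opened = dilate(erode(mask, 10, 10), 10, 10)  # opening: erode, then dilate
print(noise & opened)  # set() -> isolated noise removed
print(obj <= opened)   # True  -> object body preserved
```

Opening removes the isolated noise pixels while restoring the solid object, which is why it is a standard cleanup step after background subtraction.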
Finally, run the pedestrian counting and risk warning modules. The procedure of pedestrian counting is given in Section III.C. The procedure for risk warning is given below.
• Add three message response functions to get the coordinates of the forbidden region. OnLButtonDown(): left-click anywhere on the display to get the coordinates of one vertex of the rectangle. OnMouseMove(): drag the mouse to a new location. OnLButtonUp(): release the mouse to obtain the diagonal vertex of the rectangle. The two vertices define a rectangle parallel to the window.
• Change a rectangle: double-click anywhere in the bounding rectangle to cancel it; a new rectangle can then be drawn.
• Add a Warning button, initialize it as hidden, and set the alarm mode. Call ShowWindow() to display the button, SetDownColor() and SetUpColor() to change the button color, and ShowWindow() again to hide it.
• Determine whether a pedestrian is within the preset region. If so, call the alarm() function to execute the alarm.
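The region test in the last bullet can be sketched as follows; the rectangle representation and helper names are ours, standing in for the vertices delivered by the mouse callbacks.

```python
def make_region(p_down, p_up):
    """Build (x1, y1, x2, y2) from the press/release vertices,
    normalized so x1 < x2 and y1 < y2 whichever way the user drags."""
    (x1, y1), (x2, y2) = p_down, p_up
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

def in_region(region, pedestrian):
    """Alarm condition: pedestrian position inside the preset rectangle."""
    x1, y1, x2, y2 = region
    px, py = pedestrian
    return x1 <= px <= x2 and y1 <= py <= y2

lawn = make_region((200, 150), (40, 60))  # drawn in either direction
print(in_region(lawn, (100, 100)))  # True  -> would trigger the alarm
print(in_region(lawn, (10, 10)))    # False -> no alarm
```

In the system, the pedestrian position would be the tracked mass center from Section III.B.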
C. Experimental results

In order to verify the effectiveness of the system, we selected a surveillance video sequence of size 480 × 360 with a frame rate of 25 frames/s. The procedure of pedestrian detection is shown in Fig. 6. In Fig. 6a, we can see that the system completely detects the moving objects (vehicle and pedestrians), and some noise occurs due to antenna jitter. In Fig. 6b, the system eliminates the noise and identifies all pedestrians. Fig. 7 shows that the system can track the objects of interest. Fig. 8 illustrates pedestrian counting. In Fig. 8a, "in: 2 out: 0" indicates a total of 2 pedestrians entering from the left and 0 entering from the right. In Fig. 8b, "in: 2 out: 1" indicates 2 entering from the left and 1 from the right. Clearly, the system counts accurately from both sides. Risk warning is illustrated in Fig. 9. When someone is trampling the lawn, the system automatically alarms. First, we manually set the region of interest on the lawn.
Figure 6. Pedestrian detection: (a) foregrounds extracted by the background subtraction method (frames 165 and 210); (b) detected pedestrians (frames 165 and 210).
(a) Single-pedestrian tracking
(b) Multiple-pedestrian tracking
Figure 7. Pedestrian tracking
(a) Entering from the left
(b) Entering from the right
Figure 8. Pedestrian counting
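The counting rule of Section III.C behind the "in: n out: m" display can be sketched as a small per-track state machine; the region boundaries and the one-count-per-track rule are our illustrative choices.

```python
class Counter:
    """Frame split into four vertical regions: the outer two set the
    entry-direction flag, the middle two do the counting."""
    def __init__(self, width):
        self.b = [width // 4, width // 2, 3 * width // 4]
        self.in_count = 0   # entered from the left
        self.out_count = 0  # entered from the right
        self.flags = {}     # track id -> 'L' or 'R'

    def update(self, track_id, x):
        b1, b2, b3 = self.b
        if x < b1 and track_id not in self.flags:
            self.flags[track_id] = 'L'           # entered on the left
        elif x >= b3 and track_id not in self.flags:
            self.flags[track_id] = 'R'           # entered on the right
        elif b1 <= x < b2 and self.flags.get(track_id) == 'L':
            self.in_count += 1
            del self.flags[track_id]             # count each track once
        elif b2 <= x < b3 and self.flags.get(track_id) == 'R':
            self.out_count += 1
            del self.flags[track_id]

c = Counter(width=480)
for x in (30, 80, 150):    # track 1 walks in from the left
    c.update(1, x)
for x in (460, 400, 300):  # track 2 walks in from the right
    c.update(2, x)
print(f"in: {c.in_count} out: {c.out_count}")  # in: 1 out: 1
```

Each tracked pedestrian is counted once, when it crosses from its direction-flag region into the adjacent counting region.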
If no one steps into the region, the Warning button stays hidden. When someone steps into the preset region, the display text "Warning" flashes and the system alarms. At the same time, the pedestrian trajectory is saved and a video clip is recorded.

V. SUMMARY

In this paper, we developed a pedestrian detection and tracking system and applied it to pedestrian counting and risk warning. We also designed a low-dimensional soft-output SVM pedestrian classifier. Combining the classifier with the support thread improves both pedestrian detection accuracy and operating speed.

REFERENCES
[1] P. Spagnolo, M. Leo, T. D'Orazio, and A. Distante, "Robust moving objects segmentation by background subtraction," Proc. Int. Workshop on Image Analysis for Multimedia Interactive Services (WIAMIS), Lisbon, Portugal, 2004, pp. 81-84.
[2] J. Rymel, J. Renno, D. Greenhill, J. Orwell, and G. A. Jones, "Adaptive eigen-backgrounds for object detection," Proc. IEEE Int. Conf. on Image Processing (ICIP), Singapore, Oct. 2004, pp. 1847-1850.
[3] M. Andriluka, S. Roth, and B. Schiele, "Pictorial structures revisited: people detection and articulated pose estimation," Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 2009, pp. 1014-1021.
[4] N. Shou, H. Peng, H. Wang, L.-M. Meng, and K.-L. Du, "An ROIs based pedestrian detection system for single images," Proc. 5th Int. Congress on Image and Signal Processing (CISP), Chongqing, China, Oct. 2012, pp. 1205-1208.
Figure 9. Risk warning: (a) set a warning zone; (b) warning.
[5] Q. Zhu, S. Avidan, M.-C. Yeh, and K.-T. Cheng, "Fast human detection using a cascade of histograms of oriented gradients," Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), New York, 2006, pp. 1491-1498.
[6] J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," in Advances in Large Margin Classifiers, MIT Press, 1999.
[7] C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland, "Pfinder: real-time tracking of the human body," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 780-785, 1997.
[8] M. Andriluka, S. Roth, and B. Schiele, "People-tracking-by-detection and people-detection-by-tracking," Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA, 2008, pp. 1-8.
[9] Y.-L. Hou and G. K. H. Pang, "People counting and human detection in a challenging situation," IEEE Trans. Syst., Man, Cybern. A, vol. 41, no. 1, pp. 24-33, 2011.