PEDESTRIAN CLASSIFICATION FROM MOVING PLATFORMS USING CYCLIC MOTION PATTERN

Yang Ran, Qinfen Zheng, Isaac Weiss, Larry S. Davis, Wael Abd-Almageed, Liang Zhao
Center for Automation Research, University of Maryland, College Park, MD 20742-3275, USA
{rany, qinfen, weiss, lsd, wamageed, lzhao}@cfar.umd.edu

ABSTRACT

This paper describes an efficient pedestrian detection system for videos acquired from moving platforms. Given a detected and tracked object as a sequence of images within a bounding box, we describe the periodic signature of its motion pattern using a twin-pendulum model. A Principle Gait Angle is then extracted in every frame, providing gait phase information. By estimating the periodicity of the phase data with a digital phase-locked loop (dPLL), we quantify the cyclic pattern of the object, which lets us continuously classify it as a pedestrian. Past approaches have used shape detectors applied to single images or classifiers based on pixel-wise oscillations of the human body, but ours is the first to integrate a global cyclic motion model with periodicity analysis. The novel contributions of this paper are: i) a compact shape representation of cyclic motion as a signature for a pedestrian; ii) estimation of the gait period via a feedback-loop module; and iii) a fast online pedestrian classification system that operates on videos acquired from moving platforms.
1. INTRODUCTION
1.1 Related Work

In recent years, automatic pedestrian detection in video has become an active research area in computer vision. The task is especially difficult for video from moving platforms. Three challenges in these applications are: 1) the non-rigid kinematics of pedestrian walking; 2) cluttered backgrounds caused by moving sensors; and 3) arbitrary camera motion. A review of prior research on this topic can be found in [1]. Previously proposed methods fall into two major categories. The first is a detection-style approach: a detector scans every location in a frame, or in the difference of adjacent frames, measures a specific shape in appearance or motion, and decides whether a target is present. The second is a detect-then-classify approach: candidate objects are identified and tracked, and gait periodicity is then analyzed through
0-7803-9134-9/05/$20.00 ©2005 IEEE
pixel-wise oscillation; the overall statistical periodic behavior then enables classification. Exemplary algorithms in the first category can be found in [1], built with learning tools such as wavelets and neural networks. Nanda [4] builds a probabilistic shape hierarchy to achieve efficient detection at different scales. Viola's [6] AdaBoost detector is trained on large datasets to achieve high detection rates and very low false-positive rates, but this makes real-time operation hard to achieve [6]. In the second category, Little and Boyd [2] used a Discrete Fourier Transform (DFT) based approach to measure pixel oscillations. Efros et al. [3] identified cyclic motion in the optical-flow domain. Liu and Picard [10] examine pixel oscillation over the XT plane to extract the fundamental frequency of the gait. Boyd [9] uses video phase-locked loops (vPLLs) to measure the period contained in every pixel due to a gait. The main limitations of prior approaches in the second category are their sensitivity to pixel alignment and to changing backgrounds. For videos acquired from moving platforms, accurate alignment is hard to achieve, and hence pixel-wise periodicity tends to be corrupted. There are also techniques that classify contours as pedestrians based on shape, but obtaining accurate object contours is itself a challenging task. A method closely related to this paper can be found in Cutler and Davis [5], who examine the gait period by calculating a similarity matrix over every image pair. We extend their work with a feasible pedestrian-specific model and a sequential classification module.

1.2 Brief Algorithm Overview

For targets observed from moving platforms, we use a local background subtraction method to estimate the local camera motion parameters, which lets us stabilize the background in adjacent frames and focus our cyclic motion signature measurement on foreground objects. A twin-pendulum model is then fit to every frame for each target, and the gait period is estimated via a digital PLL feedback network.

This paper is structured as follows. Section 2 introduces the twin-pendulum gait model that motivates the cyclic motion pattern analysis. Section 3 discusses
how to extract this feature. Classification via the dPLL is explained in Section 4. Experimental results and comparisons are presented in Section 5.

2. CYCLIC WALKING PATTERN
2.1 Periodic Motion

Gait describes not only the speed of motion (such as walking or running) but also the style or manner in which a human moves. The former differentiates a pedestrian from non-periodic motions such as vehicles driving or wind blowing, while the latter differentiates humans from other periodically moving objects such as fans or dogs. By comparing the different kinds of periodic motion from the various objects illustrated in Figure 1, we can identify a pattern distinctive to pedestrians: the swing of the two legs characterizes this pedestrian-specific oscillation.
(a) (b) (c)
Figure 1. Cyclic motions: (a) fan; (b) dog; (c) pedestrian

2.2 Motion Pattern in Human Gait

We start by investigating the kinematics of human gait from a synthesized sequence, as in Figure 2, which depicts a complete cycle of a pedestrian's legs. We develop a computationally efficient human motion analysis algorithm inspired by the twin-pendulum model introduced in the literature [1]. The twin-pendulum model has a very simple form but captures the inherent nature of gait. It focuses on the motion of the legs: each leg is represented by two jointed cylinders whose diameters are constant but whose lengths change over time.
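To make the model concrete, the inter-leg angle predicted by the twin-pendulum view can be synthesized in a few lines. The rectified-sinusoid form, the default period of 32 frames, and the function name below are illustrative assumptions, not details from the paper:

```python
import math

def principle_gait_angle(t, period=32, max_angle=0.6):
    """Synthetic Principle Gait Angle (radians) at frame t.

    Under the twin-pendulum view the legs separate and close twice per
    gait cycle, so the inter-leg angle is sketched as a rectified
    sinusoid: zero at the leg-crossing phases and max_angle at the two
    maximal-separation phases.
    """
    return max_angle * abs(math.sin(2 * math.pi * t / period))

# One gait cycle of 32 frames contains two angle maxima (one per step).
cycle = [principle_gait_angle(t) for t in range(32)]
```

The two maxima per cycle correspond exactly to the two critical phases that the fitting procedure of Section 3 searches for.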
Figure 2. Twin-pendulum model in human gait

3. EXTRACTING MOTION PATTERN
The model proposed above suggests classifying a moving object as human using features related to its cyclic motion pattern. However, changes of appearance, the non-rigid deformation of the human body, and arbitrary camera motion all present challenges. Two issues are of interest here. First, the feature should be global rather than pixel-wise: tracking pedestrians in videos from moving platforms is unreliable, and pixels are hard to align. Second, shape extraction is very difficult, so we favor a robust feature derived from the human contour rather than the contour itself. A closer look at Figures 1 and 2 reveals that the relative location of the two legs can be used as a cue. We define the Principle Gait Angle as the angle between the two legs during walking. We are especially interested in the two special cases when the legs are maximally and minimally separated, shown as (a) and (g) in Figure 2 and circled by red boxes. They correspond to two critical phases, where the toe-to-toe distance reaches its maximum or minimum value.
Figure 3. Principle Gait Angle in original and edge images

We next describe how to extract the Principle Gait Angle from noisy images. First, we apply a Canny edge operator to the target. Then a Hough line detector scans the edge map and generates a list of the lines in the image. Third, we use a cascade of classifiers to identify the two critical phases: a positive fit must contain line segments of sufficient length forming the gait angle, and the difference between that angle and the model's must fall within a narrow range. Figure 4 gives four sample fitting results on a sequence of edge maps of the lower part of a pedestrian.
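As an illustration of the line-fitting step, the sketch below picks the two longest Hough segments as leg candidates and returns the undirected angle between them. The function name, input format, and length threshold are our assumptions; in an OpenCV pipeline the segments would come from, e.g., cv2.HoughLinesP:

```python
import math

def gait_angle_from_lines(lines, min_len=10.0):
    """Estimate the Principle Gait Angle from Hough line segments.

    `lines` is a list of ((x1, y1), (x2, y2)) endpoint pairs.  The two
    longest segments of sufficient length are taken as the leg
    candidates, and the angle between their directions is returned in
    radians (None if no plausible leg pair exists in this frame).
    """
    def length(l):
        (x1, y1), (x2, y2) = l
        return math.hypot(x2 - x1, y2 - y1)

    candidates = [l for l in lines if length(l) >= min_len]
    if len(candidates) < 2:
        return None
    a, b = sorted(candidates, key=length, reverse=True)[:2]

    def direction(l):
        (x1, y1), (x2, y2) = l
        return math.atan2(y2 - y1, x2 - x1)

    diff = abs(direction(a) - direction(b)) % math.pi
    return min(diff, math.pi - diff)  # undirected angle in [0, pi/2]
```

A frame is then marked "1" in the binary phase sequence when this angle is close to the model's maximal (or minimal) separation angle.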
Figure 4. Illustration of twin-pendulum model fitting. White pixels: edges; green pixels: lines detected by the Hough transform; red pixels: fitted lines forming the Principle Gait Angle

The procedure above yields two binary sequences per target. For example, one sequence marking the frames (1s) with an observed maximum toe-to-toe distance for the top-left object in Figure 3 is:

0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 0
0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1    (1)
1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 ...
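These two phase sequences are the raw input to the period estimator of Section 4, where they are subtracted and low-pass filtered as in Eq. (2). A minimal sketch of that preprocessing, with a moving average standing in for the low-pass filter (the window length is our choice):

```python
def smooth_difference(max_seq, min_seq, window=3):
    """Combine the two binary phase sequences and low-pass the result.

    Mirrors v_i(t) = LowPass(v_{i,1}(t) - v_{i,2}(t)): the max-distance
    and min-distance indicator sequences are subtracted, then smoothed
    with a centered moving average as a stand-in for the low-pass filter.
    """
    diff = [a - b for a, b in zip(max_seq, min_seq)]
    half = window // 2
    out = []
    for t in range(len(diff)):
        lo, hi = max(0, t - half), min(len(diff), t + half + 1)
        out.append(sum(diff[lo:hi]) / (hi - lo))
    return out

# The smoothed signal swings positive near maximal leg separation and
# negative near leg crossing, giving the dPLL an oscillation to track.
v = smooth_difference([0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0])
```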
Even with false alarms, we can still observe periodic oscillation in this sequence; a more accurate solution is provided in Section 4. The major advantage of this method lies in using the Principle Gait Angle as a feature: it captures the shape kinematics and is invariant to changes of appearance as well as camera motion. A further advantage is that classification reduces to a simple line-fitting procedure.

3.1 Invariance to Camera Motion

A challenge in the model fitting process is the combination of noisy edge maps and arbitrary camera motion. When the camera is mounted on a moving vehicle, the perspective configuration would require range data to recover the 3D scene structure. Instead, we employ a local background subtraction method. By focusing only on a small patch surrounding the detected bounding box, instead of on the whole frame or the object itself, we model the camera motion between two adjacent frames at that location with a few affine parameters. By minimizing the sum of pixel-wise patch differences as a cost function, we obtain a set of stabilization parameters using the method reported in [14]. After stabilizing the previous frame with respect to the current one, edges from the background are mostly removed, while those on the pedestrian contour are mostly kept.
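The stabilization step can be illustrated with a much-simplified, translation-only version of the affine search: minimize the sum of squared differences between the current patch and shifted copies of the previous patch. Everything below (the function name, the brute-force integer search, the 2D-list patch format) is a sketch, not the actual method of [14]:

```python
def stabilize_shift(prev_patch, cur_patch, search=3):
    """Find the integer (dy, dx) aligning prev_patch to cur_patch.

    Exhaustively tries shifts in [-search, search]^2 and returns the one
    minimizing the mean squared pixel difference over the overlap.  A
    translation-only stand-in for the affine stabilization; patches are
    2D lists of gray values around the tracked bounding box.
    """
    h, w = len(cur_patch), len(cur_patch[0])
    best, best_shift = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cost, count = 0.0, 0
            for y in range(h):
                for x in range(w):
                    py, px = y + dy, x + dx
                    if 0 <= py < len(prev_patch) and 0 <= px < len(prev_patch[0]):
                        d = cur_patch[y][x] - prev_patch[py][px]
                        cost += d * d
                        count += 1
            cost /= max(count, 1)
            if best is None or cost < best:
                best, best_shift = cost, (dy, dx)
    return best_shift
```

After warping the previous patch by the recovered parameters, frame differencing suppresses background edges while keeping the moving contour.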
4. ESTIMATION OF PERIOD
4.1 Phase-Locked Loop (PLL)

A PLL, or phase-locked loop, is a closed-loop feedback control system based on detecting the phase difference between the input signal and the output of a voltage-controlled oscillator (VCO). Phase-locked loops are widely used in communications; an introduction can be found in [12].
In this paper we use a software version of the PLL [13]. Figure 5 shows the classic configuration. The phase detector compares two input frequencies, generating an output approximately proportional to their phase difference (if, for example, they differ in frequency, it gives a periodic output whose frequency is the difference frequency). Denote the reference signal frequency and the VCO output frequency as f_IN and f_VCO. If f_IN does not equal f_VCO, the phase-error signal causes the VCO frequency to deviate toward f_IN. If conditions are right, the VCO quickly "locks" to f_IN, maintaining a fixed phase relationship with the input signal.

Figure 5. Diagram of a digital PLL

4.2 Gait Period Estimate

Based on the output sequences provided by fitting the Principle Gait Angles, the method estimates both the frequency of the gait and its phase. The classification system for pedestrian detection has the structure shown in Figure 6 and operates on the model-fitting output sequences as in (1). The input data are two 0-1 sequences representing the two critical phases, corresponding to the maximum and minimum toe-to-toe distances. We combine the two sequences by subtracting one from the other, then pass the difference through a low-pass filter to remove high-frequency components and obtain a smoothed signal:

    v_i(t) = LowPass(v_{i,1}(t) − v_{i,2}(t))                        (2)

Figure 6. Diagram of the classification system

Without loss of generality, we write the input signal and the VCO output as

    v_i = A sin(ω_i t + θ_i),   v_o = cos(ω_o t + θ_o)               (3)

If we use a multiplier as the phase detector, the signal after multiplication is

    u_PD = K A sin(ω_i t + θ_i) cos(ω_o t + θ_o)                     (3')

where K is the gain of the phase detector (the multiplier in our case). Using the product-to-sum identity, we can rewrite (3') as

    u_PD = (1/2) K A [sin((ω_i + ω_o)t + θ_i + θ_o)
                    + sin((ω_i − ω_o)t + θ_i − θ_o)]                 (4)

When ω_o ≈ ω_i, the first term in (4) is attenuated by the low pass inside the loop filter of Figure 5, and the filtered phase-detector output can be approximated as

    u_o ≈ (1/2) K A sin(θ_i(t) − θ_o(t))                             (5)

When the phase difference is small enough, (5) simplifies to

    u_o ≈ (1/2) K A (θ_i(t) − θ_o(t))                                (6)
That is, u_o is proportional to θ_i(t) − θ_o(t). We can now explain how the dPLL locks onto the gait period. Suppose the object's period is initially unknown. The initial frequency of the VCO output v_o is set to a gait frequency guess ω_o(0) (20 frames/cycle). When the gait period (the frequency ω_i(t) of v_i) changes, the phase difference between v_o and v_i is detected by the phase detector, which steers the VCO frequency toward ω_i(t). Hence the period is estimated continuously. Only when the rate falls into an interval representing the normal gait range is the object classified as a pedestrian.
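The locking behavior described above can be sketched as a small software PLL: a multiplier phase detector, a proportional-integral loop filter, and a VCO whose frequency it steers. The gains, the initial-frequency guess, and the use of a clean sinusoid as input are illustrative assumptions:

```python
import math

def dpll_lock(signal, f0=0.05, kp=0.05, ki=0.002):
    """Track the frequency (cycles/frame) of `signal` with a digital PLL.

    A multiplier phase detector compares the input against a quadrature
    VCO output, and a proportional-integral loop filter steers the VCO
    frequency toward the input frequency, as in Eqs. (3)-(6).
    Returns the per-frame VCO frequency estimates.
    """
    theta_o = 0.0  # VCO phase (radians)
    integ = 0.0    # integral term of the loop filter
    freqs = []
    for x in signal:
        # Multiplier phase detector against the quadrature VCO output;
        # the DC part of this product is ~ (1/2) sin(theta_i - theta_o).
        err = x * -math.sin(theta_o)
        integ += ki * err
        f = f0 + kp * err + integ      # loop filter output sets VCO frequency
        theta_o += 2.0 * math.pi * f   # advance the VCO
        freqs.append(f)
    return freqs

# A clean stand-in for the smoothed gait signal: 0.06 cycles/frame,
# while the VCO starts from the 0.05 cycles/frame initial guess.
signal = [math.cos(2.0 * math.pi * 0.06 * n) for n in range(4000)]
freqs = dpll_lock(signal)
```

Averaging the final estimates recovers the input frequency. On the real binary gait sequences the lock is slower and noisier, which is one reason classification also checks that the locked rate lies in a plausible gait range.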
5. EXPERIMENTAL RESULTS
To evaluate the performance of the proposed algorithm, we implemented it in a system built on OpenCV, using the tracking module of [11] to provide bounding boxes. We test the system on the UMD dataset, which contains 50 video clips acquired from either static cameras or cameras mounted on moving vehicles. We manually initialize the tracker and obtain the object sequences.
Figure 7. PLL output voltage vs. locking time for Seq 12

In Figure 7, we tracked two objects, a human and a randomly selected region, for 200 frames. The period for the first object locks around a frequency of 32 frames/cycle, which corresponds to the gait rate. The second locks at around 0, since no periodic twin-pendulum pattern is observed. We plot the PLL VCO voltage output vs. locking time in the second row to illustrate how quickly the method adapts to the real signal. In Figure 8, we present another sequence with a pedestrian walking across a street. We track her and a car for 150 frames, and the PLL locks onto the gait period after 40 frames.

Figure 8. VCO output voltage vs. locking time for Seq 36

Figure 9. ROC curve for pedestrian detection on the UMD database (detection rate vs. false positive rate, for static and moving platforms)

In summary, a periodicity-based human classification algorithm is reported here that involves a cyclic motion model-based method. The ROC curve is presented in Figure 9. We have developed a compact shape representation of cyclic motion as a signature for pedestrians, and implemented a pedestrian classification system that operates on videos taken from moving platforms. Future directions include extension to activity analysis, as well as modeling the motion signature in a richer way, incorporating information such as camera view angle.

Acknowledgement

This work is supported in part by the Robotics Consortium sponsored by the Collaborative Technology Alliance Program. We also thank Dr. Rama Chellappa and Dr. Kevin S. Zhou for valuable suggestions and for providing code used in this project.
6. REFERENCES
[1] D. M. Gavrila, "The visual analysis of human movement: a survey," Computer Vision and Image Understanding, vol. 73, no. 1, pp. 82-98, 1999.
[2] J. J. Little and J. E. Boyd, "Recognizing people by their gait: the shape of motion," Videre: Journal of Computer Vision Research, vol. 1, no. 2, 1998.
[3] A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in Proc. ICCV, pp. 726-733, 2003.
[4] H. Nanda and L. Davis, "Probabilistic template based pedestrian detection in infrared videos," in Proc. IEEE Intelligent Vehicle Symposium, Versailles, France, June 18-20, 2002.
[5] R. Cutler and L. S. Davis, "Robust real-time periodic motion detection, analysis, and applications," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 781-796, 2000.
[6] P. Viola, M. Jones, and D. Snow, "Detecting pedestrians using patterns of motion and appearance," in Proc. ICCV, 2003.
[7] R. Polana and R. Nelson, "Low level recognition of human motion," in Proc. IEEE CS Workshop on Motion of Non-Rigid and Articulated Objects, Austin, TX, pp. 77-82, 1994.
[8] E. H. Adelson and J. R. Bergen, "Spatiotemporal energy models for the perception of motion," J. Opt. Soc. Am. A, vol. 2, no. 2, February 1985.
[9] J. E. Boyd, "Synchronization of oscillations for machine perception of gaits," Computer Vision and Image Understanding, vol. 96, no. 1, pp. 35-59, Oct. 2004.
[10] F. Liu and R. W. Picard, "Finding periodicity in space and time," in Proc. Sixth International Conference on Computer Vision, pp. 376-382, 1998.
[11] S. Zhou, R. Chellappa, and B. Moghaddam, "Visual tracking and recognition using appearance-adaptive models in particle filters," IEEE Trans. on Image Processing, vol. 11, pp. 1434-1456, Nov. 2004.
[12] A. Blanchard, Phase-Locked Loops. New York, NY: John Wiley & Sons, 1976.
[13] W. C. Lindsey and C. M. Chie, eds., Phase-Locked Loops. New York, NY: IEEE Press, 1986.
[14] Q. Zheng and R. Chellappa, "Automatic registration of oblique aerial images," in Proc. IEEE International Conference on Image Processing, vol. 1, pp. 218-222, 1994.