Computational Approaches for Real-time Extraction of Soft Biometrics

Yang Ran, IEEE Member, Gavin Rosenbush, IEEE Member, Qinfen Zheng, IEEE Member
SET Corporation, Greenbelt, MD 20770, USA
{yran, grosenbush, qzheng}@setcorp.com

Abstract

Soft biometrics, used as a prescreening filter, contribute to a much smaller candidate pool and allow the overall query to perform better and faster. In this paper, we focus on the efficiency and effectiveness of several soft biometrics for surveillance applications. We propose a temporal signature in x-t slices. Such a signature explicitly embeds body articulation and enables direct mensuration. The algorithms determine characteristics for gender, body size, height, cadence, and stride of the subject using a novel gait analysis tool. We have evaluated algorithm performance under various poses, ranges, and illuminations. Preliminary experiments have shown promising results.
1. Introduction

In traditional biometric systems, authentication requires significant time when a large number of subjects are enrolled. A practical solution is to prescreen to a much smaller category by exploiting soft biometric information derived from height, gait parameters such as cadence and stride length, weight, measurements of body parts, gender, eye color, hair color, age, and ethnicity [1]. Extraction of soft biometrics poses several challenges, such as computational efficiency and robustness to variations. In this paper, we propose several automatic, real-time soft biometric acquisition algorithms built on a novel gait signature, obtained by decomposing human silhouette images along the x-t direction, to estimate height/body size, stride length, and gender. The signature residing in temporal image slices not only enables direct mensuration but also helps to remove the bias inherent in height estimation due to articulation. The benefits of our approach are: i) improved speed: real-time extraction that is robust to many factors such as distance, size, pose, and viewing direction; ii) improved accuracy: a much smaller candidate pool that allows the overall query to perform better. The system flow chart is illustrated in Figure 1.
978-1-4244-2175-6/08/$25.00 ©2008 IEEE
[Figure: Sensor → Detection and Tracking → x-t Gait Signature → Height/Size, Gender, Stride/Step]
Figure 1: Flow chart for proposed algorithms.
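The flow in Figure 1 can be sketched as a simple processing chain. Every function below is a hypothetical stand-in for the corresponding module, showing only the data flow between stages, not a real implementation.

```python
# Illustrative skeleton of the Figure 1 pipeline; all names are hypothetical.

def detect_and_track(frames):
    # Return one bounding box (x, y, w, h) per frame for the moving person.
    return [(100, 50, 40, 120) for _ in frames]

def xt_gait_signature(frames, boxes):
    # Stack aligned silhouettes into an x-y-t volume; summarize its x-t slices.
    return {"n_frames": len(boxes)}

def estimate_soft_biometrics(signature):
    # The downstream estimators all run off the same temporal signature.
    return {"height_size": "M", "gender": "unknown",
            "stride_step": None, "n_frames": signature["n_frames"]}

frames = list(range(30))                  # placeholder for a video sequence
result = estimate_soft_biometrics(
    xt_gait_signature(frames, detect_and_track(frames)))
print(result["n_frames"])                 # 30
```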
2. Related Work

Much progress has been made in extracting the primary biometric modalities [2]. Among them, iris, fingerprint, and face are the most promising and are widely used in existing biometric recognition systems. To reduce verification time and allow quick screening of a particular individual for further investigation, the use of soft biometric signatures has been suggested in [1]. Soft biometrics are defined as human characteristics that provide information about an individual but are insufficient to differentiate between individuals [2]. Soft biometrics [1,3] can provide coarse authentication performance at large ranges, across relatively wide pose variations, and under uncontrolled illumination.
3. Biometrics in Space and Time

Humans walk at a stable frequency. The body and limbs maintain the center of gravity above the point of contact and minimize the muscular effort needed to balance the whole body. As a human walks, the swinging motion of the limbs generates a signature pattern that can then be used for mensuration, as shown in Figure 2. We first align a moving individual by stacking the bounding boxes of all the frames to create an x-y-t volume in which the signature resides. A close look at the horizontal slices (x-t) reveals that some body articulation parameters are naturally embedded as a 'figure 8', while the vertical slices (y-t) carry the body articulation along the other direction.
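The volume stacking and slicing can be sketched as follows, together with the FFT-based periodicity step used later for cadence. The synthetic "swinging legs" silhouettes, the slice height, and all dimensions here are illustrative only.

```python
import numpy as np

# Sketch: build an x-y-t volume by stacking aligned binary silhouettes, take a
# horizontal (x-t) slice near the ankles, and estimate the dominant period of
# the slice with an FFT. The toy silhouettes below stand in for real data.

T, H, W = 64, 120, 60                      # frames, silhouette height, width
volume = np.zeros((T, H, W), dtype=np.uint8)
for t in range(T):
    x = int(W / 2 + 15 * np.sin(2 * np.pi * t / 32))   # 32-frame gait cycle
    volume[t, H // 2:, x] = 1                          # one leg
    volume[t, H // 2:, W - 1 - x] = 1                  # the other leg

y0 = int(0.9 * H)                          # slice height near the ankles
xt = volume[:, y0, :]                      # x-t slice, shape (T, W)

# The leg separation oscillates twice per gait cycle (the 'figure 8' pattern),
# so the dominant FFT bin corresponds to half the gait cycle.
sep = np.array([np.ptp(np.nonzero(row)[0]) for row in xt], dtype=float)
spec = np.abs(np.fft.rfft(sep - sep.mean()))
k = int(np.argmax(spec[1:]) + 1)           # dominant non-DC frequency bin
period = T / k                             # frames per oscillation
print(period)                              # 16.0: half of the 32-frame cycle
```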
Figure 2: Gait signature in temporal domains.

In traditional methods, body articulation causes uncertainty in height or body size estimation. For instance, Figure 3 shows 5 different poses. It shows that, to classify body size accurately from a silhouette, the blob areas must be calculated from the same pose for different subjects. A similar bias exists in height estimation. The signature proposed in this work can be used to locate the same gait pose (double support, in the red box) in every sequence and hence systematically reduce this noise.

Figure 3: Pose variability affects body size and height estimation. We use the gait signature to choose the same pose (double support, in red) for the calculation.

The remainder of this paper is organized as follows. Starting from these temporal patterns, we first use them to estimate the stride/step lengths. Then the height and frame size are obtained sequentially. Finally, we describe a method to classify gender. The benefits of using such signatures are two-fold: first, they contain the information needed for stride/step length and gait rate estimation; second, they reduce the uncertainty due to articulation in height estimation.

4. Estimation of Stride and Cadence

We use an iterative method [5] to simultaneously segment and learn the structure of the gait. A horizontal slice is divided into gait cycles by an FFT-based periodicity estimator. Each cycle is divided into four quadrants corresponding to the different gait stages, from double support I (when both toes contact the ground), through left/right toe support and double support II (when both toes contact the ground again), to right/left toe support. The gait curve in a quadrant is modeled as:

p(x) = ∫_0^{t0} p(x|t) p(t) dt + ∫_{t0}^{T/2} q(x|t) q(t) dt

where t is the latent variable uniformly distributed over a quadrant and p(x|t) and q(x|t) are the distributions of features at point t. Instead of starting with one line segment corresponding to the most dominant principal component and then increasing the number of segments gradually as in [9], we use a fixed number (2) of segments, as inspired by Figure 2. Here t0 is the time when the gait approaches its maximum oscillating amplitude (i.e., when the two halves intersect during double support), and only one t0 exists in each quadrant. The segments are defined as:

s1 = {s(t) | t ∈ (0, t0)},  s2 = {s(t) | t ∈ (t0, T/2)}

This process finds the two segments s1 and s2 that minimize the overall distance over all data points,

C_{s1,s2} = ∫_0^{t0} d(x, s1)² dt + ∫_{t0}^{T/2} d(x, s2)² dt

which can be solved using a standard spectral clustering technique.

Each time a leg goes forward, it makes a step. The time this takes determines the cadence C; the ground distance it covers is the stride length L. Stride and cadence (C, L) are functions of body height, weight, and gender, and we use them as soft biometrics for human classification. We then extract the lower quarter of the gait sequence to obtain the gait signature at different heights along the legs. Let P1(x1, t1) and P2(x2, t2) be the adjacent points in Figure 7 where the two halves intersect in the widest slice (corresponding to the lowest part of the legs). The walking cycle then spans T = (t1 − t2) × 2 frames (two steps per cycle). Given the frame rate Fs, the cadence C (in steps per minute) and the stride length L (in pixels) are obtained as C = 120 × Fs / T and L = |x1 − x2|, respectively. L can be converted to meters by a simple calibration process [9]. We use a method similar to [6], together with a calibrated static camera model, to project the stride length L onto a canonical plane.

5. Height and Body Size Estimation

In surveillance applications, most moving humans are on the ground plane. Therefore, for a rigid or near-rigid object, the trajectory of a given point (such as the head or the body's center of mass) across frames is parallel to the ground plane. We assume that a minimal calibration has been performed using the methods reported by Shao et al. [6] and Criminisi et al. [7]. This calibration comprises three parts: the vanishing line of the ground plane, the vertical vanishing point, and the reference height(s). Traditional methods either do not consider the height variation due to cyclic gait articulation or handle it with an implicit strategy. Here we propose an explicit method.

Given a reference height h_r, two image points P_{r,top} and P_{r,bottom} representing the reference object's top and bottom, the estimated vertical vanishing point p̂, and the vanishing line L̂, we apply the following operations to a target, only during the double-support pose:

Step 1. Locate the two image points P_top and P_bottom determined by the principal axis of the body blob in a double-support gait pose; they form a line segment representing the object height h in the image domain.

Step 2. Compute the projected point P⊥_i from P_i to the line segment (P_top, P_bottom) by

P⊥_i = (p̂ × P_{r,bottom}) × (P_i × (L̂ × (P_bottom × P_{r,bottom}))),  where i = top, bottom

Step 3. Compute the object height as:

h_k = h_r · [d(p̂, P⊥_i) · d(P_{r,top}, P_{r,bottom})] / [d(P⊥_i, P_{r,bottom}) · d(P_{r,top}, p̂)]
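Steps 1–3 are a single-view metrology construction. The sketch below implements the standard transfer-and-cross-ratio version of that construction (after Criminisi [7]) and checks it against a synthetic calibrated camera; it is not the paper's exact operator chain, and the camera, points, and heights are made up.

```python
import numpy as np

def _n(p):
    return p / p[2]                              # normalize a homogeneous point

def _d(p, q):
    return np.linalg.norm(_n(p)[:2] - _n(q)[:2])

def height_from_view(b, t, b_r, t_r, v, l, h_r):
    """Transfer the target's top onto the reference vertical line, then read
    the height off a cross-ratio with the vertical vanishing point v."""
    u = np.cross(np.cross(b, b_r), l)            # vanishing point of the b->b_r ground direction
    t_t = np.cross(np.cross(u, t), np.cross(b_r, v))  # target top moved onto reference line
    return h_r * (_d(b_r, t_t) * _d(v, t_r)) / (_d(b_r, t_r) * _d(v, t_t))

# Synthetic calibrated camera (illustrative values) to sanity-check the sketch.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
C = np.array([0.0, -10.0, 5.0])                  # camera center, 5 m above ground
f = np.array([0.0, 10.0, -4.0]); f /= np.linalg.norm(f)   # viewing direction
r = np.array([1.0, 0.0, 0.0])                    # image x-axis in the world
d = np.cross(f, r)                               # image y-axis (points downward)
R = np.stack([r, d, f])                          # world -> camera rotation
P = K @ np.hstack([R, (-R @ C)[:, None]])        # 3x4 projection matrix

def proj(X):
    return P @ np.append(X, 1.0)

v = P @ np.array([0.0, 0.0, 1.0, 0.0])           # vertical vanishing point
l = np.cross(P @ np.array([1.0, 0, 0, 0]),       # vanishing line of the ground
             P @ np.array([0, 1.0, 0, 0]))

h_r, h_true = 1.80, 1.55                         # reference / target heights (m)
b_r, t_r = proj([2.0, 0.0, 0.0]), proj([2.0, 0.0, h_r])
b, t = proj([-1.0, 2.0, 0.0]), proj([-1.0, 2.0, h_true])

est = height_from_view(b, t, b_r, t_r, v, l, h_r)
print(round(est, 3))                             # recovers the 1.55 m target height
```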
The entire procedure is illustrated in Figure 4. We obtain a motion blob image corresponding to the moving objects, then estimate the end points of the line segments representing the three objects, depicted by yellow lines. The optimized end points are depicted by red lines, the red dotted line is the vanishing line, and the light blue segment is the reference height.
Figure 4: The procedure for measuring height during the double-support pose.

Sequential Estimation from a Video

The height measurements from multiple frames may include outlier observations due to poor tracking locations, articulation variations, and occlusions, making the mean of the multi-frame measurements non-robust. The Least Median of Squares (LMedS) method used by [6] has the well-known property of being less sensitive to outliers. Specifically, it finds the solution to the following problem:
θ* = arg min_θ median_k c_k⁻¹ (h_k − θ)²
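The LMedS estimate above can be sketched as a direct search for the θ minimizing the median of weighted squared residuals. The grid search and the confidence weights c_k below are illustrative; the paper does not specify its optimizer.

```python
import numpy as np

# Sketch of the LMedS height estimate: choose theta minimizing the median of
# weighted squared residuals over the per-frame measurements h_k.

def lmeds_height(h, c=None, n_grid=400):
    h = np.asarray(h, dtype=float)
    c = np.ones_like(h) if c is None else np.asarray(c, dtype=float)
    grid = np.linspace(h.min(), h.max(), n_grid)      # candidate thetas
    cost = [np.median((h - theta) ** 2 / c) for theta in grid]
    return float(grid[int(np.argmin(cost))])

# Per-frame heights around 1.75 m with two gross outliers (bad tracking /
# occlusion). The mean is pulled up to about 1.83; LMedS stays near 1.75.
h = [1.74, 1.76, 1.75, 1.73, 1.77, 2.10, 2.05, 1.75]
print(lmeds_height(h), float(np.mean(h)))
```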
Body Frame Size Estimation from a Video

As observed in Figure 3, the 2D image area occupied by a human body changes significantly with pose. To overcome this, we estimate the body frame size only from the silhouette corresponding to the double-support gait pose. We first use the line connecting the two image points P_top and P_bottom, determined by the principal axis of the motion blob as described in the section above, to build a histogram by projecting the binary silhouette along that line. This gives a 1D vector describing the width of the body at every height in the double-support pose. After vector length normalization, the magnitudes of these vectors carry size information; for example, a larger frame size produces a higher signal in the width vector. We train a 3-class classifier, using the SVM formulation in [9] as implemented in Intel OpenCV 1.0 (http://opencvlibrary.sourceforge.net), for three body size categories: S, M, and L.

Table 1: Gait signatures extracted at shoulder, toe and pelvis heights from male and female subjects in the USF Gait dataset. (Columns: Person, Silhouette, At Shoulder, At Toe, At Pelvis.)
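The body-frame-size feature described above can be sketched as follows. The paper feeds the width vector to a 3-class SVM (S/M/L); a nearest-centroid classifier stands in here to keep the sketch dependency-free, and the rectangular toy silhouettes are illustrative only.

```python
import numpy as np

# Sketch: project the binary double-support silhouette along its axis to get
# a width-at-every-height vector, resampled to a fixed length, then classify.

def width_vector(silhouette, n=50):
    w = silhouette.sum(axis=1).astype(float)       # width at each height
    return np.interp(np.linspace(0, len(w) - 1, n), np.arange(len(w)), w)

def toy_silhouette(width, height):
    s = np.zeros((height, 60), dtype=np.uint8)
    lo = (60 - width) // 2
    s[:, lo:lo + width] = 1                        # toy rectangular 'torso'
    return s

prototypes = {"S": toy_silhouette(10, 90),
              "M": toy_silhouette(20, 100),
              "L": toy_silhouette(34, 110)}
centroids = {k: width_vector(s) for k, s in prototypes.items()}

def classify_size(silhouette):
    v = width_vector(silhouette)
    return min(centroids, key=lambda k: np.linalg.norm(v - centroids[k]))

print(classify_size(toy_silhouette(22, 104)))      # nearest to the 'M' prototype
```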
6. Gender Classification

It has been shown that human viewers can determine the gender of walkers wearing markers with 63% accuracy at a side view [10] and 79% accuracy at a frontal or oblique view [11]. [11] concluded that men swing their shoulders more while women swing their hips more, and that the two tend to have different stride lengths and styles. Clearly, gait encapsulates information about gender. [12] used joint angles and SVMs; however, joint angles are difficult to estimate. We take a novel approach by examining the body motion at the waist, shoulders, and pelvis in the temporal slices, using the signatures extracted in Section 4. Based on biological study [13], for a person of height h we can estimate the heights of the shoulder, waist, and pelvis as 0.818×h, 0.530×h, and 0.480×h, respectively. The gait signatures at these heights reveal gender differences. Preliminary examples are shown in Table 1. In Figure 5 we illustrate the power of the temporal signature around the toes (toe swing), normalized by height, in the lateral view.
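The toe-swing feature can be sketched as a single variance statistic on the lateral toe trajectory, normalized by body height. The trajectories and amplitudes below are synthetic, and the comparison only illustrates that the feature separates larger from smaller swings; the paper learns the actual decision from data.

```python
import numpy as np

# Sketch: variance of the height-normalized lateral toe trajectory, used as
# a 1-D gender feature. All signals here are synthetic stand-ins.

def toe_swing_variance(toe_x, height_px):
    swing = (np.asarray(toe_x, dtype=float) - np.mean(toe_x)) / height_px
    return float(np.var(swing))

t = np.arange(120)
height_px = 200.0
wide = 30 * np.sin(2 * np.pi * t / 40)     # larger toe swing
narrow = 18 * np.sin(2 * np.pi * t / 40)   # smaller toe swing

v_wide = toe_swing_variance(wide, height_px)
v_narrow = toe_swing_variance(narrow, height_px)
print(v_wide > v_narrow)                   # the variance separates the two gaits
```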
7. Experimental Results and Discussion

Our experiments demonstrate the benefit of using gender, body size, and height as a pre-processing step to narrow down the candidate list. The datasets include the USF outdoor dataset (http://figment.csee.usf.edu/GaitBaseline/) and the SET HD indoor dataset. The former consists of 70 subjects; the latter consists of 6 subjects in 4 different poses captured with a Canon HV10 sensor. We first report the best individual biometric performance achieved, in Table 2.
Figure 5: Temporal signature around the toes (toe swing), normalized by height, provides good gender classification in the lateral view. Swing variance is used as the feature.

Table 2: Best performance achieved for individual soft biometrics with a false alarm rate around 10%. For category-type biometrics we report the recognition rate; for mensuration-type biometrics we report the average absolute difference from ground truth. *For the Stride/USF entry the unit is pixels, since no calibration was available.

Data | Height | Step/Stride  | Body size | Gender
USF  | N/A    | 1.09 pixels* | 75%/10%   | 78%/9%
SET  | 1.2 cm | 2.59 cm      | 87%/10%   | N/A
From our experiments, gender and height/body size perform slightly better than stride/step lengths. We then use the method of [8] for fusing multiple soft biometric inputs. Figure 6 shows the Receiver Operating Characteristic (ROC) of each individual biometric and of the weighted fusion. An improvement of 8% in Detection Rate (DR) is observed at the same False Acceptance Rate (FAR). In summary, we have presented several video-based soft biometric extraction methods and demonstrated their effectiveness. Although these soft biometric characteristics are not permanent, they provide information about identity that leads to a smaller candidate pool. Our future research will establish a procedure for determining the optimal set of fusion weights from larger datasets.
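Score-level fusion of the individual soft biometrics can be sketched as a weighted sum. The paper fuses via the likelihood-based method of [8]; the fixed weights and match scores below are made-up illustrations of the idea, not the paper's learned values.

```python
import numpy as np

# Sketch: weighted-sum fusion of per-modality match scores in [0, 1].
# Scores and weights are illustrative only.

def fuse(scores, weights):
    s = np.asarray([scores[m] for m in sorted(scores)])
    w = np.asarray([weights[m] for m in sorted(weights)])
    return float(np.dot(s, w) / w.sum())   # normalized weighted sum

scores = {"height": 0.82, "stride": 0.61, "size": 0.74, "gender": 0.90}
weights = {"height": 0.35, "stride": 0.15, "size": 0.25, "gender": 0.25}
fused = fuse(scores, weights)
print(round(fused, 3))
```

A fused score above any single weak modality's score is the intended effect: the stronger modalities pull the decision while the weaker ones still contribute.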
Figure 6: ROC of individual and fused soft biometrics.
References
[1] A. K. Jain, S. C. Dass, and K. Nandakumar, "Soft Biometric Traits for Personal Recognition Systems", Proc. International Conference on Biometric Authentication, pp. 731-738, Hong Kong, July 2004.
[2] NIST Report to the US Congress, "Summary of NIST Standards for Biometric Accuracy, Tamper Resistance, and Interoperability", Nov. 13, 2002.
[3] A. K. Jain, R. Bolle, and S. Pankanti, Biometrics: Personal Identification in Networked Society, Kluwer Academic Publishers, 1999.
[4] Y. Ran and R. Chellappa, "Finding Gait in Space and Time", Proc. 18th International Conference on Pattern Recognition, pp. 586-589.
[5] T. Hastie and W. Stuetzle, "Principal Curves", Journal of the American Statistical Association, vol. 84, pp. 502-516, 1989.
[6] J. Shao, S. K. Zhou, and R. Chellappa, "Video Mensuration Using Stationary Cameras", submitted to IEEE Transactions on PAMI, 2008.
[7] A. Criminisi, "Single-View Metrology: Algorithms and Applications", DAGM Symposium, Zurich, Sep. 2002.
[8] Z. Yin, F. Porikli, and R. Collins, "Likelihood Map Fusion for Visual Object Tracking", IEEE Workshop on Applications of Computer Vision (WACV), Copper Mountain, CO, Jan. 2008.
[9] V. Vapnik, Statistical Learning Theory, Wiley, 1998.
[10] L. T. Kozlowski and J. T. Cutting, "Recognizing the Sex of a Walker from a Dynamic Point-Light Display", Perception and Psychophysics, 21(6), pp. 575-580, 1977.
[11] G. Mather and L. Murdoch, "Gender Discrimination in Biological Motion Displays Based on Dynamic Cues", Proc. of the Royal Society of London, vol. B, pp. 273-279, 1994.
[12] J. Yoo, D. Hwang, and M. Nixon, "Gender Classification in Human Gait Using Support Vector Machine", ACIVS, LNCS 3708, pp. 138-145, 2005.
[13] W. T. Dempster and G. R. L. Gaughran, "Properties of Body Segments Based on Size and Weight", American Journal of Anatomy, 120, pp. 33-54, 1967.