ARTICLE IN PRESS Pattern Recognition 43 (2010) 3621–3635


3D human motion tracking based on a progressive particle filter

I-Cheng Chang*, Shih-Yao Lin
Department of Computer Science and Information Engineering, National Dong Hwa University, Hualien, Taiwan

Article history: Received 21 November 2009; Received in revised form 15 April 2010; Accepted 1 May 2010

Abstract

Human body tracking has received increasing attention in recent years due to its broad applicability. Among these tracking algorithms, the particle filter is considered an effective approach for human motion tracking. However, it suffers from the degeneracy problem and considerable computational burden. This paper presents a novel 3D model-based tracking algorithm called the progressive particle filter to decrease the computational cost in high degrees of freedom by employing hierarchical searching. In the proposed approach, likelihood measure functions involving four different features are presented to enhance the performance of model fitting. Moreover, embedded mean shift trackers are adopted to increase accuracy by moving each particle toward the location with the highest probability of posture through the estimated mean shift vector. Experimental results demonstrate that the progressive particle filter requires lower computational cost and delivers higher accuracy than the standard particle filter. © 2010 Elsevier Ltd. All rights reserved.

Keywords: particle filter; mean shift; human motion tracking; hierarchical structure; posture recognition

1. Introduction

3D model-based body tracking has received increasing attention in recent years due to its applicability to many areas, including surveillance, virtual reality, medical analysis, biostatistics, and computer game design. To detect human motion effectively, some researchers placed motion sensors on the human body and obtained 3D motion parameters from the sensor feedback. However, such devices are too expensive for commercial use and are thus unsuitable for most applications involving human-computer interfaces. Another approach is markerless human body tracking. However, obtaining sufficient information from video sequences to recover the parameters of body motion correctly is a difficult task for two reasons: first, the large number of degrees of freedom in human body configurations results in high computational loading; second, the self-occlusion between the limbs and the torso makes posture estimation difficult.

The particle filter [1], which provides multiple predictions and recovery from lost tracks, can overcome problems related to complex human motion. In addition, many schemes have been developed to reduce the computational loading of the standard particle filter. Yamamoto and Chellappa [2], Moon and Rosenfeld [3], and Navarantnam et al. [4] configured the human body as a structure of several layers and tracked the limb parameters of each layer. Deutscher et al. [5] performed human body motion tracking using

* Corresponding author. E-mail address: [email protected] (I.-C. Chang).

0031-3203/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2010.05.003

the Annealed Particle Filter (APF) with three calibrated cameras. APF is designed to reduce the number of particles and applies an importance sampling procedure that uses multi-layer searching. Kim et al. [6] and Lee et al. [7] employed a particle filter to perform both body detection and motion feature computation for estimating human posture. Saboune and Charpillet [8] presented an Interval Particle Filtering (IPF) algorithm to track human movements efficiently.

To enhance accuracy, some studies integrated a trained model into the particle filter. Raskin et al. [9] developed a system for 3D human motion tracking called the Gaussian Process Annealed Particle Filter (GPAPF). It combines APF tracking with the Gaussian Process Dynamic Model (GPDM) [10], a learning model trained on low-dimensional data that is adopted to reduce the number of particles that need to be sampled in each state. Saboune et al. [11] combined the Dynamic Bayesian Network (DBN) with an Interval Particle Filter (IPF) to track human walking. The DBN is a temporal probabilistic model used to solve the occlusion problem of human motion: human body parts are transformed from a kinematic chain to DBN positions, and IPF then chooses the particles with higher weighting values and re-samples them in the space.

Some studies employed mean shift [12,13] to track the motion of the human body. Mean shift is a non-parametric algorithm that climbs a probability density gradient to identify the highest-probability area for tracking a moving object. This approach is computationally effective and simple. However, mean shift is based on a single hypothesis, so it tends to converge to a local maximum rather than the global maximum. Shan et al. [14] proposed a tracking method that combines the particle filter and the mean shift approach for 2D hand tracking. This method provided an effective



sampling process by shifting predicted samples to each nearby local maximum, thus solving the degeneracy problem of the particle filter and reducing the number of samples required while still maintaining accuracy. Maggio and Cavallaro [15] presented a hybrid tracking method involving the particle filter and mean shift, resulting in an adaptive transition model based on updated variances. The model efficiently employed the particle filter to deal with multiple-mode prediction and utilized the mean shift to maintain accuracy. Schmidt et al. [16] proposed a kernel particle filter for 3D articulated body tracking and focused on estimating upper body motion in real time. The kernel particle filter [17-19] reduces the number of necessary particles by incorporating the mean shift.

The proposed progressive particle filter comprises three principal techniques: hierarchical searching, multiple predictions, and iterative mode-seeking. Hierarchical searching decomposes a high-dimensional space into three subspaces of lower dimension to increase searching efficiency. Multiple sampling is used to predict unconstrained human motion, which generally follows a non-linear and non-Gaussian distribution. Furthermore, a mean shift tracker is embedded into each particle to improve the accuracy of the result by iteratively seeking the mode of the optimal posture. Experimental results indicate that our approach can successfully reduce computation time by lowering the number of sampled particles, as well as improve accuracy by moving each particle to the optimal location.

The rest of this paper is organized as follows: Section 2 briefly introduces the concepts of the particle filter and the mean shift algorithm. Section 3 specifies the parameters of the 3D human model and describes the progressive particle filter, together with the likelihood measure functions. Section 4 presents the experimental results. Conclusions are drawn in Section 5.

2. Background

The particle filter is a Bayesian sequential factored sampling approach, generally applied to solve non-linear and non-Gaussian estimation problems. The method [1] was originally applied to track objects in 2D video sequences and has been proven to work well in tracking complex motion against cluttered backgrounds. The mean shift algorithm, an iterative kernel-based estimation method, has also been widely adopted to track 2D objects by climbing a density gradient toward the location of highest probability.

Table 1. The particle filter algorithm.

Initialization (t = 0): generate particles from the prior p(x_0) to obtain a weighted particle set {s_0^(i), w_0^(i)}.
For t = 1 to T:
  Selection: select M particles {s_{t-1}^(i), i = 1, ..., M} according to their posterior density at time t-1 and generate a new particle set of the same size as the previous one.
  Prediction: for i = 1 to N, update each particle by the dynamic model of the system, s_t^(i) = A s_{t-1}^(i) + B_t, where B_t is a standard Gaussian random variate.
  Measurement: for i = 1 to N, estimate the new weight of each particle, w_t^(i) = w_{t-1}^(i) p(z_t | x_t^(i)); then normalize, w_t^(j) = w_t^(j) / sum_{j=1}^{N} w_t^(j).
  Output: estimate the mean state at time t as x_hat_t = E[x_t | Z_t] = sum_{i=1}^{N} w_t^(i) s_t^(i).

Fig. 1. Particle filter algorithm.

Fig. 2. Symbolic illustration of the mean shift algorithm.

2.1. Particle filtering

The particle filter is an iterative process that estimates the posterior probability from a finite set of weighted particles. Let x_t denote the state vector at time t, and let z_t indicate the


observation at time t. The history of observations from time 1 to t is expressed as Z_t = {z_1, ..., z_t}. The algorithm has three major steps: selection, prediction, and measurement. Suppose that each state has N weighted particles at time t, expressed as {s_t^(i), w_t^(i), i = 1, ..., N}.

(1) Selection: generate a new particle set by choosing the particles with the highest posterior probability p(x_{t-1}^(i) | z_{t-1}) among the previous particle set at time t-1. The size of the particle set is constant.

(2) Prediction: assume that the pdf p(x_{t-1} | Z_{t-1}) is available at time t-1. The prediction step is

p(x_t | Z_{t-1}) = Integral p(x_t | x_{t-1}) p(x_{t-1} | Z_{t-1}) dx_{t-1}    (1)

where p(x_t | x_{t-1}) defines the state transition probability for t > 0.

(3) Measurement: calculate the posterior density p(x_t | z_t), weight the current particle by the predicted prior probability using Eq. (1), and apply the observation z_t at time t to estimate the likelihood probability p(z_t | x_t). The posterior p(x_t | Z_t) can be expressed in Bayesian form as

p(x_t | Z_t) = p(z_t | x_t) p(x_t | Z_{t-1}) / p(z_t | Z_{t-1})    (2)

where p(z_t | Z_{t-1}) denotes the normalizing constant in the denominator,

p(z_t | Z_{t-1}) = Integral p(z_t | x_t) p(x_t | Z_{t-1}) dx_t    (3)

The normalized weight w_t^(i) at time t is treated as the posterior probability:

w_t^(i) proportional to p(z_t | x_t = s_t^(i)),  with sum_{i=1}^{N} w_t^(i) = 1    (4)

Finally, the mean state x_hat_t at time t is obtained as the average of the weighted particles:

x_hat_t = E[x_t | Z_t] = sum_{i=1}^{N} w_t^(i) s_t^(i)    (5)

The particle filter performs well in 2D object tracking because it provides multiple hypotheses to recover from lost tracks. However, its accuracy usually depends on the number of particles: increasing the number of particles to improve accuracy also increases the computational complexity. Fig. 1 illustrates the particle filter during one time step, and Table 1 gives the detailed algorithm.

Fig. 3. Mean shift procedure: (a) initialization, (b), (c) mean shift vector estimation, (d) shift x_{t-1} to the mean position through the mean shift vector and (e), (f) move to the location of the mode.

Fig. 4. Two failed examples for the particle filter: (a) insufficient number of particles and (b) a large number of particles with high computational cost, yet the particle filter still may not reach the optimal position.

2.2. The mean shift algorithm

The mean shift algorithm [12,13] climbs a density gradient to identify a nearby mode with a non-parametric isotropic kernel. It iteratively estimates the mean shift vector M from the current mean position x, which predicts the mean location x' of the sample points that lie within the search window W with


Fig. 5. The mean shift algorithm when limited to the local maximum: (a) initial position, (b) mean shift procedure and (c) mean shift result.

radius r. Fig. 2 symbolically depicts the symbols used in the mean shift algorithm. The procedure for implementing the mean shift algorithm is as follows:

(1) Let there be a finite number of data points in the d-dimensional space R^d, where n_t denotes the number of data points s_t^(i) within a region W, and each data point is weighted according to a weighting function w(.) at time t (Fig. 3(a)).

(2) Calculate the mean shift vector M(x_{t-1}) (Fig. 3(b), (c)):

M(x_{t-1}) = [sum_{i=1}^{n_t} K(s_t^(i) - x_{t-1}) w(s_t^(i)) s_t^(i)] / [sum_{i=1}^{n_t} K(s_t^(i) - x_{t-1}) w(s_t^(i))]    (6)

where the kernel density estimate can be written as

f_hat(x) = (1 / (n_x r^d)) sum_{i=1}^{n_x} K((s^(i) - x) / r) w_t^(i)    (7)

The parameters w_t^(i) are computed as

w_t^(i) proportional to w_{t-1}^(i) p(z_t | x_t^(i)) p(x_t^(i) | x_{t-1}^(i)) / q(x_t^(i) | x_{t-1}^(i), z_t)    (8)

The general kernel functions K(.), such as the uniform kernel K_U, the Gaussian kernel K_G, and the Epanechnikov kernel K_E, are expressed as

K_U(x) = 1 if ||x|| <= 1, and 0 otherwise    (9)

K_G(x) = (2*pi)^(-d/2) exp(-||x||^2 / 2)    (10)

K_E(x) = (1 - ||x||^2) if ||x|| <= 1, and 0 otherwise    (11)

(3) Shift x_{t-1} to the new mean shift position x'_{t-1} using the mean shift vector M(x_{t-1}), and take x'_{t-1} as the new current position (Fig. 3(d)).

(4) Repeat steps (2) and (3) until ||M(x_{t-1})|| < epsilon, where epsilon denotes the threshold indicating a suitably small moving range.

Table 2. A comparison between the particle filter and the mean shift algorithm.

Property                 | Particle filter | Mean shift
Multiple-mode prediction | Yes             | No
Global maximum           | Yes             | No
Degeneracy problem       | Yes             | No
Lost track recovery      | Yes             | No
Higher complexity        | Yes             | No
Iterative searching      | No              | Yes

Fig. 6. A 3D human body model.

2.3. A comparison between the particle filter and mean shift

Although the particle filter uses stochastic sampling, which performs multiple predictions to recover from false tracks, its performance normally depends on the number of particles. The particle filter may lose the target or generate an inaccurate position when the number of particles is insufficient, yet employing a large number of particles leads to a high computational cost. Moreover, it is not guaranteed that the optimal position is sampled, even when a large number of particles are employed. Fig. 4 illustrates two cases in which the particle filter does not reach the optimal position precisely.

The mean shift has low computational cost. However, because the algorithm utilizes only a single hypothesis, it may converge to a local maximum rather than the global maximum, and it may therefore fail to recover lost tracks. Fig. 5 illustrates a failed case of mean shift tracking.

Table 2 compares the particle filter and the mean shift algorithm and shows that the two methods have complementary benefits and drawbacks. We propose an enhanced algorithm that effectively integrates the benefits of both methods to search for the optimal posture parameters.
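The complementary behavior summarized in Table 2 can be sketched in a toy 1D tracker: a bootstrap particle filter supplies multiple hypotheses, and a mean-shift refinement climbs each particle to a nearby mode. Everything here (the Gaussian observation model, noise levels, window radius, and the scalar state) is an illustrative assumption for exposition, not the paper's body-tracking configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def likelihood(x, z, r=0.5):
    """Gaussian observation likelihood p(z | x) (illustrative model)."""
    return np.exp(-0.5 * ((z - x) / r) ** 2)

def pf_step(particles, z, q=0.4):
    """One selection / prediction / measurement cycle (Table 1 structure)."""
    w = likelihood(particles, z)
    w /= w.sum()
    n = len(particles)
    idx = rng.choice(n, size=n, p=w)                    # selection (resampling)
    particles = particles[idx] + q * rng.standard_normal(n)  # prediction
    w = likelihood(particles, z)                        # measurement
    return particles, w / w.sum()

def mean_shift_refine(particles, z, radius=1.0, eps=1e-3, max_iter=50):
    """Move each particle to the likelihood-weighted mean of its window W
    until the mean shift vector falls below eps (steps (2)-(4) above)."""
    refined = particles.copy()
    for i in range(len(refined)):
        x = refined[i]
        for _ in range(max_iter):
            window = particles[np.abs(particles - x) <= radius]
            shift = np.average(window, weights=likelihood(window, z)) - x
            x += shift
            if abs(shift) < eps:
                break
        refined[i] = x
    return refined

# Track a target sitting at x = 5 from noisy observations.
particles = rng.uniform(0.0, 10.0, 100)
for _ in range(20):
    z = 5.0 + 0.2 * rng.standard_normal()
    particles, w = pf_step(particles, z)
    particles = mean_shift_refine(particles, z)
# After refinement the particles cluster at the mode, so a plain mean suffices.
x_hat = float(particles.mean())
```

Note how the refinement lets a modest particle set stay locked on the mode; without it, the same budget of particles scatters under the prediction noise, which is the trade-off Table 2 summarizes.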

3. 3D human motion tracking

This section first presents the 3D human model and the corresponding motion parameters, together with the likelihood measure function used to evaluate the similarity between the human model and the human profile. The proposed approach, called the progressive particle filter, is then described in detail.


Fig. 7. Deformable flesh: the proposed 3D human model can be adjusted to fit the body.

Fig. 10. Global rotation estimation: a small sphere is fitted to the skin color map to generate the global rotation parameter.

3.1. 3D virtual human model

A 3D virtual human model (Fig. 6) is developed to simulate human movement. The entire body model is segmented into 15 rigid parts: a head, a torso, a neck, two upper arms, two lower arms, two hands, two thighs, two legs, and two feet. Every two adjacent parts are connected via a joint. In contrast to a simple geometric human model, the proposed 3D human model is constructed from deformable flesh (Fig. 7), which can be deformed to fit the target body precisely and thus achieve accurate tracking results. The proposed human model uses 32 degrees of freedom (DOFs) to simulate all movements of the human body. The set of motion parameters, denoted X, is expressed in two parts, X = {X_G, X_L}, where X_G and X_L indicate the global and local parameters, respectively. The global parameter X_G includes the position (p_x, p_y, p_z) and the rotation angles (psi_x, psi_y, psi_z) in the 3D virtual space (Fig. 8(a)). The local parameter X_L defines the Euler angles of each joint, X_L = {theta_x^i, theta_y^i, theta_z^i, i = 1, ..., 8} (Fig. 8(b)).

Fig. 8. Human motion parameters: (a) global motion parameters for the translation and rotation of the entire body and (b) local motion parameters for the joint angles of each limb.

3.2. Likelihood measure function

Selecting appropriate features is a vital task in the tracking process. Four effective features are selected in our system: the silhouette mask O_Silhouette(x, y), the edge distance map O_Edge(x, y), the contour distance map O_Contour(x, y), and the skin color map O_Skin (Fig. 9). Based on these four extracted features, five measure functions (M_Silhouette(.), M_Edge(.), M_Contour(.), M_Skin(.), and M_Rotation_h(.)) are defined to evaluate the similarity between the input image O and the predicted human model pose V, where the predicted model posture is derived from the sampled position using particle filtering. The measure functions are defined as follows:

(1) The silhouette measure function:

M_Silhouette(O, V) = [sum_x sum_y ||O_Silhouette(x, y) - V_Silhouette(x, y)||] / [sum_x sum_y O_Silhouette(x, y)]    (12)

(2) The edge distance measure function:

M_Edge(O, V) = [sum_x sum_y dist(O_Edge(x, y), V_Edge(x, y))] / [sum_x sum_y O_Edge(x, y)]    (13)

(3) The contour distance measure function:

M_Contour(O, V) = [sum_x sum_y dist(O_Contour(x, y), V_Contour(x, y))] / [sum_x sum_y O_Contour(x, y)]    (14)

Fig. 9. Feature extraction: (a) original image, (b) silhouette map, (c) skin color map, (d) contour distance map and (e) edge distance map.

where O_Silhouette and V_Silhouette denote the silhouette maps of the input image and the human model, respectively; O_Edge and V_Edge denote the corresponding edge maps; O_Contour and V_Contour denote the corresponding contour maps; and dist(.) denotes the edge distance function used to find the distance to the nearest pixel.

The skin color measure function M_Skin(.) involves two parts, M_Skin_Face(.) and M_Skin_Arm(.), and is defined as follows.

(4) The skin color measure function:

M_Skin(O, V) = (M_Skin_Face(O, V), M_Skin_Arm(O, V))    (15)

M_Skin_Face(O, V) = [sum_x sum_y ||O_Skin_Face(x, y) - V_Face(x, y)||] / [sum_x sum_y O_Skin_Face(x, y)]

M_Skin_Arm(O, V) = [sum_x sum_y ||O_Skin_Arm(x, y) - V_Arm(x, y)||] / [sum_x sum_y O_Skin_Arm(x, y)]    (16)

where M_Skin_Face(.) detects the overlap region between the face of the observed human O_Skin_Face(x, y) and the face of the virtual human model V_Face(x, y) and then determines the global position of the body. M_Skin_Arm(x, y) is derived from the pose of the arm. Since different kinds of clothes may affect the


Fig. 11. Hierarchical layer searching: (a) identification of the global position, (b) estimation of the global rotation, (c) tracking of the upper extremity pose, (d) tracking of the lower extremity pose and (e) tracking result.

Fig. 12. Progressive particle filter diagram: the full-body motion tracking process is divided into three sub-processes: the global tracking process, the local upper extremity tracking process, and the local lower extremity tracking process.


edge extraction result (O_Edge(x, y)), O_Skin_Arm(x, y) is an important factor for avoiding the ambiguity caused by pixels of similar intensity.

According to observations, people often slightly swing their arms and shoulders while walking. Most previous works do not consider this swinging motion, which results in unnatural postures. In this paper, we attach a sphere to each shoulder of the human model to fit the skin color map and then compute the global rotation parameter (Fig. 10). A global rotation measure function M_Rotation_h(.) is proposed to track the shoulder swing motion:

M_Rotation_h(O, V) = [sum_x sum_y ||O_Skin_Arm(x, y) - V_Shoulder(x, y)||] / [sum_x sum_y O_Skin_Arm(x, y)]    (17)

Finally, the likelihood measure function Lm(O, V) is expressed as follows:

Lm(O, V) = exp[-(M_Silhouette(O, V) + M_Edge(O, V) + M_Contour(O, V) + M_Skin(O, V))]    (18)

Table 3. Pseudo-code for the progressive particle filter.

Initialization: construct N weighted particles {x_t^G(i), x_t^LU(i), x_t^LL(i), omega_t^G(i), omega_t^LU(i), omega_t^LL(i)} from the prior p(x_0).
For t = 1, 2, ...
  For l = 1 to 3   // the three subspaces
    (1) Prediction: for i = 1 to N,
        if l = 1, sample via p(x_t^G(i) | x_{t-1}^G(i), x_{t-1}^LU(i), x_{t-1}^LL(i));
        else if l = 2, predict using p(x_t^LU(i) | x_hat_t^G, x_{t-1}^LU(i), x_{t-1}^LL(i));
        else, draw with p(x_t^LL(i) | x_hat_t^G, x_hat_t^LU, x_{t-1}^LL(i), z_t).
    (2) Weighting: for i = 1 to N,
        if l = 1, omega_t^G(i) = omega_{t-1}^G(i) p(z_t | x_t^G(i));
        else if l = 2, omega_t^LU(i) = omega_{t-1}^LU(i) p(z_t | x_t^LU(i));
        else, omega_t^LL(i) = omega_{t-1}^LL(i) p(z_t | x_t^LL(i)).
    (3) Mean shift tracking: generate new particles {x_t^G(i)', x_t^LU(i)', x_t^LL(i)'}.
        For k = 1, 2, ...
          (a) Re-sampling: for i = 1 to N,
              if l = 1, predict via p(x_{t,k}^G(i)' | x_{t,k-1}^G');
              else if l = 2, predict via p(x_{t,k}^LU(i)' | x_{t,k-1}^LU(i)', x_{t,k-1}^G');
              else, predict via p(x_{t,k}^LL(i)' | x_{t,k-1}^LL(i)', x_{t,k-1}^LU').
          (b) Re-weighting: for i = 1 to N,
              if l = 1, omega_t^G(i) = omega_{t-1}^G(i) p(z_t | x_{t,k}^G(i)');
              else if l = 2, omega_t^LU(i) = omega_{t-1}^LU(i) p(z_t | x_{t,k}^LU(i)');
              else, omega_t^LL(i) = omega_{t-1}^LL(i) p(z_t | x_{t,k}^LL(i)').
          (c) Calculate the mean shift vector.
          (d) Shift each particle to the new mass center.
          (e) If each particle has converged to a nearby mode, break.
    (4) Measurement:
        if l = 1, x_hat_t^G = sum_{i=1}^{N} omega_t^G(i) x_t^G(i);
        else if l = 2, x_hat_t^LU = sum_{i=1}^{N} omega_t^LU(i) x_t^LU(i);
        else, x_hat_t^LL = sum_{i=1}^{N} omega_t^LL(i) x_t^LL(i).
  End
End
Output: obtain the state vector {x_hat_t^G, x_hat_t^LU, x_hat_t^LL} at time t.

Fig. 13. An iterative searching process of human movement: (a) original image, (b) initial prediction, (c) the result from mean shift tracking and (d) the average of weighted particle results.
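The mismatch measures of Eqs. (12)-(14) and their combination into the likelihood Lm of Eq. (18) can be sketched on small binary maps. The toy maps are illustrative assumptions, and the brute-force nearest-pixel distance below is one possible reading of the paper's dist(.), which is described only as the distance to the nearest pixel.

```python
import numpy as np

def silhouette_measure(o_sil, v_sil):
    """Eq. (12): normalized pixel-wise mismatch between binary silhouettes."""
    return np.abs(o_sil - v_sil).sum() / o_sil.sum()

def nearest_distance_map(mask):
    """Distance from every pixel to the nearest foreground pixel of `mask`
    (brute force; an assumed interpretation of the paper's dist(.))."""
    h, w = mask.shape
    pts = np.argwhere(mask > 0)                       # foreground coordinates
    grid = np.indices((h, w)).reshape(2, -1).T        # all pixel coordinates
    d = np.sqrt(((grid[:, None, :] - pts[None, :, :]) ** 2).sum(-1)).min(1)
    return d.reshape(h, w)

def edge_distance_measure(o_edge, v_edge):
    """Eq. (13)-style: model edge pixels scored by their distance to the
    nearest observed edge pixel, normalized by the observed edge mass."""
    return (nearest_distance_map(o_edge) * v_edge).sum() / o_edge.sum()

def combined_likelihood(m_sil, m_edge, m_contour, m_skin):
    """Eq. (18): Lm = exp(-(sum of measures)); smaller mismatch -> closer to 1."""
    return float(np.exp(-(m_sil + m_edge + m_contour + m_skin)))

o = np.zeros((8, 8)); o[2:6, 2:6] = 1.0   # observed silhouette (toy)
v = np.zeros((8, 8)); v[3:7, 3:7] = 1.0   # predicted model silhouette, shifted
m_sil = silhouette_measure(o, v)           # 14/16 mismatched pixels
lm_good = combined_likelihood(0.0, 0.0, 0.0, 0.0)   # perfect fit -> 1.0
lm_poor = combined_likelihood(m_sil, 0.5, 0.5, 0.5)
```

Because each measure is a normalized mismatch, the exponential in Eq. (18) maps a perfect fit to a likelihood of 1 and degrades smoothly as any of the four measures grows, which is what lets the particle weights rank candidate postures.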

3.3. The progressive particle filter

The progressive particle filter addresses two major issues: (1) reducing the computational effort and (2) increasing the accuracy of particle filter tracking. The proposed method employs a hierarchical layer-searching scheme to perform an effective search in the high-dimensional space (Fig. 11). It generates reconstructed postures by computing the parameters of the global motion state, the local upper extremity motion state, and the local lower extremity motion state. The global motion state x^G describes the body's position and rotation (Fig. 11(a) and (b)), the local upper extremity motion state x^LU describes the joint angles of the left and right upper arms and the left and right thighs (Fig. 11(c)), and the local lower extremity motion state represents the joint angles of the left and right lower arms and the left and right legs (Fig. 11(d)).

Fig. 12 illustrates the framework behind the progressive particle filter, where the tracking process is organized as a hierarchical structure and divided into three sub-processes:

(1) Global tracking process: this process first predicts the initial global location and rotation of the 3D human model and then moves each particle to a nearby maximum by iterative mean shift searching. The trackers adjust the location and rotation of the 3D human model to the optimal values. After the global state is determined, the local extremity processes continue by tracking the angles of all extremities.

(2) Upper extremity tracking process: this process applies an upper extremity particle filter to derive a few possible postures for the upper extremity and


Fig. 14. The angle distributions and error angle distributions of human upper extremities for PPF, PF, and PF_MS: (a) distributions for the left upper arm, (b) distributions for the left lower arm, (c) distributions for the right upper arm and (d) distributions for the right lower arm.

Table 4. Error comparisons for PF, PF_MS, and PPF (mean error in degrees).

Method | Number of initial particles | Left upper arm | Left lower arm | Right upper arm | Right lower arm | Total
PF     | 200                         | 8.5            | 24.2           | 8.5             | 16.1            | 14.3
PF_MS  | 200                         | 6.1            | 8.8            | 5.1             | 8.0             | 7.0
PPF    | 20 x 4                      | 5.9            | 8.4            | 7.6             | 9.4             | 7.8

weights each particle by estimating the similarity between the predicted posture and the observed features. The process then applies the upper extremity mean shift trackers to move each particle to the posture most similar to the observed one.

(3) Lower extremity tracking process: this process re-samples the particles to predict initial lower extremity poses and weights each particle according to the similarity between the sampled posture and the observed lower extremity. The lower extremity mean shift trackers then adjust the lower extremity of the virtual model to the posture most similar to the observed human posture.

As shown in Fig. 12, the mean shift trackers increase the accuracy of the tracking task by shifting each particle to its nearest mode. In general, most of the particles are located around the position of the target posture because of importance re-sampling. With a mean shift tracker, the tracking process therefore does not require a large number of particles to maintain accuracy.

The search process of the progressive particle filter is based on a Bayesian formulation to estimate the probability of each state. The motion state vector X_t of the human body is denoted X_t = {x_t^G, x_t^LU, x_t^LL}, and z_t denotes the observation at time t, where the observation includes information on the silhouette, skin color, edge distance map, and contour distance map. The approach focuses on the probabilistic transition p(x_t | x_{t-1}) and divides the observation z_t into two states, z_t^G and z_t^LU, where z_t^G indicates the observed global translation and rotation features of the whole body, and z_t^LU denotes the local observed features of the upper extremity. Unlike the standard particle filter


Fig. 15. Computational time for PPF (pink curve), PF (blue), and PF_MS (yellow). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 16. The distribution of the number of iterations for the left lower arm under PPF.

transition probability, the hierarchical searching scheme takes the information of the current observations into account in each process. The transition density is redefined as

p(x_t | x_{t-1}, z_t) proportional to p(x_t^G, x_t^LU, x_t^LL | x_{t-1}^G, x_{t-1}^LU, x_{t-1}^LL, z_t^G, z_t^LU)    (19)

The global motion tracking process has the highest priority, since it provides the groundwork for the other limbs. The global prediction p(x_t^G, x_t^LU, x_t^LL | x_{t-1}^G, x_{t-1}^LU, x_{t-1}^LL, z_t^G, z_t^LU) is described as

p(x_t^G, x_t^LU, x_t^LL | x_{t-1}^G, x_{t-1}^LU, x_{t-1}^LL, z_t^G, z_t^LU)
  proportional to p(z_t^G | x_t^G) p(x_t^G, x_t^LU, x_t^LL | x_{t-1}^G, x_{t-1}^LU, x_{t-1}^LL, z_t^LU)
  proportional to p(z_t^G | x_t^G) p(x_t^G | x_{t-1}^G, x_{t-1}^LU, x_{t-1}^LL) p(x_t^LU, x_t^LL | x_t^G, x_{t-1}^G, x_{t-1}^LU, x_{t-1}^LL, z_t^LU)    (20)

Because the lower extremity state is based on the upper extremity state, the upper extremity tracking process should be performed first. Thus, p(x_t^LU, x_t^LL | x_t^G, x_{t-1}^G, x_{t-1}^LU, x_{t-1}^LL, z_t^LU) is expressed as

p(x_t^LU, x_t^LL | x_t^G, x_{t-1}^G, x_{t-1}^LU, x_{t-1}^LL, z_t^LU)
  proportional to p(z_t^LU | x_t^LU, x_t^LL) p(x_t^LU | x_t^G, x_{t-1}^LU, x_{t-1}^LL) p(x_t^LL | x_t^G, x_t^LU, x_{t-1}^LL)    (21)

After an initialization procedure constructs the hierarchical particles {x_t^G(i), x_t^LU(i), x_t^LL(i)} with weights {omega_t^G(i), omega_t^LU(i), omega_t^LL(i)}, four procedures are performed at each iteration:

- The prediction procedure predicts the probable posture by re-sampling each particle according to the transition probability.
- The weighting procedure computes the weight values omega_t^G(i), omega_t^LU(i), and omega_t^LL(i) for each particle according to the likelihood measure function.
- The mean shift-tracking procedure utilizes the trackers to move each particle independently until all the particles have converged to a corresponding maximum.
- The measurement procedure estimates the mean state by averaging the posterior density of each layer.

Finally, the entire state x_hat_t = {x_hat_t^G, x_hat_t^LU, x_hat_t^LL} is exported. Table 3 shows the pseudo-code for the progressive particle filter. The searching order of the upper extremity tracking process is left upper arm, right upper arm, left thigh, and then right thigh. The searching order of the lower extremity tracking process is left lower arm, right lower arm, left leg, and then right leg.

Fig. 13 shows an example of the tracking process. Fig. 13(a) is an original image, and Fig. 13(b) illustrates the results at the prediction step, where some particles are located at false positions due to random propagation. Fig. 13(c) illustrates that the trackers move all particles to nearby higher-probability locations by applying the mean shift vector; each particle converges to a local maximum. Fig. 13(d) presents the estimated posture at the end of the tracking process.
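The layered search of Eqs. (19)-(21) can be caricatured with one scalar parameter per layer: each layer runs a small particle set whose prior is conditioned on the layers already estimated, and a mean-shift-style refinement collapses the particles onto the local mode before the layer's mean state is exported. The scalar states, Gaussian likelihoods, and particle counts below are illustrative assumptions, not the paper's 32-DOF configuration.

```python
import numpy as np

rng = np.random.default_rng(2)

def track_layer(prior_mean, likelihood, n=20, spread=0.5, radius=0.5, iters=6):
    """One hierarchical layer: sample a small particle set around the prior,
    repeatedly shift each particle to the likelihood-weighted mean of its
    neighborhood (mean-shift-style refinement), then output the weighted
    mean state of the layer."""
    particles = prior_mean + spread * rng.standard_normal(n)
    for _ in range(iters):
        w = likelihood(particles)
        shifted = particles.copy()
        for i, p in enumerate(particles):
            mask = np.abs(particles - p) <= radius   # particle's window W
            shifted[i] = np.average(particles[mask], weights=w[mask])
        particles = shifted
    w = likelihood(particles)
    return float(np.average(particles, weights=w))    # layer's mean state

# Toy "posture": true global / upper / lower parameters.
truth = {"G": 1.0, "LU": 2.0, "LL": 3.0}
gauss = lambda x, mu: np.exp(-0.5 * ((x - mu) / 0.2) ** 2)

# Layer order mirrors Eqs. (19)-(21): global first, then upper, then lower,
# each prior conditioned on the estimate of the layer above it.
x_G = track_layer(0.8, lambda x: gauss(x, truth["G"]))
x_LU = track_layer(x_G + 1.0, lambda x: gauss(x, truth["LU"]))
x_LL = track_layer(x_LU + 1.0, lambda x: gauss(x, truth["LL"]))
```

The point of the hierarchy is visible in the particle budget: three layers of 20 particles each explore three 1D problems instead of one set exploring the joint 3D space, which is the dimensionality argument behind the progressive particle filter's cost reduction.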

4. Experimental results

The proposed system was implemented on a laptop computer (Pentium 4, 2.8 GHz, 1 GB RAM) with a Panasonic NV-GS500-S


Fig. 17. The tracking results of the hand-waving: (a) original image, (b) tracking results of model from PF, (c) tracking results of model from PPF and (d) tracking results of boundaries from PPF.

camera that generated 320 × 240 images. Four experiments were performed to evaluate the performance of the proposed algorithm. The first experiment tracks hand-waving motion and compares the performance of three methods, namely the Progressive Particle Filter (PPF), the standard Particle Filter (PF), and the Particle Filter with Mean Shift (PF_MS). Both PF and PF_MS use 200 particles in

the experiment, and the PPF uses 80 particles (i.e., 20 particles for each limb) in the initial process. Fig. 14 shows the tracking angle distributions and the error angle distributions for human upper extremities. The tracking results of PPF (pink curve) and PF_MS (cyan curve) are close to the ground truth (yellow curve), as indicated in Fig. 14. The two methods involving mean shift show higher accuracy than PF (blue curve).
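The accuracy advantage of the two mean-shift-based methods comes from iteratively moving each particle along the mean shift vector until it converges to a nearby mode of the density. A minimal 1-D sketch follows; the Gaussian kernel, the bandwidth, and the stopping threshold are illustrative assumptions rather than the paper's settings.

```python
import math

def mean_shift(x, samples, weights, bandwidth=2.0, tol=1e-3, max_iter=50):
    """Move x toward the mode of a weighted kernel density near it."""
    for _ in range(max_iter):
        # kernel-weighted contribution of each sample at the current position
        k = [w * math.exp(-((s - x) / bandwidth) ** 2 / 2.0)
             for s, w in zip(samples, weights)]
        total = sum(k)
        if total == 0.0:
            break
        # the mean shift vector points from x to this weighted local mean
        shifted = sum(ki * s for ki, s in zip(k, samples)) / total
        if abs(shifted - x) < tol:  # converged to a local maximum
            return shifted
        x = shifted
    return x
```

For example, with samples clustered near 30 degrees plus one outlier at 10, a particle started at 27 is pulled into the cluster: the outlier receives negligible kernel weight, so the iteration converges to the cluster mode near 30.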


Fig. 18. Tracking results of walking.

Table 4 shows the mean errors for the four arm extremities: PPF has a mean error of 7.8 degrees, PF_MS 7.0 degrees, and PF 14.3 degrees. PPF and PF_MS achieve similar accuracy because both incorporate the mean shift process.
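Mean errors of this kind are obtained by averaging the absolute angle difference between the tracked and ground-truth joint angles over all frames. A small sketch, with made-up angle sequences for illustration:

```python
def mean_angular_error(tracked, ground_truth):
    """Mean absolute angle difference in degrees, wrapped to [0, 180]."""
    total = 0.0
    for t, g in zip(tracked, ground_truth):
        d = abs(t - g) % 360.0
        total += min(d, 360.0 - d)  # shortest way around the circle
    return total / len(tracked)

# Wrapping matters near the 0/360 boundary: 359 deg vs. 1 deg is a 2 deg error.
print(mean_angular_error([10.0, 20.0, 359.0], [12.0, 18.0, 1.0]))  # → 2.0
```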

Fig. 15 compares PF, PF_MS, and PPF in terms of computational time. The computational time during the first seven frames is longer than in the remaining frames, since all of these approaches must perform preprocessing and initial posture identification. Fig. 15 shows that PPF requires less computational time than PF and PF_MS.


Additionally, the frame rates for PPF, PF_MS and PF are 0.19, 0.09 and 0.13 frames/s, respectively. Fig. 16 illustrates the number of mean shift-tracking iterations in each frame for the lower left arm in PPF. The iteration process stops only when the particle has moved close enough to the optimal location. As indicated in Fig. 16, all frames except frames 24 and 35 require fewer than four iterations; in general, fast motion requires more iterations. Experimental results demonstrate that about 132 particles per state are sufficient for PPF (about 2 iterations of 20 particles for each lower extremity and about 1.3 iterations of 20 particles for each upper extremity), whereas PF requires 200 particles per state. Moreover, although PF_MS also uses 200 particles, it needs an extra 3.2 iterations on average to reach the final position.

Fig. 17 demonstrates the tracking results of the hand-waving sequence. Fig. 17(a) shows the original video, and Fig. 17(b) depicts the tracking results of PF, where the red circles indicate the obvious errors. These errors occur when (1) the predicted particles are not located at the correct position, or (2) the number of particles is too small to cover the full range of plausible angles. Fig. 17(c) shows the tracking results of PPF, and Fig. 17(d) depicts the PPF results with the boundaries of the human model (white lines) overlaid on the actual body. It can be seen that PPF tracks the human body with high accuracy.

The second experiment shows the tracking results for a walking sequence (Fig. 18). In this walking sequence, the person initially walks in one direction and then turns toward another direction. Two issues regarding the experiment are discussed below:

Fig. 20. Tracking results of walking: (a) original image, (b) silhouette map, (c) edge map, (d) edge distance map, (e) tracking result without considering edge information and (f) tracking result considering edge information.

Fig. 19. Tracking results with and without the shoulder swing: (a)–(c) original image, (d)–(f) walking result without shoulder swing and (g)–(i) walking result with shoulder swing.

(1) Humans tend to swing their shoulders when they walk, but most studies do not consider this kind of movement. To detect the swing angle, we take the shoulder rotation into account to ensure that the result fits the target body precisely. Fig. 19 compares the results without shoulder swing (Fig. 19(d)–(f)) and with shoulder swing (Fig. 19(g)–(i)). The recovered motion is more natural when the shoulder swing is considered.

(2) Because the legs occlude each other, the right and left legs must be distinguished when a walking person is tracked. The four extracted features can be used to solve this problem. Fig. 20(a) shows the original frame, and Fig. 20(b) displays the corresponding silhouette. Fig. 20(c) shows the edge map, and Fig. 20(d) depicts the derived edge distance map. Using only the silhouette map may result in erroneous tracking (Fig. 20(e)). However, using the silhouette together with the edge distance map correctly recovers the human posture (Fig. 20(f)), since the edge distance map provides information about the leg edges.

The third experiment evaluates performance while tracking jumping motion. This kind of motion contains fast movement of body parts, resulting in a blur effect in the captured images. The contour distance map and edge distance map can be used to reduce the error caused by the blur effect. Fig. 21 shows the tracking results. Most frames are tracked well, although the legs are not precisely fitted in some frames due to irregular deformation of the pants.

Chinese Gong Fu is one of the most popular exercises in the world. In the fourth experiment, the tracking sequence is a


Fig. 21. Tracking results of jumping.

fighting posture that is often seen in Chinese Gong Fu movies. Fig. 22 shows the tracking results obtained with the proposed PPF.

5. Conclusion

This work proposes an effective algorithm for 3D human motion tracking by applying three principal processes: hierarchical searching, multiple predictions and iterative mode searching. The hierarchical searching approach reduces

computational time by decomposing the high-dimensional state space into several lower-dimensional spaces. Integrating multiple predictions and iterative mode searching improves the accuracy of the tracking results. Furthermore, a likelihood measurement function composed of four feature functions is proposed to fit the target body. Experimental results show that the progressive particle filter achieves better accuracy and lower computational time than the standard particle filter. In the future, we intend to develop a deformable 3D human model that fits the varying shape of the tracking target, to achieve higher precision in the tracking procedure. Moreover, to further speed up the search procedure, we hope to develop an efficient method that can dynamically adjust the searching range of the mean shift procedure.

Fig. 22. Tracking results of fighting posture.

Acknowledgments

This work was supported by the Ministry of Economic Affairs, Taiwan, ROC, under Grant No. 98-EC-17-A-02-S1-032, and the National Science Council, Taiwan, ROC, under Grant No. NSC 98-2221-E-259-026.

Appendix A. Supporting information

Supplementary data associated with this article can be found in the online version at doi:10.1016/j.patcog.2010.05.003.


I-Cheng Chang received his B.S. degree in Nuclear Engineering in 1987, and his M.S. and Ph.D. degrees in Electrical Engineering in 1991 and 1999, respectively, all from National Tsing Hua University, Hsinchu, Taiwan. In 1999, he joined the Opto-Electronics & Systems Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan, as an engineer and project leader. In the autumn of 2003, he joined the Department of Computer Science and Information Engineering, National Dong Hwa University, Hualien, Taiwan, where he is currently an Assistant Professor. His research interests include image/video processing, computer vision and graphics, and multimedia system design. Dr. Chang received the Annual Best Paper Award from the Journal of Information Science and Engineering in 2002, and Research Awards from the Industrial Technology Research Institute in 2002 and 2003. He is a Member of the IEEE and the IPPR of Taiwan, ROC.

Shih-Yao Lin received his B.S. degree in Computer Science and Information Engineering from I-Shou University, Taiwan, in 2006, and his M.S. degree in Computer Science and Information Engineering from National Dong-Hwa University, Taiwan, in 2008. He has been pursuing a Ph.D. in the Graduate Institute of Networking and Multimedia at National Taiwan University, Taiwan, since 2008. His research interests include 3D human motion tracking, digital image processing and pattern recognition. He is a member of the IEEE.