Human activity recognition with action primitives

Zsolt L. Husz, Andrew M. Wallace, Patrick R. Green

Joint Research Institute in Signal and Image Processing, School of Engineering and Physical Sciences / School of Life Sciences, Heriot-Watt University, Edinburgh, UK

Abstract
This paper considers the link between tracking algorithms and high-level human behavioural analysis, introducing the action primitives model, which recovers symbolic labels from tracked limb configurations. The model consists of action primitive clusters, groups of similar short-term actions, formed automatically and then labelled by supervised learning. The model handles both short actions and longer activities, either periodic or aperiodic, and new labels can be added incrementally. We determine the effects of the model parameters on the labelling of action primitives using ground truth derived from a motion capture system, and we present a representative example of a labelled video sequence.

1. Introduction
The analysis of human behaviour for surveillance, entertainment, intelligent domiciles and medical applications is of increasing interest. Video is a valuable and easily acquired input for these domains. Previous research ranges from 2D tracking of human figures reduced to blobs ([1, 2]) or articulated human models ([3, 4]) through to 3D tracking of articulated human models ([5, 6, 7, 8, 9]). Although the results are promising, these methods usually stop at tracking, omitting higher-level reasoning or assigning only a single label, and they are usually limited to periodic activities. Other previous work [10, 11] tackles the extraction of poses and motions relevant to a human observer, mainly for video segmentation and computer animation. Starting from the idea of important poses and actions, this paper presents a technique that first learns important actions (pose sequences) and then learns their semantic labels. In contrast to other work ([3, 12]), these sequences are not required to be periodic. The labels are not restricted to global activities, but include both whole-body (e.g., running, bending, standing) and local (e.g., right arm forward, left arm reaching) labels. For an overview of human action recognition and tracking, [13] and [14] can be consulted.


2. The Action Primitives Model
In this paper an action denotes a short sequence of body configurations (e.g., leg rising, arm still). It is usually, but not exclusively, defined by one or a few body parts. An activity denotes a sequence of body configurations over a longer time span. Activities can be assembled from one or more actions, and actions can specify details of an activity (e.g., walking with a raised left arm). The related articulated human model consists of 12 body parts, each modelled by a truncated cone (frustum) with an elliptical cross-section. The body parts have constant lengths and the body configuration is completely defined by 24 parameters, formed by the body root position, body orientation, spine tilt and azimuth, and 17 joint angles. A body feature vector (BFV) is a (sub)set of the parameters describing a body configuration, having as elements either direct features (joint angles, body position, orientation) or features derived from direct features by simple operators (velocity, global position vector). For clarity, in this paper we use a BFV with direct features, the 17 joint angles, but we have used a wider variety of features (e.g., position) where this is necessary to define an action. An action primitive (AP), a basic action, is part of a larger time-scale activity, and is a set of consecutive BFVs capturing both the static and the dynamic configuration of the body. The developed action primitives model (APM) consists of AP clusters (APCs), each a group of similar APs. Each APC has a set of labels characterising the APs in the cluster. The model is based on the idea that activities can be partly or completely constructed from smaller components (action primitives).
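To make these data structures concrete, the sketch below (Python, not from the paper) stacks D + 1 consecutive BFVs, here taken as 17-element joint-angle vectors, into a single action primitive; the variable names and the random example data are illustrative only.

```python
import numpy as np

def action_primitive(bfvs, t, D):
    """Stack the BFVs of frames t .. t+D into one action primitive (AP).

    bfvs : array of shape (n_frames, 17), one body feature vector per frame.
    Returns a flat vector of length 17 * (D + 1).
    """
    return np.concatenate([np.asarray(bfvs[i], dtype=float)
                           for i in range(t, t + D + 1)])

# Example: 100 frames of 17 joint angles, an AP of length D = 5 starting at frame 10.
bfvs = np.random.rand(100, 17)
ap = action_primitive(bfvs, t=10, D=5)   # shape (102,)
```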

2.1. Action primitives clusters
The AP clusters are learnt from a training dataset by unsupervised learning (Figure 1). Sequences of input BFVs, generated by a motion capture system, result in APs (AP′) of length D + 1 (in the Sequence generation phase). To give equal importance to all parameters, AP′ is normalised (Normalisation phase) by scaling the nominal range of each parameter, defined by physical constraints on the joint angles, to between 0 and 1. Both the parameters of a BFV at a given time and the parameters of consecutive BFVs (i.e., of an AP) are highly correlated, therefore principal component analysis (PCA) is applied to reduce the dimensionality of the APs. The resulting compressed PCAAP∗ are clustered into C clusters with Expectation Maximisation initialised by K-means¹. All subsequences AP of AP′ that omit the last BFV (NBFV) (Sequence split phase) are PCA compressed, resulting in PCAAP. The one-to-one correspondence between PCAAP∗, AP′ and AP allows us to compute for each APC the mean (M_apc) and covariance (Σ_apc) of all member PCAAP. These are used to compute, for an instance ap of an AP, a Mahalanobis-distance based similarity

$\mathrm{sim}_{apc}(ap) = e^{\delta(ap)^{T} \Sigma_{apc}\, \delta(ap)}$   (1)

to any instance apc of an APC, with δ_apc(ap) = M_apc − ap. The generated model consists of a description (sequence length D, number of clusters C, PCA compression parameters: eigenvector count, mean and base vectors) and, for each APC, its mean and covariance. Note that the APs used for clustering have a length of D + 1, assuring smoothness of the APC over D + 1 BFVs. This allows the prediction of the next BFV (NBFV) from the preceding AP of length D, which is useful for tracking with the model.

¹ Matlab Expectation Maximisation clustering algorithm from Frank Dellaert, available from http://www-static.cc.gatech.edu/ dellaert
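The clustering stage could be sketched as follows; this is an illustration under our own assumptions, not the authors' implementation. scikit-learn's GaussianMixture with k-means initialisation stands in for the cited Matlab EM code, and the similarity is taken as a decaying exponential of the squared Mahalanobis distance, our reading of Eq. (1).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def learn_ap_clusters(aps, n_clusters, pca_dims):
    """Learn AP clusters: PCA-compress the normalised APs, then fit an EM mixture
    initialised by k-means. Returns the PCA model, the cluster assignments and the
    per-cluster mean/covariance of the member APs in the compressed space."""
    pca = PCA(n_components=pca_dims).fit(aps)
    compressed = pca.transform(aps)
    gmm = GaussianMixture(n_components=n_clusters, covariance_type="full",
                          init_params="kmeans", random_state=0).fit(compressed)
    assignment = gmm.predict(compressed)
    means = [compressed[assignment == c].mean(axis=0) for c in range(n_clusters)]
    covs = [np.cov(compressed[assignment == c].T) for c in range(n_clusters)]
    return pca, assignment, means, covs

def similarity(ap_compressed, mean, cov):
    """Similarity of one AP to one cluster: a decaying exponential of the squared
    Mahalanobis distance (our reading of Eq. (1))."""
    delta = mean - ap_compressed
    return float(np.exp(-delta @ np.linalg.inv(cov) @ delta))

# Example with random data: 2000 APs of length 17*(D+1), D = 5, clustered into C = 20.
pca, assignment, means, covs = learn_ap_clusters(np.random.rand(2000, 102),
                                                 n_clusters=20, pca_dims=10)
```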

2.2. Cluster labelling
The training algorithm requires that each instance ap is labelled manually with the appropriate set of symbolic labels L_ap. The probability of label i for each of the APCs is

$\Lambda_{apc,i} = \frac{\sum_{ap} \mathrm{sim}_{apc}(ap)\, L_i(ap)}{\sum_{ap} \mathrm{sim}_{apc}(ap)}$   (2)

where the label presence L_i(ap) is 1 if the AP has label i and 0 if not. The cluster forming is separated from the label training, therefore new labels and more training sequences can be added efficiently, without repeating the clustering.

2.3. Classification
Classification of an action primitive ap is performed by means of the APCs. The label probability Λ_apc,i of each APC is precomputed, while the similarity of ap is computed by sim_apc(ap). Marginalising over the clusters, the probability of label i for an unknown ap is

$P_i(ap) = \sum_{apc} \Lambda_{apc,i}\, \mathrm{sim}_{apc}(ap).$   (3)
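A minimal vectorised sketch of Eqs. (2) and (3), assuming the per-cluster similarities and the binary label matrix are already available; the array layout is our own choice.

```python
import numpy as np

def label_probabilities(similarities, label_presence):
    """Eq. (2): Lambda[apc, i] = sum_ap sim[apc, ap] * L[ap, i] / sum_ap sim[apc, ap].

    similarities   : array (n_clusters, n_training_aps), sim_apc(ap)
    label_presence : array (n_training_aps, n_labels), L_i(ap) in {0, 1}
    """
    weighted = similarities @ label_presence               # (n_clusters, n_labels)
    return weighted / similarities.sum(axis=1, keepdims=True)

def classify(sim_to_clusters, Lambda):
    """Eq. (3): P_i(ap) = sum_apc Lambda[apc, i] * sim_apc(ap)."""
    return sim_to_clusters @ Lambda                        # (n_labels,)

# Toy example: 4 clusters, 6 training APs, 3 labels.
sims = np.random.rand(4, 6)
L = np.random.randint(0, 2, size=(6, 3))
Lambda = label_probabilities(sims, L)
print(classify(np.random.rand(4), Lambda))
```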

Figure 1: Block diagram of the clustering algorithm. The input BFVs pass through Sequence generation (concatenate D + 1 neighbouring BFVs), Normalisation (AP′ scaled to [0..1]), Sequence split (reduce the sequence length to D: AP′ = [AP NBFV]), PCA compression (of AP∗ and AP, reducing the parameter space) and Clustering (AP′, and hence AP, is assigned to a cluster). The resulting AP model stores the description (D, C, PCA precision, eigenvector count, mean and base projection) and, for each APC, the mean and covariance of its {PCAAP} and the probability of each label.

3. Evaluation
The APM is trained with the train partition of the HumanEva dataset [15]. This dataset has synchronised motion capture data and multiple camera views. We use the motion capture data for training and evaluation, while video-based recognition is briefly introduced in Section 3.3. Sequences, categorised into five basic activities (Box, Gesture, Jog, Throw/Catch and Walking), are performed by three different subjects and are divided into train, validation and test partitions. Between 29519 (D = 35) and 33960 (D = 3) AP sequences from the train partition are used for training the APM; the validation partition is used for evaluation.

3.1. Number of clusters and sequence length

The APC learning has two explicit parameters, the AP length D and the number of clusters C. If the length of the sequence increases, the classes become more separable. Figures 2 and 3 show the distribution of all APs from the training sequences. APs from different activities are expected to fall into different clusters, while APs from the same activity should fall into the same cluster. In Figure 2(a), cluster number 10 has in total 2443 APs from the Box, Throw/Catch, Jog and Gesture activities, the majority of samples coming from the first two. This is the worst performing cluster of Figure 2, as membership of an AP in this cluster leaves a large uncertainty about the activity. Higher values of D show clusters with fewer, or with less dominant, activities. For numerical analysis of the histograms in Figures 2 and 3 we use the uniformity

$U = \frac{\sum_{\chi \in X} (c^{\max}_{\chi})^{2} / \sum_{\alpha \in A(\chi)} c_{\chi,\alpha}}{\sum_{\chi \in X} c^{\max}_{\chi}}$   (4)

with

$c^{\max}_{\chi} = \max_{\alpha \in A(\chi)} (c_{\chi,\alpha})$   (5)

where X is the set of APCs, A(χ) are the activities of cluster χ, and c_{χ,α} is the count of APs of activity α in cluster χ. The uniformity U = 1 if every cluster contains APs from a single class only; if clusters are not uniform, the measure favours clusters with a higher number of APs, penalising clusters with fewer APs.

Table 1: Cluster uniformity U as a function of the number of clusters C and the sequence length D

  C \ D      3         5         15        25        35
  20       90.48%    84.67%    88.32%    87.91%    88.72%
  40       92.34%    92.32%    91.53%    95.13%    92.55%
  60       94.67%    94.19%    94.98%    95.13%    95.33%
  80       95.79%    95.99%    95.62%    95.98%    96.06%
  100      95.81%    97.05%    96.47%    96.41%    96.73%

Table 1 confirms that increasing D or C results in better classification. The tendency of the uniformity to increase with D is explained by the length of the actions (which here are essentially activities): a longer time span of the AP produces better classification, but this enhancement is limited by the number of clusters. One might expect the number of clusters to equal the number of activities (i.e., 5 in our case). This is not the case, for several reasons: the high-dimensional parameter space is multi-modal, and clusters are used to classify not just one class of exclusive activities, but also actions, which can overlap and combine independently with other actions. This requires a C high enough to allow combinations between different actions. Increasing the number of clusters is limited by the training set size, because each cluster requires enough samples to train its mean and covariance.
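For reference, a short sketch of the uniformity measure of Eqs. (4)-(5), taking a count matrix as input (rows are clusters, columns are activities); this is our own illustration, not the authors' evaluation code, and the toy counts are invented.

```python
import numpy as np

def uniformity(counts):
    """Eqs. (4)-(5): counts[chi, alpha] = number of APs of activity alpha in cluster chi."""
    counts = np.asarray(counts, dtype=float)
    c_max = counts.max(axis=1)                      # Eq. (5): dominant activity count per cluster
    per_cluster = c_max ** 2 / counts.sum(axis=1)   # (c_max)^2 / total APs in the cluster
    return per_cluster.sum() / c_max.sum()          # Eq. (4)

# Toy example: two clusters over five activities; a pure cluster and a mixed one.
print(uniformity([[90, 5, 5, 0, 0],
                  [10, 10, 40, 20, 20]]))
```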

3.2. Classification
Five HumanEva train sequences were labelled with the 10 detailed labels from Table 2 for testing the labelled AP model. In addition, 15 other sequences of Subjects S1 and S2 were labelled with just one of the Walk Activity, Throw/Catch Activity, Jog Activity, Gestures Activity and Box Activity labels. For extensive testing, further training and testing subjects and local labels are desirable, but unfortunately no such database is available. Figure 4 shows the classification results for the Walking, Jog and Gesture validate sequences of HumanEva. While Subjects S1 and S2 have training data (from the train sequences), Subject S3 was not included in the label training. The Walking label in the S1 Walking sequence has a probability above 0.9 except around frame 63, where the similar Box is recognised; the four stride labels and the four arm forward and backward labels can be observed with a periodicity that is presumed explicitly neither in the training nor in the recognition.

Table 2: Local labels: descriptions and training sequences

Label                     | Description                                                      | Training sequences
Left/right stride back    | Left/right leg is moving forward from behind the right/left leg | S1 Walking 1, S1 Walking 3
Left/right stride front   | Left/right leg is moving forward ahead of the right/left leg    | S1 Walking 1, S1 Walking 3
Left/right arm forward    | Left/right arm is moving forward                                 | S1 Walking 1, S1 Walking 3, S1 ThrowCatch 1, S2 ThrowCatch 1, S2 ThrowCatch 3
Left/right arm backward   | Left/right arm is moving backwards                               | S1 Walking 1, S1 Walking 3, S1 ThrowCatch 1, S2 ThrowCatch 1, S2 ThrowCatch 3
Left/right hand throw     | Right/left hand throw                                            | S1 Walking 1, S1 Walking 3, S1 ThrowCatch 1, S2 ThrowCatch 1, S2 ThrowCatch 3

The least accurate recognition is for S3 Gesture, due to its similarities with the Throw/Catch and Box activities. Although Subject S3 was not trained, Walk and Jog are recognised well, generalising only from Subjects S1 and S2. In Figure 5, for D = 5, the four leg strides are recognised, as is the symmetry of the right and left forward and backward arm motions. For both D = 5 and D = 15 the global activities are accurate. A greater D results in APs longer than the short actions (i.e., leg strides), which are therefore misclassified; Walk, Throw/Catch and Gesture are confused because of the higher temporal smoothing of the longer APs, which are clustered into a limited number of APCs; more clusters give a finer-detailed classification. The confusion matrices (Figure 6) of the global activities, classifying all APs of the HumanEva validate sequences, show the overall recognition performance. Walk and Jog are recognised best, while the other three, more static, activities have higher misclassification. Better recognition of these long-term activities is obtained with a short sequence length and many clusters.

3.3. APM in tracking
The APM is used with the results of our initial tracking of the S2 Walking 1 sequence. The tracking is based on particle filters adapted for the articulated human body [16]. A hierarchical approach improves the tracking by fitting the hierarchically higher-level structures (e.g., the torso) first, followed by the upper and lower limbs. Figure 7 shows the APM classification of one stride, 12 frames, based on the tracking results. In the particle filter framework the current state of the system is defined by the cloud of particles. To recover the current labels, the mode of the particle set can be found; this defines the current AP, which, by means of finding its APC, delivers the current label set. The mode of the particle distribution is unstable, therefore in the approach taken each particle defines an AP related to an APC, and thus a set of weighted labels that are averaged into global labels.
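The particle-weighted labelling described above could look like the following sketch; `similarity` and `Lambda` are assumed to come from the trained APM (Eqs. (1) and (2)), the particle APs are assumed to be already PCA-compressed, and the function names are our own.

```python
import numpy as np

def particle_labels(particle_aps, weights, Lambda, similarity):
    """Average label probabilities over a weighted particle set.

    particle_aps : list of AP vectors, one per particle (PCA-compressed)
    weights      : particle weights from the filter (assumed normalised)
    Lambda       : (n_clusters, n_labels) label probabilities per APC, Eq. (2)
    similarity   : callable ap -> (n_clusters,) vector of sim_apc(ap), Eq. (1)
    """
    # Per-particle label probabilities via Eq. (3), then a weighted average.
    label_probs = np.array([similarity(ap) @ Lambda for ap in particle_aps])
    return np.average(label_probs, axis=0, weights=weights)   # global label set
```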

Figure 2: HumanEva basic activity distribution (Walking, Jog, Box, Gesture, ThrowCatch) in C = 20 clusters, varying the sequence length D (panels: D = 3, 5, 15, 25; cluster number vs. cluster elements per activity).

Figure 3: HumanEva basic activity distribution in the clusters for sequence length D = 25, varying the number of clusters C (panels: C = 20, 40, 60, 80).

Figure 4: Recovered labels for the Walking, Jogging and Gesture activities of subjects S1 (trained) and S3 (not trained) with a model of C = 100 and D = 5. The probability of the labels for each frame (horizontal axis) is colour coded.

4. Conclusions
We have presented an AP-based method to recognise human actions and activities by means of a temporal history of the model parameters, called action primitives. Unified handling of actions and activities, and detailed labelling, are the main features of the method. Although a higher-level semantic mechanism could further enhance the results, the combined labels of different content yield more detailed results than other current approaches ([3, 12]) that classify just the activity with a single global label. The independent learning of clusters and labels allows the APM to be extended or replaced with new dynamic data or with new labels. We have also shown that shorter APs are more informative than longer APs. Although the method works well for periodic activities, it is not restricted to these. In this paper the BFV consists of 17 parameters, the body joint angles. Replicating the model at different body levels, for BFVs with a subset of the joint angles only (e.g., the leg parameters), would result in more accurate labelling of these body parts, but this requires the selection of the most informative level of the label. Further work could focus on higher-level filtering and interpretation of the low-level symbolic information in order to enhance the output. The results of the tracking and behaviour analysis will be further integrated and studied under a common framework.

Figure 5: S1 Walking 1 activity recognition with C = 20, 60, 100 and D = 5, 15, 25 models. The probability of the labels for each frame (horizontal axis) is colour coded.

Figure 6: Confusion matrices (Walk, Throw/Catch, Gesture, Box, Jog; truth vs. recognised) for number of clusters C = 20, 60, 100 and sequence length D = 5, 15, 25.

Figure 7: Tracking results for frames 31-64, with the human model back-projected to one of the three camera views. The images are overlaid with the recovered labels, grouped into the hierarchical levels shown in grey. The probability of each label is given in brackets; labels are shown in green (probability above 0.8) or blue (probability between 0.5 and 0.8).

References


[1] A. E. Elgammal and L. S. Davis, "Probabilistic framework for segmenting people under occlusion," in Proc. International Conference on Computer Vision, vol. 2, pp. 145-152, 2001.
[2] J. Kang, I. Cohen, and G. G. Medioni, "Persistent objects tracking across multiple non overlapping cameras," in Proc. of IEEE Workshop on Motion and Video Computing, pp. 112-119, 2005.
[3] I. Haritaoglu, D. Harwood, and L. S. Davis, "W4: Real-time surveillance of people and their activities," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 809-830, 2002.
[4] Q. Zhao, J. Kang, H. Tao, and W. Hua, "Part based human tracking in a multiple cues fusion framework," in Proc. International Conference on Pattern Recognition, pp. 450-455, 2006.


[5] J. Deutscher and I. D. Reid, "Articulated body motion capture by stochastic search," International Journal of Computer Vision, vol. 61, pp. 185-205, Feb. 2005.
[6] A. O. Balan, L. Sigal, and M. J. Black, "A quantitative evaluation of video-based 3D person tracking," in Proc. Int'l Workshop on Performance Evaluation of Tracking and Surveillance, pp. 349-356, 2005.
[7] S. Kim, C.-B. Park, and S.-W. Lee, "Tracking 3D human body using particle filter in moving monocular camera," in Proc. International Conference on Pattern Recognition, pp. 805-808, 2006.
[8] L. Sigal, S. Bhatia, S. Roth, M. J. Black, and M. Isard, "Tracking loose-limbed people," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 421-428, 2004.
[9] N. R. Howe, "Silhouette lookup for monocular 3D pose tracking," Image and Vision Computing, vol. 25, pp. 331-341, Mar. 2007.
[10] M. Brand and A. Hertzmann, "Style machines," in SIGGRAPH, pp. 183-192, 2000.

[11] J. Assa, Y. Caspi, and D. Cohen-Or, "Action synopsis: pose selection and illustration," ACM Trans. on Graphics, vol. 24(3), pp. 667-676, 2005.
[12] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, "Actions as space-time shapes," in Proc. International Conference on Computer Vision, vol. 2, pp. 1395-1402, 2005.
[13] J. K. Aggarwal and S. Park, "Human motion: Modeling and recognition of actions and interactions," in Proc. 2nd Int'l Symp. on 3D Data Processing, Visualization and Transmission, pp. 640-647, 2004.
[14] T. B. Moeslund, A. Hilton, and V. Kruger, "A survey of advances in vision-based human motion capture and analysis," Computer Vision and Image Understanding, vol. 103, pp. 90-126, Nov. 2006.
[15] L. Sigal and M. J. Black, "HumanEva: Synchronized video and motion capture dataset for evaluation of articulated human motion," Tech. Rep. CS-06-08, Brown University, 2006.
[16] Z. Husz, A. M. Wallace, and P. R. Green, "Hierarchical, model based tracking with particle filtering," in Detection vs Tracking, BMVA symposium, 2006.
