Human activity recognition with action primitives Zsolt L. Husz
Andrew M. Wallace
Patrick R. Green
Joint Research Institute in Signal and Image Processing, School of Engineering and Physical Sciences and School of Life Sciences, Heriot-Watt University, Edinburgh, UK
Abstract
This paper considers the link between tracking algorithms and high-level human behavioural analysis, introducing the action primitives model that recovers symbolic labels from tracked limb configurations. The model consists of clusters of similar short-term actions, the action primitive clusters, formed automatically and then labelled by supervised learning. The model accommodates both short actions and longer activities, either periodic or aperiodic, and new labels can be added incrementally. We determine the effects of model parameters on the labelling of action primitives using ground truth derived from a motion capture system. We also present a representative example of a labelled video sequence.
1. Introduction
The analysis of human behaviour for surveillance, entertainment, intelligent domiciles and medical applications is of increasing interest. Video information is a valuable and easily acquired input for these domains. Previous research ranges from 2D tracking of human figures reduced to blobs ([1, 2]) or articulated human models ([3, 4]) through to 3D tracking of articulated human models ([5, 6, 7, 8, 9]). Although the results are promising, these methods usually stop at tracking, omitting higher-level reasoning or assigning only a single label, and they are usually limited to periodic activities. Other previous work [10, 11] tackles the extraction of poses and motions relevant to a human observer, mainly for video segmentation and computer animation. Starting from the idea of important poses and actions, this paper presents a technique that first learns important actions (pose sequences) and then learns their semantic labels. In contrast to other work ([3, 12]), these sequences are not required to be periodic. The recovered labels are not just global activities, but include both whole-body (e.g., running, bending, standing) and local (e.g., right arm forward, left arm reaching) labels. For an overview of human action recognition and tracking, see [13] and [14].
2. The Action Primitives Model
In this paper an action denotes a short sequence of body configurations (e.g., leg rising, arm still). It is usually, but not exclusively, defined by one or a few body parts. An activity denotes a sequence of body configurations over a longer time span. Activities can be assembled from one or more actions, and actions can specify details of an activity (e.g., walking with a raised left arm). The related articulated human model consists of 12 body parts, each modelled by a truncated frustum with an elliptical cross-section. The body parts have constant lengths and the body configuration is completely defined by 24 parameters, formed by the body root position, body orientation, spine tilt and azimuth, and 17 joint angles. A body feature vector (BFV) is a (sub)set of the parameters describing a body configuration, having as elements either direct features (joint angles, body position, orientation) or features derived by simple operators from direct features (velocity, global position vector). For clarity, in this paper we use a BFV with direct features, the 17 joint angles, but we have used a wider variety of features (e.g., position) where this is necessary to define an action. An action primitive (AP), a basic action, is part of a larger time-scale activity and is a set of consecutive BFVs capturing both the static and the dynamic configuration of the body. The developed action primitives model (APM) consists of AP clusters (APCs), groups of similar APs. Each APC has a set of labels to characterise the APs of the cluster. The model is based on the idea that activities can be partly or completely constructed from smaller components (action primitives).
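To make the data organisation concrete, the following minimal sketch (Python; the function names and the fixed value of D are illustrative, not prescribed by the paper) shows how a BFV of 17 joint angles and an AP of D + 1 consecutive BFVs could be represented as plain arrays:

```python
import numpy as np

NUM_JOINT_ANGLES = 17   # direct features used as the BFV in this paper
D = 5                   # AP sequence length (a model parameter)

def make_bfv(joint_angles):
    """A body feature vector: here simply the 17 joint angles of one frame."""
    bfv = np.asarray(joint_angles, dtype=float)
    assert bfv.shape == (NUM_JOINT_ANGLES,)
    return bfv

def make_ap(bfv_sequence, start, d=D):
    """An action primitive: d + 1 consecutive BFVs stacked into one vector.
    The last BFV (the NBFV) can later be split off for prediction."""
    window = np.asarray(bfv_sequence[start:start + d + 1])
    assert window.shape == (d + 1, NUM_JOINT_ANGLES)
    return window.ravel()
```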
2.1. Action primitives clusters
The AP clusters are learnt from a training dataset by unsupervised learning (Figure 1). Sequences of input BFVs, generated by a motion capture system, result in APs (AP′) of length D + 1 (in the Sequence generation phase). To allow equal importance to all parameters, AP′ is normalised
(Normalisation) by scaling the nominal range of each parameter, defined by physical constraints on the joint angles, to [0, 1]. Both the parameters of a BFV at a given time and the parameters of consecutive BFVs (i.e., an AP) are highly correlated, therefore principal component analysis (PCA) is applied to reduce the dimensionality of the APs. The resulting compressed PCAAP* are clustered into C clusters with Expectation Maximisation initialised by K-means¹. All subsequences AP of AP′, omitting the last BFV (NBFV) (Sequence split), are PCA compressed, resulting in PCAAP. The one-to-one correspondence between PCAAP*, AP′ and AP allows us to compute, for each APC, the mean (M_apc) and covariance (Σ_apc) of all member PCAAP. These are used to compute, for an ap (an instance of an AP), a Mahalanobis-distance based similarity

    sim_{apc}(ap) = e^{-\delta_{apc}(ap)^T \Sigma_{apc}^{-1} \delta_{apc}(ap)}    (1)

to any instance apc of an APC, with \delta_{apc}(ap) = M_{apc} - ap. The generated model consists of a description (sequence length D, number of clusters C, PCA compression parameters: number of eigenvectors, mean and base vectors) and, for each APC, its mean and covariance. Note that the APs used for clustering have a length of D + 1, assuring smoothness of the APC over D + 1 BFVs. This allows the prediction of the next BFV (NBFV) from the preceding AP of length D, which is useful for tracking the model.

¹ Matlab Expectation Maximisation Clustering algorithm from Frank Dellaert, available from http://www-static.cc.gatech.edu/~dellaert
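A sketch of this clustering stage is given below, assuming scikit-learn for PCA and EM (the paper itself used a Matlab EM implementation); the helper names, constants and the [0, 1] scaling inputs are illustrative only, and the similarity follows eq. (1) as reconstructed above:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

D = 5             # AP length used for recognition
C = 20            # number of clusters
N_PCA = 12        # retained PCA dimensions (illustrative)

def build_windows(bfvs, length):
    """Concatenate `length` consecutive normalised BFVs into one vector each."""
    n = bfvs.shape[0] - length + 1
    return np.stack([bfvs[i:i + length].ravel() for i in range(n)])

def train_clusters(bfvs, joint_min, joint_max):
    """bfvs: (n_frames, 17) joint angles from motion capture."""
    norm = (bfvs - joint_min) / (joint_max - joint_min)   # scale to [0, 1]

    ap_full = build_windows(norm, D + 1)                  # AP' (length D + 1)
    ap_short = build_windows(norm, D)[:len(ap_full)]      # AP, with NBFV dropped

    pca_full = PCA(N_PCA).fit(ap_full)
    pca_short = PCA(N_PCA).fit(ap_short)

    # EM clustering of the compressed AP'; scikit-learn initialises it with k-means
    gmm = GaussianMixture(C, covariance_type='full', init_params='kmeans',
                          random_state=0)
    members = gmm.fit_predict(pca_full.transform(ap_full))

    # per-APC mean and covariance of the member PCAAP (length-D sequences)
    pcaap = pca_short.transform(ap_short)
    apcs = []
    for c in range(C):
        m = pcaap[members == c]
        apcs.append((m.mean(axis=0),
                     np.cov(m, rowvar=False) + 1e-6 * np.eye(N_PCA)))
    return pca_short, apcs

def similarity(ap_pca, mean, cov):
    """Mahalanobis-distance based similarity of eq. (1)."""
    delta = mean - ap_pca
    return float(np.exp(-delta @ np.linalg.solve(cov, delta)))
```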
Figure 1: Block diagram of the clustering algorithm. (Input: BFV. Sequence generation: concatenate all D + 1 long, neighbouring BFVs into AP′. Normalisation: AP′ scaled to [0..1]. Sequence split: reduce the sequence length to D, AP′ = [AP NBFV]. PCA compression of AP*, reducing the parameter space, gives PCAAP*; PCA compression of AP gives PCAAP. Clustering: AP′ (AP) is assigned to a cluster. Output, the AP model description: D, C, PCA precision (number of eigenvectors, mean, base vectors), projection; for each APC, the mean and covariance of {PCAAP} and the probability of each label.)

2.2. Cluster labelling
The training algorithm requires that each instance ap is labelled manually with the appropriate set of symbolic labels L_ap. The probability of label i for each of the APCs is

    \Lambda_{apc,i} = \frac{\sum_{ap} sim_{apc}(ap) \cdot L_i(ap)}{\sum_{ap} sim_{apc}(ap)}    (2)

where the label presence L_i(ap) is 1 if ap has label i and 0 if not. Cluster forming is separated from training, therefore new labels and more training sequences can be added efficiently, without repeating the clustering.

2.3. Classification
Classification of an action primitive ap is performed by means of the APCs. The probability \Lambda_{apc,i} of label i for an APC is precomputed, while the similarity of ap is computed by sim_{apc}(ap). By marginalising over the clusters, the probability of label i for an unknown ap is

    P_i(ap) = \sum_{apc} \Lambda_{apc,i} \cdot sim_{apc}(ap).    (3)
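The label training of eq. (2) and the classification of eq. (3) translate directly into a couple of matrix operations; a minimal sketch follows (the array names are hypothetical):

```python
import numpy as np

def train_label_probabilities(sims, label_presence):
    """Eq. (2). sims[k, c] = sim_apc_c(ap_k) for training instance k;
    label_presence[k, i] = L_i(ap_k) in {0, 1}.
    Returns lambdas[c, i], the probability of label i for cluster c."""
    num = sims.T @ label_presence        # sum over ap of sim * L_i(ap), per cluster
    den = sims.sum(axis=0)[:, None]      # sum over ap of sim, per cluster
    return num / den

def classify(sims_of_ap, lambdas):
    """Eq. (3): marginalise over the clusters.
    sims_of_ap[c] = sim_apc_c(ap) for the unknown ap."""
    return sims_of_ap @ lambdas          # P_i(ap) for every label i
```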
3. Evaluation
The APM is trained with the train partition of the HumanEva dataset [15]. This dataset has synchronised motion capture data and multiple camera views. We use the motion capture data for training and evaluation, while video-based recognition is briefly introduced in Section 3.3. Sequences, categorised into five basic activities (Box, Gesture, Jog, Throw/Catch and Walking), are performed by three different subjects and are divided into train, validation and test partitions. Between 29519 (D = 35) and 33960 (D = 3) AP sequences from the train partition are used for training the APM; the validation partition is used for evaluation.

3.1. Number of clusters and sequence length
The APC learning has two explicit parameters, the AP length D and the number of clusters C. If the length of the sequence increases, the classes become more separable. Figures 2 and 3 show the distribution of all APs from the training sequences. APs from different activities are expected to fall in different clusters, while APs from the same activity should fall in the same cluster. In Figure 2(a), cluster number 10 has in total 2443 APs from the Box, Throw/Catch, Jog and Gesture activities, the majority of samples being from the first two activities. This is the worst performing cluster of Figure 2, as membership of an AP in this cluster leaves a large uncertainty about the activity. Higher values of D show clusters with fewer, or with less dominating, activities. For numerical analysis of the histograms in Figures 2 and 3 we use the uniformity

    U = \frac{\sum_{\chi \in X} (c^{max}_{\chi})^2 / \sum_{\alpha \in A(\chi)} c_{\chi,\alpha}}{\sum_{\chi \in X} c^{max}_{\chi}}    (4)

with

    c^{max}_{\chi} = \max_{\alpha \in A(\chi)} (c_{\chi,\alpha})    (5)
where X is the APC set, A(χ) are the activities of cluster χ, and c_{χ,α} is the count of APs of activity α in cluster χ. The uniformity U = 1 if all clusters have APs from the same class only; if clusters are not uniform, it favours clusters with a higher number of APs, penalising clusters with fewer APs.

Table 1: Cluster uniformity U as a function of the number of clusters C and sequence length D

          D = 3     D = 5     D = 15    D = 25    D = 35
C = 20    90.48%    84.67%    88.32%    87.91%    88.72%
C = 40    92.34%    92.32%    91.53%    95.13%    92.55%
C = 60    94.67%    94.19%    94.98%    95.13%    95.33%
C = 80    95.79%    95.99%    95.62%    95.98%    96.06%
C = 100   95.81%    97.05%    96.47%    96.41%    96.73%
Table 1 confirms that increasing D or C results in better classification. The tendency of increasing uniformity with D is explained by the length of the actions (which here are essentially activities): a longer AP time span therefore produces better classification, but this enhancement is limited by the number of clusters. One might expect the number of clusters to equal the number of activities (i.e., 5 in our case). This is not the case for several reasons: the high-dimensional parameter space is multi-modal, and the clusters are used to classify not just one class of exclusive activities, but also actions, which can overlap and combine independently with other actions. This requires a C high enough to allow combinations between different actions. Increasing the number of clusters is limited by the training set size, because each cluster requires enough samples to train its mean and covariance.
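For reference, the uniformity of eq. (4)-(5) can be computed from the per-cluster activity counts in a few lines; the input format in this sketch is hypothetical:

```python
def uniformity(counts):
    """Uniformity U of eq. (4)-(5).
    counts: one dict per cluster, mapping activity name -> number of APs."""
    num = den = 0.0
    for cluster in counts:
        c_max = max(cluster.values())             # eq. (5)
        num += c_max ** 2 / sum(cluster.values())
        den += c_max
    return num / den                              # eq. (4)

# a perfectly uniform clustering gives U = 1
assert abs(uniformity([{'Walking': 10}, {'Jog': 7}]) - 1.0) < 1e-12
```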
3.2. Classification
Five HumanEva train sequences were labelled with the 10 detailed labels from Table 2 for testing the labelled AP model. In addition, 15 other sequences of Subjects S1 and S2 were labelled with just one of the Walk Activity, Throw/Catch Activity, Jog Activity, Gestures Activity and Box Activity labels. For extensive testing, further training and testing subjects and local labels would be desirable, but unfortunately no such database is available. Figure 4 shows the classification results for the Walking, Jog and Gesture validation sequences of HumanEva. While Subjects S1 and S2 have training data (from the train sequences), Subject S3 was not included in the label training. The Walking label in the S1 Walking sequence has a probability above 0.9 except around frame 63, where the similar Box is recognised; the four stride labels and the four arm forward and backward labels can be observed with a periodicity that is presumed explicitly neither in the training nor in the recognition. The least accurate recognition is for S3 Gesture, due to its similarities with the Throw/Catch and Box activities.
Table 2: Local labels: descriptions and training sequences

Label                      Description                                                      Training sequences
Left/right stride back     Left/right leg is moving forward from behind the right/left leg  S1 Walking 1, S1 Walking 3
Left/right stride front    Left/right leg is moving forward ahead of the right/left leg     S1 Walking 1, S1 Walking 3
Left/right arm forward     Left/right arm is moving forward                                 S1 Walking 1, S1 Walking 3, S1 ThrowCatch 1, S2 ThrowCatch 1, S2 ThrowCatch 3
Left/right arm backward    Left/right arm is moving backwards                               S1 Walking 1, S1 Walking 3, S1 ThrowCatch 1, S2 ThrowCatch 1, S2 ThrowCatch 3
Left/right hand throw      Right/left hand throw                                            S1 Walking 1, S1 Walking 3, S1 ThrowCatch 1, S2 ThrowCatch 1, S2 ThrowCatch 3
Although Subject S3 was not trained, Walk and Jog are recognised well, generalising only from Subjects S1 and S2. In Figure 5, for D = 5, the four leg strides are recognised, as well as the symmetry of the right and left forward and backward arm motions. For both D = 5 and D = 15 the global activities are accurate. Greater D results in APs longer than the short actions (i.e., leg strides), which are therefore misclassified; Walk, Throw/Catch and Gesture are confused because of the higher temporal smoothing of the longer APs, which are clustered into a limited number of APCs; more clusters give a finer-detailed classification. The confusion matrices (Figure 6) of the global activities, classifying all APs of the HumanEva validation sequences, show the overall recognition performance. Walk and Jog are recognised best, while the other three, more static, activities have higher misclassification. Better recognition of these long-term activities is obtained with a short sequence length and many clusters.
3.3. APM in tracking
The APM is used with the results of our initial tracking of the S2 Walking 1 sequence. The tracking is based on particle filters adapted for the articulated human body [16]. A hierarchical approach improves the tracking by fitting the hierarchically higher-level structures (e.g., torso) first, followed by the upper and lower limbs. Figure 7 shows the APM classification of one stride, 12 frames, based on the tracking results. In the particle filter framework the current state of the system is defined by the cloud of particles. To recover the current labels, the mode of the particle set can be found; this defines the current AP which, by means of finding its APC, delivers the current label set. The mode of the particle distribution is unstable, therefore in the approach taken each particle defines an AP related to an APC, and thus a set of weighted labels that are averaged into global labels.
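Interpreted this way, the particle-to-label step amounts to a weighted average of the per-particle label probabilities obtained from eq. (3); a minimal sketch with hypothetical array names:

```python
import numpy as np

def fuse_particle_labels(per_particle_label_probs, particle_weights):
    """per_particle_label_probs[p, i]: P_i of eq. (3) evaluated for the AP
    defined by particle p; particle_weights[p]: its particle-filter weight.
    Returns the weight-averaged global label probabilities."""
    w = np.asarray(particle_weights, dtype=float)
    w = w / w.sum()
    return w @ np.asarray(per_particle_label_probs)
```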
Figure 2: HumanEva basic activity distribution in C = 20 clusters, varying sequence length D. (Panels (a)-(d): D = 3, 5, 15, 25; horizontal axis: cluster number; vertical axis: cluster elements; series: Walking, Jog, Box, Gesture, ThrowCatch.)
Figure 3: HumanEva basic activity distribution in the clusters for sequence length D = 25, varying number of clusters C. (Panels (a)-(d): C = 20, 40, 60, 80; horizontal axis: cluster number; vertical axis: cluster elements; series: Walking, Jog, Box, Gesture, ThrowCatch.)
Figure 4: Recovered labels for the Walking, Jogging and Gesture activities of subjects S1 (trained) and S3 (not trained) with a model of C = 100 and D = 5. The probability of each label for each frame (horizontal axis) is colour coded. (Rows: subjects S1 and S3; columns: Walking, Jog, Gesture; vertical axis: the global activity and local labels.)
4. Conclusions
We have presented an AP-based method to recognise human actions and activities by means of a temporal history of the model parameters, called action primitives. Unified handling of actions and activities and detailed labelling are the main features of the method. Although a higher-level semantic mechanism could further enhance the results, the combined labels of different contents yield more detailed results than other current approaches ([3, 12]) that classify just the activity with a single global label. The independent learning of clusters and labels allows the APM to be extended or replaced with new dynamic data or with new labels. We have also shown that shorter APs are more informative than longer APs. Although the method works well for periodic activities, it is not restricted to these. In this paper the BFV consists of 17 parameters, the body joint angles. Replicating the model at different body levels, for BFVs with a subset of the joint angles only (e.g., the leg parameters), would result in more accurate labelling of these body parts, but this requires the selection of the most informative level of the label. Further work could focus on higher-level filtering and interpretation of the low-level symbolic information in order to enhance the output. The results of the tracking and behaviour analysis will be further integrated and studied under a common framework.
Figure 5: S1 Walking 1 activity recognition with C = 20, 60, 100 and D = 5, 15, 25 models. (Grid of panels: rows C = 20, 60, 100; columns D = 5, 15, 25; each panel shows the colour-coded label probabilities per frame, as in Figure 4.)
Figure 6: Confusion matrices for number of clusters C = 20, 60, 100 and sequence length D = 5, 15, 25. (Rows: ground-truth activity; columns: recognised activity; classes: Walk, T/C, Gest., Box, Jog.)

Figure 7: Tracking results for frames 31-64, with the human model back-projected to one camera view out of the three views. The images are overlaid with the recovered labels, grouped into the hierarchical levels shown in grey. The probability of each label is given in brackets; labels are shown in green (probability above 0.8) or in blue (probability between 0.5 and 0.8).

References
[1] A. E. Elgammal and L. S. Davis, “Probabilistic framework for segmenting people under occlusion,” in Proc. International Conference on Computer Vision, vol. 2, pp. 145–152, 2001.
[2] J. Kang, I. Cohen, and G. G. Medioni, “Persistent objects tracking across multiple non overlapping cameras,” in Proc. of IEEE Workshop on Motion and Video Computing, pp. 112–119, 2005.
[3] I. Haritaoglu, D. Harwood, and L. S. Davis, “W4: Real-time surveillance of people and their activities,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 809–830, 2002.
[4] Q. Zhao, J. Kang, H. Tao, and W. Hua, “Part based human tracking in a multiple cues fusion framework,” in Proc. International Conference on Pattern Recognition, pp. 450–455, 2006.
[5] J. Deutscher and I. D. Reid, “Articulated body motion capture by stochastic search,” International Journal of Computer Vision, vol. 61, pp. 185–205, Feb. 2005.
[6] A. O. Balan, L. Sigal, and M. J. Black, “A quantitative evaluation of video-based 3D person tracking,” in Proc. Int’l Workshop on Performance Evaluation of Tracking and Surveillance, pp. 349–356, 2005.
[7] S. Kim, C.-B. Park, and S.-W. Lee, “Tracking 3D human body using particle filter in moving monocular camera,” in Proc. International Conference on Pattern Recognition, pp. 805–808, 2006.
[8] L. Sigal, S. Bhatia, S. Roth, M. J. Black, and M. Isard, “Tracking loose-limbed people,” in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 421–428, 2004.
[9] N. R. Howe, “Silhouette lookup for monocular 3D pose tracking,” Image and Vision Computing, vol. 25, pp. 331–341, Mar. 2007.
[10] M. Brand and A. Hertzmann, “Style machines,” in SIGGRAPH, pp. 183–192, 2000.
[11] J. Assa, Y. Caspi, and D. Cohen-Or, “Action synopsis: pose selection and illustration,” ACM Trans. on Graphics, vol. 24(3), pp. 667–676, 2005.
[12] M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri, “Actions as space-time shapes,” in Proc. International Conference on Computer Vision, vol. 2, pp. 1395–1402, 2005.
[13] J. K. Aggarwal and S. Park, “Human motion: Modeling and recognition of actions and interactions,” in Proc. 2nd Int’l Symp. on 3D Data Processing, Visualization and Transmission, pp. 640–647, 2004.
[14] T. B. Moeslund, A. Hilton, and V. Kruger, “A survey of advances in vision-based human motion capture and analysis,” Computer Vision and Image Understanding, vol. 103, pp. 90–126, Nov. 2006.
[15] L. Sigal and M. J. Black, “HumanEva: Synchronized video and motion capture dataset for evaluation of articulated human motion,” Tech. Rep. CS-06-08, Brown University, 2006.
[16] Z. Husz, A. M. Wallace, and P. R. Green, “Hierarchical, model based tracking with particle filtering,” in Detection vs Tracking, BMVA symposium, 2006.