Pattern Recognition 41 (2008) 418–431. doi:10.1016/j.patcog.2007.06.002
Articulated motion reconstruction from feature points

B. Li (a,*), Q. Meng (b), H. Holstein (c)

(a) Department of Computing and Mathematics, Manchester Metropolitan University, Manchester M1 5GD, UK
(b) Department of Computer Science, Loughborough University, Loughborough LE11 3TU, UK
(c) Department of Computer Science, University of Wales, Aberystwyth SY23 3DB, Wales, UK
Received 17 October 2006; received in revised form 25 March 2007; accepted 6 June 2007

* Corresponding author. Tel.: +44 161 247 3598; fax: +44 161 247 1483. E-mail addresses: [email protected] (B. Li), [email protected] (Q. Meng), [email protected] (H. Holstein).
Abstract

A fundamental task in reconstructing non-rigid articulated motion from sequences of unstructured feature points is to solve the problem of feature correspondence and motion estimation. This problem is challenging in high-dimensional configuration spaces. In this paper, we propose a general model-based dynamic point matching algorithm to reconstruct freeform non-rigid articulated movements from data presented solely by sparse feature points. The algorithm integrates key-frame-based self-initialising hierarchical segmental matching with inter-frame tracking, to achieve computational effectiveness and robustness in the presence of data noise. A dynamic scheme of motion verification, dynamic key-frame-shift identification and backward parent-segment correction, incorporating the temporal coherency embedded in inter-frames, is employed to enhance the segment-based spatial matching. Such a spatial–temporal approach ultimately reduces the ambiguity of identification inherent in a single frame. Performance evaluation is provided by a series of empirical analyses using synthetic data. Testing on motion capture data for a common articulated motion, namely human motion, gave feature-point identification and matching without the need for manual intervention, in buffered real-time. These results demonstrate the proposed algorithm to be a candidate for feature-based real-time reconstruction tasks involving self-resuming tracking of articulated motion.

© 2007 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Non-rigid articulated motion; Point pattern matching; Non-rigid pose estimation; Motion tracking and object recognition
1. Introduction

Visual interpretation of non-rigid articulated motion has lately seen somewhat of a renaissance in computer vision and pattern recognition. The motivation for directing existing motion analysis of rigid objects towards non-rigid articulated objects [1,2], especially human motion [3–5], is driven by potential applications such as human–computer interaction, surveillance systems, entertainment and medical studies.

A large body of research dedicated to the task of structure and motion analysis utilises feature-based methods, regardless of parametrisation by points, lines, curves or surfaces. Among these, a concise feature-point representation, advantageously abstracting the underlying movement, is usually used as an
essential or intermediate correspondence towards the end product of motion and structure recovery [6–8].

In the context of vision cues via feature-point representation, the spatio–temporal information is notably reduced to a sequence of unidentified points moving over time. To determine the subject's structure, and therefore its underlying skeletal-style movements for the purpose of high-level recognition, two fundamental problems of feature-point tracking and identification need to be solved. Tracking feature points in successive frames has been investigated extensively in the literature [9–12]. However, the identities of the subject's feature points are not obtainable from inter-frame tracking alone. Feature-point identification requires determining which point in an observed data frame corresponds to which point in its model, thus allowing recovery of structure. The task addresses the difficult problem of automatic model matching and identification, crucial at the start of tracking or on resumption from tracking loss. Currently, most tracking approaches simplify the problem to incremental pose estimation, relying
on manual model fitting at the start of tracking, or on an assumption of initial pose similarity and alignment to the model, or on pre-knowledge of a specific motion from which to infer an initial pose [5,13]. In this sense, the general recovery of non-rigid articulated motion solely from feature points still remains an open problem. There is a relative dearth of algorithmic self-initialisation for articulated motion reconstruction from only a collection of sparse feature points.

Motivated by these observations, we present a dynamic segment-based hierarchical point matching (DSHPM) algorithm to address self-initialising articulated motion reconstruction from sparse feature points. The articulated motion we consider describes general segmental jointed freeform movement. The motion of each segment can be considered rigid or nearly rigid, but the motion of the object as a whole is high-dimensionally non-rigid. In our work, the articulated model of an observed subject is known a priori, suggesting a model-based approach. As a general solution to the problem, the algorithm only assumes availability of feature-point motion data, such as obtained in our experiments via a marker-based motion capture system. We do not make the usual simplifying assumptions of model-pose similarity or restricted motion class for tracking initialisation, nor do we require the absence of data noise. The algorithm aims to establish one-to-one matches between the model point-set and its freeform motion data, to reconstruct the underlying articulated movements in buffered real-time.

2. Related work

The problem of automatically identifying feature points to retrieve the underlying articulated movement can be inherently difficult for a number of reasons: (1) the possibly high global dimensionality needed to depict the articulated structure; (2) relaxation of segment rigidity to allow for limited distortion; (3) data corruption due to missing (occluded) and extra (via the process of feature extraction) data; (4) unrestricted and arbitrary poses in freeform movements; (5) requirements of self-initialising tracking and identification; and (6) a computational cost compatible with real-time. While few works have attempted to address all these issues in the context of sparse feature-point representation (an early exploratory paper was published in Ref. [14]), many studies have attacked various aspects of the problem. In this section, we review techniques related to the problem in two categories: point pattern matching (PPM) and articulated motion tracking.

PPM is a fundamental problem for object recognition, motion estimation and image registration in a wide variety of circumstances [15,16]. Many approaches have focused on rigid, affine or projective correspondence, using techniques such as graphs, interpretation trees [17], the Hausdorff distance [18], geometric alignment and hashing [19]. In these cases, the developed techniques are based on geometric invariance or constraint satisfaction under affine transformations, yielding approximate mappings between objects and models [20]. However, these methods cannot easily be extended to the high configuration dimensionality of a complex articulated motion.
For modelling non-rigidity, elastic models [2,21], weighted-graph matching [22] and thin-plate spline approaches [23] have been developed to formulate densely distributed points into high-level structural presentations of lines, curves or surfaces. However, the necessary spatial data continuity is not available in the case of sparse points representing skeletal structures. Piecewise approaches [24,25] are probably the most appropriate for segmental data. In our case, a set of piecewise affine transformations with allowable distortion relaxation is sought for matching to an articulated segment hierarchy under kinematic constraints.

A second category of literature deals with the tracking of a particular type of articulated motion: human motion. Existing algorithms are commonly model-based, reconstructing precise poses from video images. The main challenge is to track a large number of degrees of freedom in high-dimensional freeform movements in the presence of image noise. To improve the reliability and efficiency of motion tracking, a spatial hierarchical search, using heuristics such as colour or appearance consistency, has proved successful [26,27]. However, the spatial hierarchy and matching heuristic may not be applicable in individual frames, due to self-occlusion and image noise. In that case, spatio–temporal approaches have been shown advantageous in recent research. Sigal et al. [28] introduced a "loose-limbed model" to emphasise motion coherency in tracking. Lan and Huttenlocher [29] developed a "unified spatio–temporal" model exploring both spatial hierarchy and temporal coherency in articulated tracking from silhouette data. Spatio–temporal methods have enabled robust limb tracking in multi-target tracking [30], outdoor scene analysis [31] and 3D reconstruction of human motion [32]. Our work benefits from the spatio–temporal concept. However, methodologies based on the rich information of images cannot be adapted to our problem domain of motion reconstruction from a concise feature-point representation.

Marker-based motion capture systems exemplify point-feature trackers [33]. Coloured markers, active markers, or sets of specially designed marker patterns have been used in some systems to encode identification information. Such approaches sidestep the hard problem of marker identification, but at the expense of losing application generality. The generic PPM problem in articulated motion is exemplified by a state-of-the-art optical MoCap system, e.g. Vicon [34], without recourse to marker coding. However, auto-identification may fail for complex motion, and MoCap data normally need time-consuming manual post-processing before they can be used in actual applications.

Our previous baseline study [35] developed a segment-based articulated point matching algorithm for identifying an arbitrary pose of an articulated subject with sparse point features from single-frame data. The algorithm provided a self-initialisation phase of pose estimation, crucial at the beginning or on resumption of tracking. It utilised an iterative "coarse-to-fine" matching scheme, benefiting from the well-known iterative closest point (ICP) algorithm [36], to establish a set of relaxed affine segmental correspondences between a model point-set and an observed data set taken from one frame of articulated motion. However, we argued that more robust motion reconstruction is possible by combining this with information from inter-frame tracking, eventually reducing the uncertainty inherent in the matching problem for single-frame data [37].

Pursuing this cross-fertilisation of the research and existing techniques, we extend our previous study on single-frame articulated pose identification [35,37] into a dynamic context. We propose a DSHPM algorithm targeting the reconstruction of articulated movement from motion sequences. The DSHPM algorithm integrates inter-frame tracking and spatial hierarchical matching to achieve effective articulated PPM in buffered real-time. The idea of segment-based articulated matching, as the computational substrate for exploring the spatial hierarchy [35,37], is enhanced by exploiting the motion coherency embedded in inter-frames, which ultimately reduces the ambiguity of identification in the presence of data noise.
3. Framework of the model-based DSHPM algorithm
The generic task under consideration arose from the need to identify feature-point data in order to reconstruct the underlying skeletal structure of freeform articulated motion. We assume that the data capture rate is sufficiently high, as demanded in most real-world applications. This allows feature-point trajectories to be obtained in successive frames. However, the identities of the feature points (or trajectories) are not known.

3.1. The articulated model and motion data

The subject to be tracked is pre-modelled. A subject model comprises $S$ segments with complete feature points. Each segment $P_s = \{p_{s,i} \mid i = 1, \ldots, M_s\}$ has $M_s$ identified feature points $p_{s,i}$. The feature points are sufficient in number and distribution to indicate the orientation and segment structure with the demanded detail. Segment non-rigidity is allowed within a threshold given by the segmental distortion ratio $\varepsilon_s$. Articulation is indicated through join-point commonality between two segments, suggesting a segment-based hierarchy. To keep the algorithm general, each segment undergoes independent motion constrained only by joint points. We do not impose motion constraints such as feasible biological poses for a specific subject type.

The observed motion data of the subject are represented by a sequence of point-sets, one at each time frame $t$: $Q^t = \{q^t_j \mid j = 1, \ldots, N^t\}$, where the $N^t$ data points $q^t_j$ may be corrupted by missing data due to occlusion and by extra noise data arising from the process of feature extraction.

3.2. Outline of the DSHPM

To identify massive data within a complex motion sequence, frame-by-frame model fitting would be computationally expensive and unnecessary. Ideally, initial entire-model fitting need only be attempted at some key-frames, in particular at the start of tracking or on resumption from tracking failure. Subsequent identification of an individual feature point
can be achieved by tracking over its trajectory. In the case of broken tracks, re-identification costs are largely reduced by reference to the known points whose identities are carried forward.

The framework of the proposed DSHPM algorithm is shown in Fig. 1. As the computational substrate for initial model fitting, it employs hierarchical segmental mapping supported by candidate-table (CT) optimisation at a key-frame (Section 4.2). To reduce the inherent uncertainty of segmental matching in a single key-frame, a dynamic scheme (Section 4.3) incorporating temporal coherency in a key-frame range is explored through inter-frame tracking. This includes three phases: motion verification, dynamic key-frame-shift identification and backward parent-segment correction, as shown in Fig. 1. Under CT-based iterative matching, the algorithm first verifies that a proposed segmental match is consistent with an affine transformation over a period of movement, subject to a relaxed geometric invariance defined by a segmental distortion ratio $\varepsilon_s$. We name this process motion verification (Section 4.3.1). If a segment cannot be identified, or the segment identification cannot be proved correct by motion verification, reflecting poor segment data in the current key-frame, the algorithm shifts the key-frame forwards by a certain time period to attempt re-identification. This process is denoted dynamic key-frame-shift identification (Section 4.3.2). The final phase, backward parent-segment correction, aims to correct any wrongly identified parent-segment which could cause subsequent unsuccessful matches of child-segments (Section 4.3.3).

Fig. 1. Framework of the dynamic segment-based hierarchical point matching (DSHPM) algorithm.
In the dynamic process, a temporal key-frame shift is always accompanied by a recruitment procedure that forward propagates already obtained identities and recruits any newly appearing matches in order to maintain spatial hierarchy.
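To make the control flow of Fig. 1 and the outline above concrete, the following is a minimal Python sketch of the per-segment dynamic scheme. It is our own illustrative rendering, not the authors' Matlab implementation: the callbacks `identify`, `verify`, `recruit` and `correct_parent` stand in for the procedures of Sections 4.2, 4.3.1, the recruitment procedure of Fig. 3 and Section 4.3.3, and all names are hypothetical.

```python
def dynamic_identification(segments, identify, verify, recruit, correct_parent,
                           key_frame, tau, max_shifts=2):
    """Run the dynamic scheme of Fig. 1 once over a depth-first-ordered
    segment hierarchy (parents before children)."""
    matches = {}
    for seg in segments:
        key, shifts = key_frame, 0
        while True:
            match = identify(seg, key)            # CT-based iterative segmental matching
            if match is not None and verify(match, key + tau):
                # Motion verification passed: keep the match and recruit any
                # newly appearing points at the shifted frame.
                matches[seg.name] = recruit(match, key + tau)
                break
            if shifts < max_shifts:               # dynamic key-frame-shift identification
                key += tau
                shifts += 1
            elif seg.parent is not None and correct_parent(seg, key):
                shifts = 0                        # backward parent-segment correction; retry
            else:
                break                             # no parent join involved: abandon the segment
    return matches
```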
4. The DSHPM algorithm

Identification is carried out hierarchically, segment by segment, in a chosen key-frame containing, e.g., over 90% of the model points (Section 4.2), or in a key-frame range when necessary, taking advantage of a dynamic scheme (Section 4.3). In order to make the temporal coherence of motion cues exploitable and to reduce the search space, feature-point pre-tracking and pre-segmentation are carried out prior to segmental identification (Section 4.1).

4.1. Pre-tracking and pre-segmentation

Feature-point data are tracked over a time period before identification. We denote this step as pre-tracking in Fig. 1. This process allows inter-frame correspondences of feature points in a key-frame to be propagated backwards and forwards along stacked pre-tracked trajectories. It not only makes use of motion coherence for efficient segment retrieval, but also makes the key-frame-based identification feasible.

The pre-tracked trajectories exhibit relative motion cues of individual points. To reduce the search space of a segment, a pre-segmentation process is carried out prior to segmental identification, as shown in Fig. 1. We group unidentified points that maintain relatively constant distances during articulated movements as candidates for intra-segmental membership. The pre-segmentation is subjected to criteria that depend on the Euclidean distance $D^t_{i,j}$ between each pair of observed data $(q_i, q_j)$ at frames $t = K + n\tau$, $n = 0, 1, \ldots, 10$, starting from the key-frame $K$ and proceeding in intervals $\tau$, where $\tau$ denotes the motion relaxation interval, that is, the number of frames during which motion relaxation takes place, reflecting noticeable changes in pose. We determine intra-segmental point-pair candidature $(q_i, q_j)$ using a two-stage criterion with relaxation:

$$D^t_{i,j} < \Bigl(1 + \max_s \varepsilon_s\Bigr) \max_s D_s, \qquad (1)$$

$$\frac{\max_n \bigl(D^{K+n\tau}_{i,j}\bigr) - \min_n \bigl(D^{K+n\tau}_{i,j}\bigr)}{\mathrm{avg}_n \bigl(D^{K+n\tau}_{i,j}\bigr)} < \max_s \varepsilon_s, \qquad (2)$$

where the segment distortion ratio $\varepsilon_s$ is determined by the relative variation of edge length among segmental point-pairs, reflecting the allowed "non-rigidity" of a segment in motion. Criterion (1) requires that the point-pair distance $D^t_{i,j}$ be less than the maximum intra-segmental point-pair distance $D_s$ with maximum distortion relaxation. Criterion (2) requires that the ratio between the extremal distance difference and the average distance of the point-pair, over the sampled intervals, be less than the maximum distortion ratio allowed in any segment. If both criteria are satisfied, indicating that the point-pair $(q_i, q_j)$ maintains a consistent distance, with allowed relaxation, during articulated movements and may therefore belong to the same segment, we store this information in a segmentation matrix $\mathrm{Seg}$ and set $\mathrm{Seg}(i,j) = 1$; otherwise we set $\mathrm{Seg}(i,j) = 0$ for an extra-segment pair.
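A minimal numpy sketch of this pre-segmentation step, assuming complete pre-tracked trajectories stored as an array (handling of the missing and extra data that the paper allows for is omitted; the function and argument names are ours):

```python
import numpy as np

def pre_segmentation(points, key_frame, tau, eps_max, d_max, n_samples=11):
    """Build the binary segmentation matrix Seg of Section 4.1.

    points:    array (T, N, 3) of pre-tracked feature-point trajectories
    key_frame: index K of the key-frame; tau: motion relaxation interval
    eps_max:   max_s epsilon_s, the largest segmental distortion ratio
    d_max:     max_s D_s, the largest intra-segmental point-pair distance
    """
    frames = points[key_frame : key_frame + n_samples * tau : tau]  # t = K + n*tau
    # Pairwise distances D_{i,j}^t for every sampled frame.
    diff = frames[:, :, None, :] - frames[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)                            # (n_samples, N, N)
    # Criterion (1): distance below the relaxed maximum segment size.
    c1 = np.all(dist < (1.0 + eps_max) * d_max, axis=0)
    # Criterion (2): relative distance variation below the distortion ratio.
    spread = dist.max(axis=0) - dist.min(axis=0)
    c2 = spread / np.maximum(dist.mean(axis=0), 1e-12) < eps_max
    seg = (c1 & c2).astype(np.uint8)
    np.fill_diagonal(seg, 0)  # a point is not paired with itself
    return seg
```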
4.2. CT-based iterative segmental identification

Articulated motion maintains a relaxed geometric invariance in near-rigid segments. Matching at segment level is therefore preferable to brute-force global point-to-point searching. Initially, neither correspondence nor motion transformation is known for any segment or point. To identify a segment $P_s$ in an articulated structure at tracking start, we adapt the basic idea of the CT for pose identification, developed in our previous work [35,37], to the new context of a motion sequence. We enhance the static CT-based iterative segmental matching by exploiting the motion coherence embedded in inter-frames. Briefly, a CT is created in two stages: (1) CT generation and optimisation, augmented by pre-segmentation information (Section 4.2.1), and (2) CT-based iterative matching (Section 4.2.2).

4.2.1. CT generation

As explained in Refs. [35,37], the CT of segment $P_s$ is determined by intra-segmental distance similarity, here augmented with heuristic rigidity cues available from pre-segmentation. To define the column ordering of a CT for a segment in which no point has been identified, arbitrarily choose a "pivot" reference point $p_{\mathrm{pivot}_s}$ and order the remaining model points $p_i$ by non-decreasing distance $D_{\mathrm{pivot}_s,i}$ from the pivot, giving a model pivot sequence for the segment.

To match the model pivot sequence, a sequential search is applied, with the possibility of rapid rejection of false candidates. Thus, from the unidentified data at key-frame $K$, arbitrarily choose an assumed pivot match $q^K_{a\mathrm{pivot}_s}$ of $p_{\mathrm{pivot}_s}$, and calculate its distance $D^K_{a\mathrm{pivot}_s,j}$ to all other unidentified points $q^K_j$. To exclude large outliers of the segment based on the chosen pivot, a pivot-centred bounding box, relaxed by the distortion ratio $\varepsilon_s$, is applied [35]. Then, in the bounding subspace, the algorithm seeks match candidates for each model point based on distance similarity with reference to the assumed pivot $q^K_{a\mathrm{pivot}_s}$, satisfying the distortion tolerance

$$\frac{\bigl| D_{\mathrm{pivot}_s,i} - D^K_{a\mathrm{pivot}_s,j} \bigr|}{D_{\mathrm{pivot}_s,i}} \Bigg|_{\mathrm{Seg}(a\mathrm{pivot}_s,\,j)=1} < \varepsilon_s, \qquad (3)$$

in which the candidate selection is restricted by the pre-segmentation point-pair rigidity criterion $\mathrm{Seg}(a\mathrm{pivot}_s, j) = 1$. We list such selected candidates in a table column as possible matches. The procedure is repeated for every element along the model pivot sequence, giving rise to an ordered matching sequence of columns that defines the CT for the assumed pivot match $(p_{\mathrm{pivot}_s}, q^K_{a\mathrm{pivot}_s})$. Taking each unidentified point in turn as an assumed pivot match, we generate a set of CTs for that model pivot choice.
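The column construction can be sketched as follows for one assumed pivot match (our own simplification: the pivot-centred bounding box is reduced to a bounding sphere, and all names are illustrative):

```python
import numpy as np

def candidate_table(model_dists, data_points, pivot_idx, seg, eps_s):
    """Sketch of CT generation for one assumed pivot match (Section 4.2.1).

    model_dists: distances D_{pivot_s,i} from the model pivot to the other
                 model points, in non-decreasing order (the model pivot sequence)
    data_points: (N, 3) unidentified points at key-frame K
    pivot_idx:   index of the assumed pivot match in data_points
    seg:         pre-segmentation matrix from Section 4.1
    eps_s:       segmental distortion ratio
    """
    d_data = np.linalg.norm(data_points - data_points[pivot_idx], axis=1)
    # Pivot-centred bound, relaxed by eps_s, to reject large outliers.
    in_bounds = d_data < (1.0 + eps_s) * model_dists.max()
    table = []
    for d_model in model_dists:                       # one column per model point
        rel_err = np.abs(d_model - d_data) / d_model  # distortion tolerance, Eq. (3)
        ok = (rel_err < eps_s) & in_bounds & (seg[pivot_idx] == 1)
        table.append(np.flatnonzero(ok))              # candidate data indices
    return table
```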
Heuristically, the CT constructed with the correct pivot match, if present in the data, should include more candidates than the other CTs. To economise the iterative search at the next stage (Section 4.2.2), CT prioritising by CT-culling, CT-ranking and candidate ordering is applied to reduce the search space in which the correct solution is likely to be found, as discussed in Ref. [35]. The use of CTs makes the assumption of small motion or pose similarity, used e.g. in the ICP [36], unnecessary. When a join point has already been identified during its parent-segment identification, this point is chosen as the unique pivot. In this case, only one CT is generated, resulting in a striking reduction of the search space.

4.2.2. Iterative segmental matching

To detect the presence or absence of a one-to-one intra-segmental feature-point correspondence in a CT of the reduced set of prioritised CTs, an iterative matching procedure is employed to seek the best transformation that interprets the segmental movement under distortion relaxation. We take the first candidates in the top row of a CT to provide an assumed initial match $Q^K_s$ of model segment $P_s$. An affine transformation $[R_s, T_s]$ is determined under this correspondence by a least-squares SVD method [38]. If this transformation maps the model points into rough alignment with their assumed matches, so that the average mapping error satisfies the desired matching quality bounded by the segmental distortion ratio,

$$\bar e_{(P_s \to Q^K_s)} \big|_{[R_s,T_s]} < \varepsilon_s, \qquad (4)$$

then the assumed segmental match $(P_s \to Q^K_s)$ is taken as correct. Otherwise, there must be pseudo pairs in the matching assumption, which need to be removed by coarse-to-fine error-driven iteration. Pseudo pairs exaggerate the individual matching errors at wrong matches. Based on this cue, we remove the worst match by replacing it with its next candidate in the CT, if one exists; otherwise we omit its match from the currently assumed correspondence in the CT. If this CT becomes exhausted before the best segment match is found, a new CT is interrogated [37]. To qualify as a whole segmental match, the transformation under the assumed correspondence should also guarantee the desired matching quantity fraction $\rho_s$,

$$\frac{\mathrm{number}(P_s \to Q^K_s) \big|_{[R_s,T_s]}}{M_s} > \rho_s, \qquad (5)$$

where $M_s$ is the number of feature points in model segment $P_s$. If the matching quantity criterion is not satisfied, the algorithm will attempt to find the remaining matches of the segment, which may have been dropped during iterations, or were excluded from the CT on the grounds of limiting the search space via the stringent criteria (1)–(3). Finding remaining matches is achieved by reassigning their nearest neighbours in the data under the current transformation $[R_s, T_s]$ (refer to segment recruitment, shown in Fig. 3). If no such closest neighbour is found, we say the match of the point is lost.

Iterative motion estimation and refinement alternately update the assumed matches until converging to a segmental match $(P_s \to Q^K_s)$ in the correct CT, leading to satisfaction of both the matching quality criterion (4) and the matching quantity criterion (5). In the event of no CT providing an acceptable match, the segment match is deemed not to exist in the current key-frame.

In the case of segments with fewer than three matching pairs, the SVD-based motion estimation cannot be applied, and segmental identification becomes highly uncertain. We need to confirm such a segment in the hierarchical chain depending on whether its children or even grandchildren can be found (Section 4.4).

4.3. Dynamic identification

Single-frame spatial pose data alone may carry inherent uncertainty in determining the correct match from noisy data. In the dynamic context of a motion sequence, geometric coherence embedded in the temporal domain along pre-tracked trajectories is used to improve the reliability of identification from a key-frame. As shown in Fig. 1, after the CT-based iterative segmental matching (Section 4.2), a dynamic scheme of motion verification (Section 4.3.1), dynamic key-frame-shift identification (Section 4.3.2) and backward parent-segment correction (Section 4.3.3) is applied in a key-frame range to guarantee a most likely correct segment identification.

4.3.1. Motion verification

In segment-based articulated motion, the geometric invariance of segmental "rigidity" should be maintained over movements. The idea of motion verification is therefore to propagate the segmental feature-point identities along their pre-tracked trajectories, and to confirm that an affine transformation under such correspondence still satisfies the matching quality criterion (4) within the allowed distortion relaxation. As summarised in Fig. 2, the segmental matching obtained in the key-frame should be confirmed via its "rigidity" after a motion relaxation interval, i.e. a $\tau$-frame shift of the key-frame.

Fig. 2. Motion verification.

When the key-frame segment match $(P_s \to Q^K_s)$ is confirmed, the algorithm will attempt to retrieve newly appearing points at the observed frame $K + \tau$ if the segment is incomplete. This is achieved by the segment recruitment procedure shown in Fig. 3. If more matches are found at the observed frame $K + \tau$, reflecting good data quality, the dynamic identification scheme favours a key-frame-shift to $K + \tau$, as described below.
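The transformation estimation and quality test at the heart of both iterative matching (Section 4.2.2) and motion verification can be sketched with the SVD method of Ref. [38]. Note that normalising the mean mapping error by a characteristic segment size, so that it is comparable with the dimensionless ratio $\varepsilon_s$, is our reading of criterion (4), and the function names are ours:

```python
import numpy as np

def fit_rigid(P, Q):
    """Least-squares rigid transform [R, T] mapping point set P onto Q
    (the SVD method of Arun et al. [38]); P, Q are (n, 3) with n >= 3."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                 # cross-covariance of centred sets
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cQ - R @ cP

def verify_match(P, Q, eps_s, seg_size):
    """Matching quality test in the spirit of Eq. (4): the mean mapping
    error, normalised by a segment size (our assumption), must stay
    below the segmental distortion ratio eps_s."""
    R, T = fit_rigid(P, Q)
    err = np.linalg.norm((P @ R.T + T) - Q, axis=1).mean()
    return err / seg_size < eps_s
```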
4.3.2. Dynamic key-frame-shift identification

The quality of some frame data for a segment may be very poor, on account of excessive missing/extra data or distortion. In this case, CT-based segmental identification may fail. This will break off the hierarchical search and result in serious uncertainty for successive child-segment identification. For this reason, we do not confine the segment matching to the initially chosen key-frame, but rather carry out matching in a key-frame range. If the segment identification or verification fails, the dynamic key-frame-shift module is used to re-identify the segment in up to two successive key-frame-shifts, as shown in Fig. 1.

In order to maintain the spatial hierarchy, after the key-frame-shift the recruitment procedure in Fig. 3 is applied to all incompletely identified segments, to encompass any previously missed matches and to forward propagate the obtained segmental identities into the new key-frame.

Fig. 3. Segment recruitment.

4.3.3. Recursive parent-segment correction

If two successive key-frame-shift processes still fail to identify or verify a segment $P_s$, this may imply a wrong or highly distorted joint pivot in use, derived from its parent-segment during hierarchical searching. In this case, the algorithm attempts a recursive backward parent-segment correction to check the join point and even its parent-segments, as described in Fig. 4. If, after this series of dynamic attempts, no parent join takes part in the failed identification, we ultimately abandon the segment. This indicates that the segment could have been occluded, or have poor data quality even over the range of investigated frames.

Fig. 4. Recursive parent-segment correction.

4.4. Integrating temporal coherence with spatial hierarchy for articulated match

Articulation at inter-segment joins is represented in a tree hierarchy. Consistent matching of articulated segments is carried out with respect to this tree. Such a spatial hierarchy organises segment identification in a parent–child ordering, thereby carrying identified join points forward from a parent to its children. We assume that one of the segments of the articulated structure contains more points, and has more segments linked to it, than most other segments. We treat such a segment as the root, seeking to identify it first. After the root has been located, searching proceeds depth-first to child-segments along hierarchical chains, taking advantage of available joints located during parent-segment identification. This linkage through join points considerably increases the reliability and efficiency of child-segment identification. In the case of missing joint data on a parent segment, we recover a virtual joint if at least three identified points are obtained in the parent. When a parent has several children, the search prioritises the child with the most model points, as its identification incurs the least uncertainty from missing, extra or distorted data, and leads to the greatest subsequent search-space reduction. In the case of broken search chains in the hierarchy, due to a failed segment identification or a missing join point, identification proceeds first to segments on other chains, leaving any remaining child-segments on broken chains to be identified last, under conditions of a much reduced search space.

The segment-based hierarchical search operates dynamically in a key-frame range, rather than being confined to a single static data frame as in Refs. [35,37]. This lends robustness to segment identification, as data may be poor in one frame but good in another. When the hierarchical chain is broken in a chosen frame, the algorithm can shift to a new frame to carry on the search. Most existing feature-point identifications will be carried forward along pre-tracked trajectories to propagate spatial continuity into the new frame. An obtained segment match can be confirmed by the geometric invariance presented in the temporal domain of a motion relaxation interval (Fig. 2). A failed child-segment identification caused by a wrongly inherited pivot from its parent can be corrected by a recursive procedure (Fig. 4). Meanwhile, the dynamic scheme allows efficient recruitment of reappearing segment points by reference to the known points whose identities are carried forward (Fig. 3). Our experimental results confirm that temporal coherency, integrated with spatial hierarchical cohesion, enhances the identification of complex articulated motion in the presence of data corruption (Section 5).

4.5. Identification with tracking for registering a whole motion sequence

After segment-based dynamic hierarchical matching in a key-frame range, the identity of each feature point can be propagated
along its trajectory by inter-frame tracking. The algorithm continues to track identified points forwards throughout the whole motion sequence until missing data are encountered, causing broken trajectories. For missing data, the algorithm attempts to identify reappearing points, or even segments, and restart their tracking. This is much easier by reference to the already identified points than at the initial stage of model fitting, because the motion transformation can be obtained from partially identified segment matches. In the case of an entire newly appearing segment, identification complexity is evidently much reduced in the presence of previously established correspondences in the articulation.

5. Experiments

The algorithm has been implemented in Matlab. We tested it on articulated models, such as humans and robot manipulators, with various low densities and distributions of feature points. Human motion, a typical articulated motion with segments of only near-rigidity, makes the identification task more difficult than robot manipulator motion with rigid segments. To reflect this challenge, we report in this section experimental results on real-world human motion capture and its overlays with synthetic noise, for performance analysis of the algorithm. In our experiments, all 3D model data and human motion data were acquired via a marker-based optical Vicon 512 MoCap system [34]. The measurement accuracy of this system is at the level of a few millimetres in a control volume spanning metres in linear extent. We attached markers as extrinsic feature points at key sites on a subject, indicating segmental structure with the required detail. Marker attachment to tight clothing or bony landmarks nevertheless introduced inevitable segmental non-rigidity, due to the underlying soft body tissues. The sampling rate of human MoCap was 60 frames per second (fps) in our experiments.

5.1. Human motion reconstruction from MoCap data

A number of freeform movements captured from various subjects, with various point distributions, were investigated. The subject model is first generated off-line using one complete frame of feature-point data captured in a clear pose. We manually identified the data in a 3D-interactive display and grouped them into segments consistent with the underlying articulated anatomy. This produced a "stick-figure" skeleton model of the subject, as shown in the first of the figure sequences in Fig. 6(a) and (b). Having attached markers to the subject and defined its model, we proceeded with the capture of the subject's freeform motion using the Vicon 512 MoCap system.

5.1.1. Parameter setting

The segmental distortion ratio $\varepsilon_s$ in Eqs. (1)–(4) and the matching quantity $\rho_s$ in Eq. (5) were pre-defined according to segment rigidity and the quality of the MoCap data. Precise values of the parameters are not required a priori, but algorithmic performance will be compromised by very inappropriate choices.
Fig. 5. Capture of static pose data for subject model generation.
To provide experimental values of $\varepsilon_s$, we analysed a number of dynamic trials. We found that segmental distortion differs with body part and motion intensity. Thigh segments may give rise to large distortion, with $\varepsilon_s \approx 0.2$. A value of $\varepsilon_s \approx 0.05$ was found adequate for indicating the rigidity of the head, and an average $\varepsilon_s \approx 0.1$ for other body parts. We used one set of approximate distortion ratios $\varepsilon_s = 0.05\text{–}0.2$ in all human motion experiments (Fig. 5).

For a rigid segment, a small value of $\varepsilon_s$ guarantees precise matching quality and provides a good ability to reject outliers. We could therefore reduce the matching quantity requirement $\rho_s$ to gain more tolerance of missing data. For a deformable segment, a high $\varepsilon_s$ value allows increased distortion, but at the cost of an increased candidate search space and possibly low matching quality. As compensation, we have to raise the matching quantity requirement $\rho_s$, with possibly compromised handling of missing data. Based on the quality of the MoCap data and the rigidity of individual segments, we set the matching quantity $\rho_s = 0.80\text{–}0.90$. To reflect significant pose changes, we chose a motion relaxation interval $\tau = 15$ frames, corresponding to 0.25 s at the MoCap rate of 60 fps.
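For concreteness, the settings reported above can be collected into one configuration block (a sketch; the grouping and key names are ours):

```python
# Parameter settings used in the human-motion trials (Section 5.1.1).
PARAMS = {
    "distortion_ratio": {   # epsilon_s, per body part
        "head": 0.05,
        "thigh": 0.20,
        "default": 0.10,    # other body parts
    },
    "matching_quantity": (0.80, 0.90),  # rho_s, chosen per segment rigidity
    "relaxation_interval_frames": 15,   # tau: 0.25 s at 60 fps
    "mocap_rate_fps": 60,
}
```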
Table 1
Identification examples of human motion (MoCap rate at 60 fps)

Activity              Sequence length       Number of      Identification   Efficiency
                      in frames (seconds)   trajectories   rate (%)         (identified frames per second)

Protocol 1: 27 feature points
Walking               600 (10)              32             96               300
Running               600 (10)              40             95               270
Freeform movement     1200 (20)             93             94               220

Protocol 2: 34 feature points
Walking               600 (10)              45             98               380
Running               600 (10)              54             97               320
Freeform movement     1200 (20)             106            94               250

Protocol 3: 49 feature points
Walking               600 (10)              56             96               280
Running               600 (10)              65             94               220
Freeform movement     1200 (20)             128            91               130

5.1.2. Reconstruction results

Illustrative results of identified MoCap sequences from representative full-body models of 27, 34 and 49 feature points in 15 segments are given in Table 1 and in Fig. 6. In Fig. 6, identified feature points are linked intra-segmentally. As shown in Fig. 6, there is no assumption of pose similarity between the models (the first frames in Fig. 6(a) and (b)) and their motion sequences, initially or during the movements. The captured motion sequences were subject to inevitable intermittent missing
points, extra noise points and segmental distortion in complex intensive motion. We observe that the proposed DSHPM algorithm is capable of reconstructing an articulated motion represented by sparse or moderately dense feature points in the presence of data noise. Even when some key points, such as join points, or even whole segments, have been lost, the algorithm can still carry on the identification process by taking advantage of the proposed dynamic scheme. The algorithm has been successfully used to identify a number of MoCap trials in a commercial project to produce the game "Dance: UK".¹ Extracts from an identified dance sequence are shown in Fig. 7.

5.1.3. Performance analysis from the MoCap data

To demonstrate the performance of the DSHPM algorithm, Table 1 gives results of human motion identification obtained by applying it to real-world MoCap trials. The average proportion of missing and extra data in the real-world MoCap examples is about 10–15%. Some general types of activities, listed in the first column, were tested under three marker protocols: 27, 34 and a denser 49 feature points. The freeform movement includes walking, running, jumping, bending and dance. The results shown in each row of Table 1 were averaged over a number of trials from different subjects performing similar movements with the same marker protocol. The average length of each type of activity, measured in frames as shown in column 2, gives an indication of the absolute motion period at the 60 fps MoCap rate. The sequence length by itself does not always indicate the difficulty of identification. In the most favourable case of no tracking interruption, each feature point need be identified only once, allowing its identity to be propagated along its trajectory with minimum re-identification cost. However, a trial with occlusions in complex movements will lead to increased computational cost, as each reappearing feature point is subjected to identification after tracking loss.

¹ "Dance: UK" was developed in collaboration with Broadsword Interactive Ltd. [39]. It was released during Christmas 2004.
To indicate the identification workload due to lost tracking, column 3 gives the number of trajectories, counting interruptions. These are generally higher than the indicated number of feature points, consistent with increasing identification difficulty. Activities (first column) for each marker protocol are ordered by increasing movement complexity, accompanied by increased identification difficulty due to more missing data and extra noise data. This is reflected by the decreasing identification rate in column 4. The identification rate is defined as the percentage of correctly identified trajectories relative to the total number of trajectories encountered. The high trajectory-based rate emphasises correct identification obtained via segment-based articulated matching, rather than identification inherited only from inter-frame tracking, thus illustrating the effective nature of the algorithm. We observed that the identification rate in Table 1 is in excess of 90% for all motion types, whether with sparser or denser feature points, even for large accelerative movements with big segmental distortion, such as jumping and dancing, and for complex movements characterised by large numbers of broken trajectories due to occlusion.

Reconstruction efficiency of the algorithm depends on the complexity of the articulated model, but critically also on the motion conditions: the level of segmental distortion associated with movement intensity, and the frequency and amount of missing/extra data associated with motion complexity. We indicate an empirical reconstruction efficiency via "identified frames per second" in Table 1. This indicator is defined as the length of a trial (measured in frames) divided by the computational time (measured in seconds) when the DSHPM identification was executed in Matlab code on a Compaq Pentium IV with 512 MB RAM. We observe that for the common activities of walking and running, the identification efficiency of the type 2 marker protocol (34 feature points) is higher than for types 1 and 3 (27 and 49 feature points, respectively). Type 2 is a compromise between having too few feature points (type 1) to allow uninterrupted hierarchical searching in the case of missing data, and the denser data sets (type 3), with possible identification confusion and generally increased computation cost. For each type of marker protocol, identification efficiency decreases with increasing activity complexity, involving more broken trajectories and data noise. In all cases, the identification efficiency exceeded the 60 fps MoCap rate by at least a factor of two, making identification time competitive with real motion time. This suggests reconstruction for an on-line tracker realised in buffered real-time.
Fig. 6. Reconstructed human freeform movements: subject models of 15 segments followed by 7 sampled frames from their identified motion sequences: (a) human motion represented by 34 feature points; (b) human motion represented by 49 feature points.
Fig. 7. Dance trial reconstruction in the game project “Dance: UK”.
5.2. Evaluation based on synthetic distortion of real data

To evaluate the robustness and efficiency of the DSHPM algorithm, we used two MoCap walking sequences. Each has 600 frames, corresponding to 10 s at a MoCap rate of 60 fps. They were captured from the same subject for comparability, in a sparse case of 30 points and a denser case of 50 points, respectively. Both sequences are denoted "ideal", having no missing or extra data, and minimal distortion, the markers being attached to the subject at tightly clothed key sites. Their identification rate by the DSHPM is 100%.

5.2.1. Identification of distorted motion data: dynamic versus static schemes

In the first series of experiments, we compared the identification effectiveness of the proposed dynamic strategy with that of a static identification scheme [35], under increasing motion distortion. In the latter, identification is carried out in isolation at each frame, without considering any inter-frame temporal coherence. To simulate the effect of distortion under variable motion intensity, we augmented the "ideal" motion data with synthetic noise over its natural distortion, as follows. Taking a pre-identified "ideal" walking sequence, we added Gaussian noise $N(0, 0.5\,\delta\,\bar l_s)/\sqrt{6}$ to the $x$, $y$ and $z$ coordinates of each point in sequences of over 600 frames, the standard deviation being parameterised by a dimensionless distortion level $\delta$ and the average segmental length $\bar l_s$.

The average identification rates (fraction of correctly identified points) over 500 trials, versus increasing distortion level, for both the sparser 30 and denser 50 feature-point walking trials, using either the static or the dynamic identification scheme, are given in Fig. 8(a) and (b). We observe that increased distortion leads to more potential for confusion among neighbouring data points, especially for the denser point-set, and to greater loss of identification and tracking difficulty. However, comparing Fig. 8(a) and (b), the DSHPM algorithm achieves better identification rates than the static method at a given distortion level. This is more evident with increasing distortion, as is to be expected from the DSHPM robustness gained by exploiting the motion coherence embedded in inter-frames to survive spatially distorted data. The advantage of using DSHPM is most obvious in the difficult situation of the denser set.

Fig. 8. Comparison of the static approach [35] with the DSHPM approach for motion reconstruction with additional synthetic distortion: (a) static identification; (b) dynamic identification.
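A sketch of this noise overlay, reading the second argument of $N(\cdot,\cdot)$ above as a standard deviation (an assumption; only the published formula is authoritative):

```python
import numpy as np

def add_distortion(points, delta, mean_segment_length, rng=None):
    """Overlay synthetic segmental distortion on 'ideal' MoCap data.

    points: array (..., 3) of identified feature-point coordinates
    delta:  dimensionless distortion level (horizontal axis of Fig. 8)
    mean_segment_length: average segmental length l_bar_s
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = 0.5 * delta * mean_segment_length / np.sqrt(6.0)  # assumed std deviation
    return points + rng.normal(0.0, sigma, size=points.shape)
```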
5.2.2. Identification of corrupted data with missing/extra data

The second series of experiments studied the ability of the DSHPM algorithm to identify "ideal" walking sequences subjected to increasing missing or extra data. To obtain test motion sequences with missing data, we removed feature-point data randomly and evenly among segments, with gaps continuing for 1–60 frames and average length $\bar L_{\mathrm{corrupt}} = 30$ frames. To generate an extra trajectory in a volume encompassing the observed data, we randomly inserted two points, one in each of two frames 1–60 frames apart, and linearly interpolated the trajectory in between. The average length of an extra trajectory is therefore also $\bar L_{\mathrm{corrupt}} = 30$ frames. The fraction of such corrupted (missing or extra) data is defined as

$$\frac{\bar L_{\mathrm{corrupt}}}{L} \times \frac{N_{\mathrm{corrupt}}}{N},$$

where $N_{\mathrm{corrupt}}$ denotes the number of missing or extra trajectories generated, $L$ is the frame length of the sequence and $N$ is the number of model feature points. Average identification rates over 500 trials at different data-corruption levels are shown in Fig. 9. Fractions of missing or extra data added to the "ideal" walking sequences are indicated on the bottom horizontal axis. The corresponding numbers of broken trajectories encountered at each missing or extra noise level, for the 30 (and 50) point trials, are given along the top axis. Comparing the left and right of the zero-line (the latter indicating identification of the "ideal" sequences with the original 30 or 50 trajectories), we observe that the algorithm demonstrates good robustness in rejecting large numbers of outliers, but rapidly fails to survive the inherent difficulty of increasing missing data. It is also evident that the denser set gains better tolerance of missing data than the depleted sparser point-set. However, more identification loss occurs for the denser set with extra data.
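As a worked instance of this definition (with illustrative numbers of our own): deleting $N_{\mathrm{corrupt}} = 12$ trajectories of average length $\bar L_{\mathrm{corrupt}} = 30$ frames from the sparse sequence ($N = 30$ points, $L = 600$ frames) gives a missing-data fraction

$$\frac{\bar L_{\mathrm{corrupt}}}{L} \times \frac{N_{\mathrm{corrupt}}}{N} = \frac{30}{600} \times \frac{12}{30} = 0.05 \times 0.4 = 0.02,$$

which would be plotted as $-0.02$ on the horizontal axis of Fig. 9.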
Fig. 9. DSHPM identification with synthetic missing and extra data added to the "ideal" walking movements.
5.2.3. Complexity

During dynamic hierarchical matching, the identification step is computationally more intensive than inter-frame tracking. The motion transformation $[R_s, T_s]$ calculated by the SVD [38] under an assumed segmental correspondence is the most time-consuming step, and is invoked in most essential modules, e.g. CT-based iterative segment matching, motion verification and recruitment. Therefore, in the last series of experiments, we measured an empirical complexity via the total number of SVD invocations, denoted the SVD-count.
We undertook such a complexity analysis relating to the two experiments of Section 5.2.2 above. SVD-counts versus missing/extra data in the walking sequences, for both the sparser and denser point sets, are monitored in Fig. 10. In both cases, we observe that SVD-counts grow steadily with increasing extra data, in an approximately log-linear manner. Most SVD-counts are spent at the initial key-frame identification stage. Ideally, when all segments are identified without missing data, any outliers need only be tracked without further identification cost. On the left side of the zero-line in Fig. 10, the SVD load increases rapidly with increasing missing data. However, the growth tendency is
restrained toward higher fractions of lost data. This is because, on the one hand, incomplete data cause more identification and verification difficulties during initial segmental matching, and the recruitment function that invokes the SVD is required to encompass any newly appearing matches; on the other hand, missing data reduce the number of points to be identified and raise the possibility of segment abandonment. Comparing the denser and sparser cases, both have the same number of segments, but the denser set leads to more populated CTs. This is likely to require greater numbers of match attempts, but the overall complexity is seen to grow only at some low power of the extra-data measure.

Fig. 10. SVD-count versus missing and extra data.
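One simple way to collect such a count (a sketch, not the original Matlab instrumentation) is to wrap the transform-estimation routine in a counter:

```python
import functools

def counted(fn):
    """Wrap a routine so every invocation increments a counter on the wrapper."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.count += 1
        return fn(*args, **kwargs)
    wrapper.count = 0
    return wrapper

# Usage sketch: fit_rigid = counted(fit_rigid); run a trial; read fit_rigid.count.
```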
6. Conclusion

The proposed dynamic segment-based hierarchical point matching (DSHPM) algorithm addresses a general and currently open problem in pattern recognition: non-rigid articulated motion reconstruction from low-density feature points. The algorithm has a crucial self-initialisation phase of pose estimation, benefiting from our previous work [35,37]. In the context of a dynamic sequence, the DSHPM algorithm integrates key-frame-based dynamic hierarchical matching with inter-frame tracking to achieve computational efficiency and robustness to data noise. The candidate-table optimisation heuristics are improved by exploiting the geometric coherency embedded in inter-frames. Segment-based articulated matching along a spatial hierarchy is significantly enhanced by a dynamic scheme, in the form of motion-based verification, dynamic key-frame-shift identification and backward parent-segment correction. Performance analysis of the algorithm using synthetic data demonstrates the effectiveness of the dynamic scheme, which ultimately determines the robustness of articulated motion reconstruction and reduces the uncertainty inherent in the matching problem for a single frame.

We provided illustrative experimental results of human motion reconstruction using 3D real-world MoCap data. Identification rates for most common freeform movements reached 90% or higher, without requiring manual intervention to aid the identification. Identification efficiency proceeded at over twice the common MoCap rate of 60 fps. This suggests that the DSHPM algorithm has the potential for self-initialising point-feature tracking and identification of articulated movement in real-time applications.
Acknowledgements

All model and motion data used in our experiments were obtained by a marker-based optical motion capture system, Vicon-512, installed at the Department of Computer Science, UWA. Some motion trials analysed in this paper were captured for the game project "Dance: UK" in collaboration with Broadsword Interactive Ltd. [39].
References

[1] J.K. Aggarwal, Q. Cai, W. Liao, B. Sabata, Articulated and elastic non-rigid motion: a review, in: Proceedings of the IEEE Workshop on Motion of Non-Rigid and Articulated Objects, Austin, TX, 1994, pp. 2–14.
[2] J. Maintz, M. Viergever, A survey of medical image registration, IEEE Eng. Med. Biol. Mag. 2 (1) (1998) 1–36.
[3] J. Deutscher, A. Blake, I. Reid, Articulated body motion capture by annealed particle filtering, in: Proceedings of the IEEE International Conference on CVPR, vol. 2, 2000, pp. 126–133.
[4] T. Moeslund, E. Granum, A survey of computer vision-based human motion capture, Comput. Vision Image Understanding 81 (3) (2001) 231–268.
[5] L. Wang, W. Hu, T. Tan, Recent developments in human motion analysis, Pattern Recognition 36 (3) (2003) 585–601.
[6] C. Cédras, M. Shah, A survey of motion analysis from moving light displays, in: Proceedings of the IEEE Computer Vision and Pattern Recognition, Washington, June 1994, pp. 214–221.
[7] C. Taylor, Reconstruction of articulated objects from point correspondences in a single uncalibrated image, Comput. Vision Image Understanding 80 (3) (2000) 349–363.
[8] J. Zhang, R. Collins, Y. Liu, Representation and matching of articulated shapes, in: Proceedings of the IEEE International Conference on CVPR, vol. 2, 2004, pp. 342–349.
[9] I. Cox, S. Hingorani, An efficient implementation of Reid's multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking, IEEE Trans. Pattern Anal. Mach. Intell. 18 (2) (1996) 138–150.
[10] S. Deb, M. Yeddanapudi, K. Pattipati, Y. Bar-Shalom, A generalized S-D assignment algorithm for multisensor–multitarget state estimation, IEEE Trans. Aerosp. Electron. Syst. 33 (2) (1997) 523–538.
[11] C. Veenman, M. Reinders, E. Backer, Resolving motion correspondence for densely moving points, IEEE Trans. Pattern Anal. Mach. Intell. 23 (1) (2001) 54–72.
[12] Y. Wang, Feature point correspondence between consecutive frames based on genetic algorithm, Int. J. Robot. Autom. 21 (2006) 2841–2862.
[13] M. Ringer, J. Lasenby, Modelling and tracking articulated motion from multiple camera views, in: Proceedings of the British Machine Vision Conference, Bristol, UK, September 2000, pp. 172–182.
[14] B. Li, H. Holstein, Dynamic segment-based sparse feature-point matching in articulate motion, in: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2002.
[15] R. Campbell, P. Flynn, A survey of free-form object representation and recognition techniques, Comput. Vision Image Understanding 81 (2001) 166–210.
[16] B. Li, Q. Meng, H. Holstein, Point pattern matching and applications—a review, in: Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Washington, DC, USA, October 2003.
[17] V. Gaede, O. Günther, Multidimensional access methods, ACM Comput. Surv. 30 (2) (1998) 170–231.
[18] D.M. Mount, N.S. Netanyahu, J.L. Moigne, Efficient algorithms for robust feature matching, Pattern Recognition 32 (1999) 17–38.
[19] H.J. Wolfson, I. Rigoutsos, Geometric hashing: an overview, IEEE Comput. Sci. Eng. 4 (1997) 10–21.
[20] W.E.L. Grimson, T. Lozano-Perez, D. Huttenlocher, Object Recognition by Computer: The Role of Geometric Constraints, MIT Press, Cambridge, MA, 1990.
[21] E. Bardinet, L.D. Cohen, N. Ayache, A parametric deformable model to fit unstructured 3D data, Comput. Vision Image Understanding 71 (1) (1998) 39–54.
[22] A. Cross, E. Hancock, Graph matching with a dual-step EM algorithm, IEEE Trans. Pattern Anal. Mach. Intell. 20 (11) (1998) 1236–1253.
[23] H. Chui, A. Rangarajan, A new point matching algorithm for non-rigid registration, Comput. Vision Image Understanding 89 (2003) 114–141.
[24] A. Pitiot, G. Malandain, E. Bardinet, P. Thompson, Piecewise affine registration of biological images, in: Second International Workshop on Biomedical Image Registration, 2003.
[25] G. Seetharaman, G. Gasperas, K. Palaniappan, A piecewise affine model for image registration in nonrigid motion analysis, in: Proceedings of the IEEE International Conference on Image Processing, 2000, pp. 1233–1238.
[26] D. Forsyth, D. Ramanan, C. Sminchisescu, People tracking, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2006.
[27] D. Gavrila, L. Davis, Model-based tracking of humans in action: a multi-view approach, in: Proceedings of the IEEE Computer Vision and Pattern Recognition, San Francisco, 1996, pp. 73–80.
[28] L. Sigal, S. Bhatia, S. Roth, M. Black, M. Isard, Tracking loose-limbed people, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2004.
[29] X. Lan, D. Huttenlocher, A unified spatio–temporal articulated model for tracking, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2004.
[30] H. Nguyen, Q. Ji, A. Smeulders, Robust multi-target tracking using spatio–temporal context, in: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2006.
[31] T. Haga, K. Sumi, Y. Yagi, Human detection in outdoor scene using spatio–temporal motion analysis, in: Proceedings of the IEEE International Conference on Pattern Recognition, 2004.
[32] L. Kakadiaris, D. Metaxas, Model-based estimation of 3D human motion, IEEE Trans. Pattern Anal. Mach. Intell. 22 (12) (2000) 1453–1459.
[33] J. Richards, The measurement of human motion: a comparison of commercially available systems, Human Movement Sci. 18 (5) (1999) 589–602.
[34] Vicon Motion Systems, www.vicon.com.
[35] B. Li, Q. Meng, H. Holstein, Reconstruction of segmentally articulated structure in freeform movement with low density feature points, Image and Vision Comput. 22 (10) (2004) 749–759.
[36] P.J. Besl, N.D. McKay, A method of registration of 3-D shapes, IEEE Trans. Pattern Anal. Mach. Intell. 14 (2) (1992) 239–255.
[37] B. Li, Q. Meng, H. Holstein, Articulated pose identification with sparse point features, IEEE Trans. Syst. Man Cybern. Part B Cybern. 34 (3) (2004) 1412–1423.
[38] K.S. Arun, T.S. Huang, S.D. Blostein, Least square fitting of two 3-D point sets, IEEE Trans. Pattern Anal. Mach. Intell. 9 (5) (1987) 698–700.
[39] Broadsword Interactive Ltd., www.broadsword.co.uk.
About the Author—BAIHUA LI received the B.S. and M.S. degrees in electronic engineering from Tianjin University, China, and the Ph.D. degree in computer science from the University of Wales, Aberystwyth in 2003. She is a Lecturer in the Department of Computing and Mathematics, Manchester Metropolitan University, UK. Her current research interests include computer vision, pattern recognition, human motion tracking and recognition, 3D modelling and animation.

About the Author—QINGGANG MENG received the B.S. and M.S. degrees in electronic engineering from Tianjin University, China, and the Ph.D. degree in computer science from the University of Wales, Aberystwyth in 2003. He is a Lecturer in the Department of Computer Science, Loughborough University, UK. His research interests include biologically/psychologically inspired robot learning and control, machine vision and service robotics.

About the Author—HORST HOLSTEIN received the B.S. degree in Mathematics from the University of Southampton, UK, in 1963, and obtained a Ph.D. in the field of rheology from the University of Wales, Aberystwyth, UK, in 1981. He is a Lecturer in the Department of Computer Science, University of Wales, Aberystwyth, UK. His research interests include motion tracking, computational bioengineering and geophysical gravi-magnetic modelling.