[To be published. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2001, Kauai, Hawaii, USA]
Automatic Acquisition and Initialization of Kinematic Models
N. Krahnstoever
M. Yeasin
R. Sharma
Department of Computer Science and Engineering
220 Pond Lab, University Park, PA 16802
E-Mail: {krahnsto,yeasin,rsharma}@cse.psu.edu
Abstract
We extract and initialize kinematic models from monocular visual data from the ground up, without any manual initialization, adaptation or prior model knowledge. Visual analysis, classification and tracking of articulated motion is challenging because noise and spurious variability caused by appearance, size and viewpoint fluctuations must be separated from the task-relevant variations. By incorporating powerful domain knowledge, model-based approaches are able to overcome this problem to a great extent and are actively explored by many researchers. However, model acquisition, initialization and adaptation are still relatively underinvestigated problems. In this work we show how kinematic structure can be inferred from monocular views without making any a priori assumptions about the scene, except that it consists of piecewise rigid segments constrained by jointed motion. The efficacy of the method is demonstrated on synthetic as well as natural image sequences.
1. Introduction
The capture, analysis and synthesis of articulated (especially human) motion is receiving an increasing amount of attention from the computer vision research community. Many systems have been shown to perform various tracking, analysis and recognition tasks based on silhouette information [1], blobs [2], statistical (e.g., PCA or HMM) modeling [3] or explicit use of kinematic models [4, 5]. See [6] and references therein for a recent survey on human motion analysis. Explicit body models are very promising because they directly encode the available domain knowledge and potentially offer a wider degree of generality and task independence than other approaches. However, model acquisition, initialization and adaptation remain challenges. Models are commonly hand-crafted with varying numbers of links and joints. Since the size and shape of people vary across the population, it is usually not possible to develop “one-size-fits-all” models. Models, especially the limbs, have to be adapted to the shape and the appearance of the target for practical applications. Furthermore, most model-based tracking approaches assume that the location and pose of the target are known for an initial frame of a sequence. While in some domains (e.g., human computer interaction [2]) the user can be asked to aid the system in initializing a model, the use of model-based motion capture systems in other domains (e.g., surveillance) will in general only become feasible once the problem of initialization has been solved.
In this work we present an approach to perform automatic acquisition and initialization of kinematic models from the ground up. We assume that the world consists of rigid segments that are potentially connected by joints and leave it to the algorithm to extract segment and joint information automatically from an image sequence. We use motion segmentation to simultaneously decompose a set of images into moving segments, together with their corresponding motion parameters. The motion models of the rigid segments are subsequently examined to infer joint locations. The combination of the extracted segments, their motion parameters and joint locations constitutes a complete kinematic model with joint and link appearance information. We show how the acquired and initialized kinematic models can be used for tracking and motion capture.
2. Motion Segmentation
Recent years have seen great interest in motion segmentation algorithms [7, 8, 9, 10]. These algorithms address segmentation and flow estimation in a unified approach to overcome some of the main problems of either method alone. Layer-based approaches take two images as input and perform a segmentation into regions that move according to independent parametric motion models. The estimation is performed using expectation maximization (EM), which maximizes the overall likelihood of layer assignments and motion parameters and leads to very good results if the algorithm starts with reasonable initial values. The motion segmentation method used for this work is based on [8, 7] with a multi-frame extension and uses clustering methods from [10, 9] for initialization. It is outlined as follows: We are given a reference frame $I_0$ and a set of subsequent (or previous) images $I_t$. We need to estimate motion parameters $\theta_l^t$ that map the $l$-th layer $L_l$ from image $I_0$ to $I_t$ and layer assignment probabilities $P_l(x)$ that denote the probability of pixel $x$ in $I_0$ belonging to the $l$-th layer $L_l$. With Bayes rule we have (multi-frame extension of [8]):
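$$P_l(x) \equiv P\big(x \in L_l \mid I_0(x), \{I_t\}, \Theta_l\big) \qquad (1)$$
$$\propto P\big(\{I_t\} \mid x \in L_l, I_0(x), \Theta_l\big)\, P\big(x \in L_l \mid I_0(x), \Theta_l\big) \qquad (2)$$
with $\Theta_l = \{\theta_l^t\}$. The first term on the right hand side of (2) is assumed to be normally distributed in the residuals.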
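To illustrate how the proportionality in (2) might be evaluated for a single pixel, the following minimal sketch computes normalized layer assignment probabilities from per-layer residuals. It is not the implementation used in this work: the function name `layer_posteriors`, the single shared standard deviation `sigma`, the independence of residuals across frames, and the scalar per-layer priors are all assumptions made for the example.

```python
import math


def layer_posteriors(residuals, priors, sigma=1.0):
    """Return normalized layer assignment probabilities for one pixel.

    residuals[l] is a list of residuals of the pixel under layer l,
    one entry per frame (hypothetical input layout).
    priors[l] is the prior probability of the pixel belonging to layer l.
    Assumes zero-mean Gaussian residuals with a shared sigma, independent
    across frames, so the Gaussian normalization constant is the same for
    every layer and cancels in the normalization.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    log_post = []
    for res_l, prior_l in zip(residuals, priors):
        if prior_l <= 0:
            raise ValueError("priors must be strictly positive")
        # Log-likelihood of all frames' residuals under layer l (up to a constant).
        log_lik = sum(-0.5 * (r / sigma) ** 2 for r in res_l)
        log_post.append(math.log(prior_l) + log_lik)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Two layers, two frames: layer 0 explains the pixel much better.
    print(layer_posteriors([[0.0, 0.0], [3.0, 3.0]], [0.5, 0.5]))
```

In an EM iteration, values of this kind would play the role of the layer assignment probabilities in the expectation step, while the motion parameters would be re-estimated in the maximization step.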
Each of the layers extracted in the motion segmentation stage is considered to be a potential link in the kinematic model. First, a connected component labeling algorithm is used to extract the largest connected region for each layer to obtain a compact support for each link. A tight bounding box is calculated for each resulting support region, and the image content is extracted together with its alpha map. This image information constitutes the size and appearance information for each link. In the following, we denote with the transformation that maps a pixel in image coordinates into the w-th link coordi