University of Leeds
SCHOOL OF COMPUTER STUDIES RESEARCH REPORT SERIES Report 95.9
Learning Spatiotemporal Models From Training Examples

by A M Baumberg & D C Hogg
Division of Artificial Intelligence
March 1995
Abstract

Physically based vibration modes have been shown to provide a useful mechanism for describing non-rigid motions of articulated and deformable objects. The approach relies on assumptions being made about the elastic properties of an object to generate a compact set of orthogonal shape parameters which can then be used for tracking and data approximation. We present a method for automatically generating an equivalent physically based model using a training set of examples of the object deforming over short time intervals. The resulting model provides a low dimensional shape description that allows accurate temporal extrapolation based on the training motions.
1 Introduction

The application of physically based constraints allows difficult problems in computer vision to be solved by ensuring that the system is overconstrained. These constraints are not necessarily based on real physical properties; they are merely motivated by the assumed physical nature of the problem.

We are interested in accurately tracking a non-rigid deforming object. Pentland and Horowitz [1] describe a method for recovering non-rigid motion and structure by deriving physically based "free vibration" modes using the Finite Element Method (FEM). The method relies on making physical assumptions about the object, such as a uniform distribution of mass and constant elasticity. The vibration modes are derived from the governing equation of the FEM nodal parametrisation, in which the mass and stiffness matrices are either known or derived from the physical assumptions. Physically based "modal analysis" has been used in a wide range of applications (e.g. Nastar and Ayache [2], [3]).

The use of training information has been shown to be a powerful tool in computer vision and pattern recognition (e.g. to train neural networks). In the Point Distribution Model (PDM), Cootes and Taylor [4] utilise a set of static training shapes to derive a set of orthogonal "modes of variation". The training shapes can be accurately represented by a basis consisting of a subset of these vectors. The PDM has proven useful in model-based image interpretation (e.g. Cootes et al [5], Hill et al [6]) and in image sequence analysis (e.g. real-time contour tracking [7], robust tracking of deformable models [8]). However, one drawback of this approach is that the model has no temporal component. Hence it is not possible to extrapolate forward in time to obtain good estimates of the expected shape of the object.
In this report we describe a method that generates physically based "vibration modes" from a set of training examples of a deforming object, using a single assumption of constant (unknown) density. The resulting vibration modes provide a good basis for the types of motion represented in the training set (e.g. walking). We demonstrate that the method is robust to noise using artificial training sets. We also show the results of the method on a real example, modelling the 2D shape of a walking pedestrian using automatically acquired training data.
2 Background: Modal analysis

Modal analysis has been described extensively [1], [9]. The FEM represents object deformation in terms of a set of $n$ discrete nodal points with displacements $U$. The governing equation in the FEM is given by
$$M\ddot{U} + C\dot{U} + KU = R \tag{1}$$

where $U$ is the $kn \times 1$ vector of nodal displacements, $M$, $C$ and $K$ are $kn \times kn$ symmetric matrices describing the mass, damping and material stiffness between each point within the object, and $R$ is a $kn \times 1$ vector of external forces acting on the nodes ($k$ is 2 or 3 depending on whether the model is 2D or 3D). The modal analysis approach decouples the above system by transforming to a basis of free vibration modes. These are derived by solving the generalised eigenvalue problem

$$K\phi_i = \omega_i^2 M\phi_i \tag{2}$$

and transforming to the basis of "M-orthogonal" eigenvectors $\phi_i$ (normalised so that $\Phi^T M \Phi = I$). In this new basis equation (1) becomes

$$\ddot{\tilde{U}} + \tilde{C}\dot{\tilde{U}} + \Omega^2\tilde{U} = \Phi^T R$$

where $\Phi$ is a matrix which has the eigenvectors $\phi_i$ for its columns, $U = \Phi\tilde{U}$ and $\Omega^2$ is a diagonal matrix of the $kn$ eigenvalues $\omega_i^2$. Assuming Rayleigh damping ($C = b_0 M + b_1 K$), $\tilde{C}$ is also diagonal and given by

$$\tilde{C} = b_0 I + b_1\Omega^2$$

Hence the linear system of equations is decoupled into $kn$ independent second-order differential equations.
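As a numerical illustration of this decoupling, the sketch below solves the generalised eigenproblem (2) for small, randomly generated symmetric positive definite mass and stiffness matrices (an assumption made purely for illustration, not FEM matrices from the report) and checks that the M-orthonormal eigenvectors diagonalise $M$, $K$ and a Rayleigh damping matrix. It uses NumPy only, reducing (2) to a standard symmetric eigenproblem through a Cholesky factor of $M$; the function names `free_vibration_modes` and `random_spd` and the values of $b_0$, $b_1$ are hypothetical.

```python
import numpy as np


def free_vibration_modes(K, M):
    """Solve K phi_i = omega_i^2 M phi_i for symmetric K and SPD M.

    Returns (omega_sq, Phi) with Phi^T M Phi = I and Phi^T K Phi = diag(omega_sq).
    """
    L = np.linalg.cholesky(M)               # M = L L^T
    L_inv = np.linalg.inv(L)
    K_std = L_inv @ K @ L_inv.T             # equivalent standard symmetric problem
    K_std = 0.5 * (K_std + K_std.T)         # remove round-off asymmetry
    omega_sq, Psi = np.linalg.eigh(K_std)
    Phi = L_inv.T @ Psi                     # back-transform: phi = L^{-T} psi
    return omega_sq, Phi


def random_spd(size, rng):
    """Random symmetric positive definite matrix (illustrative stand-in)."""
    X = rng.standard_normal((size, size))
    return X @ X.T + size * np.eye(size)


rng = np.random.default_rng(0)
kn = 6                                      # e.g. k = 2 dimensions, n = 3 nodes
M = random_spd(kn, rng)
K = random_spd(kn, rng)
b0, b1 = 0.1, 0.01                          # assumed Rayleigh damping coefficients
C = b0 * M + b1 * K

omega_sq, Phi = free_vibration_modes(K, M)
C_tilde = Phi.T @ C @ Phi

print(np.allclose(Phi.T @ M @ Phi, np.eye(kn)))                        # True
print(np.allclose(Phi.T @ K @ Phi, np.diag(omega_sq)))                 # True
print(np.allclose(C_tilde, b0 * np.eye(kn) + b1 * np.diag(omega_sq)))  # True
```

Reducing to a standard problem keeps the example dependency-free; with SciPy available, a generalised symmetric solver could be used instead.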
2.1 State space metric

The object to be modelled is assumed to have a constant (uniform) density $\rho$, and the mass matrix is calculated in the usual way by

$$M_{ij} = \rho\int H_i(u)\,H_j(u)\,du = \rho H_{ij}$$

where $H_i(u)$ is the interpolation function for the $i$'th nodal parameter. The mass matrix defines an inner product and an associated distance metric that measures the "error" between two parametrised curves (in 2D) or surfaces (in 3D). The inner product is given by

$$\langle U, U'\rangle = U^T M U' \tag{3}$$

and the associated distance metric $d$ is defined by

$$d(U, U') = \|U - U'\| = \langle U - U', U - U'\rangle^{1/2}$$
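To make the metric concrete, the following sketch assumes, for illustration only, a closed 2D contour of $n$ nodes with piecewise-linear interpolation functions $H_i(u)$ on a uniform parametrisation of total length 1. For a linear element of length $h$, the standard consistent element mass matrix is $\rho h/6$ times $[[2,1],[1,2]]$; the $x$ and $y$ coordinates share the same scalar mass matrix. The helper names are hypothetical. Translating every node by the same vector $c$ should give $d = \sqrt{\rho}\,|c|$ here, since the interpolation functions sum to one.

```python
import numpy as np


def contour_mass_matrix(n, rho=1.0):
    """Consistent mass matrix for a closed contour with n nodes.

    Assumes piecewise-linear interpolation on a uniform parametrisation of
    total length 1, so each element has length h = 1/n. Nodal parameters are
    ordered (x_1, y_1, ..., x_n, y_n), hence the Kronecker product with I_2.
    """
    h = 1.0 / n
    Ms = np.zeros((n, n))
    element = rho * h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    for e in range(n):                   # element e joins node e and node e+1
        nodes = [e, (e + 1) % n]         # wrap around: the contour is closed
        for a in range(2):
            for b in range(2):
                Ms[nodes[a], nodes[b]] += element[a, b]
    return np.kron(Ms, np.eye(2))


def inner(U, U2, M):
    """<U, U'> = U^T M U' (equation (3))."""
    return U @ M @ U2


def distance(U, U2, M):
    """d(U, U') = <U - U', U - U'>^(1/2)."""
    D = U - U2
    return np.sqrt(inner(D, D, M))


n = 8
M = contour_mass_matrix(n, rho=1.0)
U = np.random.default_rng(1).standard_normal(2 * n)
U_shifted = U + np.tile([3.0, 4.0], n)   # translate every node by (3, 4)
print(distance(U_shifted, U, M))         # ≈ 5.0
```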
3 Learning by example

3.1 Nature of the training data

It is assumed that we can generate training data in which the nodal (or point) displacements of an object have been tracked over short intervals of time, allowing derivatives to be calculated. It is also assumed that the nodal points have been matched throughout the training set and that the training data have been rotated and scaled to some normal frame (e.g. using the Hotelling transform, see [10]). Hence we assume an observed set of matched, aligned shape vectors consisting of nodal (or point) positions observed over short intervals of time, for example a set of shape vectors $\mathbf{x}^{(j)}$, each consisting of $n$ control points:
$$\mathbf{x}^{(0)} = \left(P_{1x}, P_{1y}, \ldots, P_{nx}, P_{ny}\right)$$
with $\mathbf{x}^{(0)}$, $\mathbf{x}^{(1)}$, $\mathbf{x}^{(2)}$ the observations of the nodes at times $t = 0, \delta t, 2\delta t$. From this data set, a set of nodal displacements $\mathbf{u}^{(j)}$ is extracted by subtracting off the mean shape vector. The corresponding nodal velocities $\dot{\mathbf{u}}^{(j)}$ and nodal accelerations $\ddot{\mathbf{u}}^{(j)}$ are then calculated by finite difference approximations. This method does not require motion tracked over long time intervals (such as periodic motion observed over many complete oscillations).
One way to generate this training data would be to apply previous techniques, such as standard modal analysis or other mesh-like deformable models (described by Terzopoulos et al [11]), to good-quality training images. Alternatively, point data can be generated by hand, although this may be laborious. In our experiments, 2D training data was generated automatically from training image sequences using background subtraction (see [12]).
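The preprocessing described in Section 3.1 can be sketched as follows. The report does not say which finite-difference scheme is used, so this sketch assumes central differences evaluated at the middle observation of each triple sampled at $t = 0, \delta t, 2\delta t$, and takes the mean shape over all observed shape vectors. The array layout `(N, 3, 2n)` and the function name are hypothetical.

```python
import numpy as np


def displacements_and_derivatives(triples, dt):
    """Return nodal displacements u, velocities u_dot and accelerations u_ddot.

    triples: array of shape (N, 3, 2n) holding N aligned observation triples
             (x at t = 0, dt, 2*dt) of shape vectors (P_1x, P_1y, ..., P_nx, P_ny).
    Derivatives are central finite differences at the middle observation.
    """
    triples = np.asarray(triples, dtype=float)
    mean_shape = triples.reshape(-1, triples.shape[-1]).mean(axis=0)
    x0, x1, x2 = triples[:, 0], triples[:, 1], triples[:, 2]
    u = x1 - mean_shape                      # displacement from the mean shape
    u_dot = (x2 - x0) / (2.0 * dt)           # central first difference
    u_ddot = (x2 - 2.0 * x1 + x0) / dt**2    # central second difference
    return u, u_dot, u_ddot


# Synthetic check: quadratic motion x(t) = a + b t + c t^2 for 3 nodes (2n = 6).
rng = np.random.default_rng(2)
dt = 0.04
a, b, c = rng.standard_normal((3, 6))
times = np.array([0.0, dt, 2 * dt])
triples = np.stack([a + b * t + c * t**2 for t in times])[None]  # shape (1, 3, 6)
u, u_dot, u_ddot = displacements_and_derivatives(triples, dt)
print(np.allclose(u_ddot[0], 2 * c))          # True
print(np.allclose(u_dot[0], b + 2 * c * dt))  # True
```

Central differences are exact for quadratic motion, which is why the synthetic check recovers the acceleration $2c$ exactly; on noisy tracked data a different scheme or smoothing may be preferable.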
3.2 Mapping to "V-space"

In order to simplify the problem we consider the mapping
$$V = H^{1/2}U \tag{4}$$
where $H^{1/2}$ is the positive definite square root of the matrix $H = \rho^{-1}M$. Note that $H$ and $H^{1/2}$ are both real, symmetric, positive definite matrices. Substituting equation (4) into equation (3) we obtain

$$\langle U, U'\rangle = \rho\, V\cdot V'$$

where $V\cdot V'$ is the standard vector dot product. (Thus M-orthogonal vectors are mapped to orthogonal vectors.) Hence the training data is mapped to a new data set $\mathbf{v}^{(j)} = H^{1/2}\mathbf{u}^{(j)}$.
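The mapping (4) can be checked numerically. The sketch below forms the symmetric positive definite square root $H^{1/2}$ from an eigendecomposition of $H = \rho^{-1}M$ and confirms that $U^T M U'$ equals $\rho\,V\cdot V'$ (the factor $\rho$ appears because $H^{1/2}H^{1/2} = \rho^{-1}M$). The mass matrix is an arbitrary SPD stand-in rather than one derived from the report's data, the density value is assumed, and `spd_sqrt` is a hypothetical helper.

```python
import numpy as np


def spd_sqrt(H):
    """Positive definite square root of a symmetric positive definite matrix."""
    eigvals, Q = np.linalg.eigh(H)
    return Q @ np.diag(np.sqrt(eigvals)) @ Q.T


rng = np.random.default_rng(3)
kn = 6
X = rng.standard_normal((kn, kn))
M = X @ X.T + kn * np.eye(kn)      # stand-in SPD mass matrix (illustrative)
rho = 2.5                          # assumed constant density
H = M / rho                        # H = rho^{-1} M
H_half = spd_sqrt(H)

U, U2 = rng.standard_normal((2, kn))
V, V2 = H_half @ U, H_half @ U2    # equation (4): V = H^{1/2} U

print(np.isclose(U @ M @ U2, rho * (V @ V2)))  # True
```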
3.3 Generating vibration modes

We are not concerned with explicitly obtaining the mass and stiffness matrices $M$ $(= \rho H)$ and $K$, but with generating the associated vibration modes of the system. Making the substitution (4), the governing equation (1) can be rewritten in the form

$$\ddot{V} + B\dot{V} + AV = \rho^{-1}H^{-1}S$$

where

$$B = H^{-1/2}\rho^{-1}CH^{-1/2}, \quad S = H^{1/2}R, \quad A = H^{-1/2}\rho^{-1}KH^{-1/2}, \quad V = H^{1/2}U$$

and, assuming Rayleigh damping, $B = b_0 I + b_1 A$. The basic idea of the training method is to assume that there are no external forces (i.e. the observed deformations are simply a sum of the object's free vibrations) apart from some random noise. Hence the quantity

$$\langle M^{-1}R, M^{-1}R\rangle = \rho^{-1}(H^{-1}S)\cdot(H^{-1}S) = \rho\,\left|\ddot{V} + B\dot{V} + AV\right|^2$$
(the observed "external acceleration") is minimised over the training set. Thus the following error function is minimised:

$$J(A, b_0, b_1) = E\left|\ddot{\mathbf{v}}^{(j)} + B\dot{\mathbf{v}}^{(j)} + A\mathbf{v}^{(j)}\right|^2 \tag{5}$$

where $E$ is the expectation (or averaging) operator over the data set and $|\mathbf{v}|$ is the standard Euclidean norm. It can be shown that the M-orthogonal eigenvectors $\phi_i$ of the eigenproblem (2) are mapped to the eigenvectors $\psi_i$ solving

$$A\psi_i = \omega_i^2\psi_i, \quad \psi_i = H^{1/2}\phi_i$$

and that these eigenvectors are orthogonal. Hence the matrix $A$ is constrained to be a real, symmetric matrix, i.e.

$$A^T = A \tag{6}$$

Note that in this formulation the stiffness matrix $K$ is symmetric but is not constrained to be banded as in the purely theoretical, physical model. Physically this corresponds to virtual springs attached between non-adjacent as well as adjacent points.
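For fixed damping coefficients, the residual in (5) is linear in $A$: with $B = b_0 I + b_1 A$ it equals $\ddot{\mathbf{v}} + b_0\dot{\mathbf{v}} + A(b_1\dot{\mathbf{v}} + \mathbf{v})$. Minimising $J$ over symmetric $A$ is therefore an ordinary linear least-squares problem in the upper-triangular entries of $A$. The sketch below exploits this under the assumption that $b_0$ and $b_1$ are known; it is an illustration, not the report's own minimisation procedure (Section 3.4), and all names and parameter values are hypothetical. The modes in V-space are the eigenvectors $\psi_i$ of the fitted $A$, and $\phi_i = H^{-1/2}\psi_i$ recovers them in the original nodal coordinates.

```python
import numpy as np


def error_J(A, b0, b1, v, v_dot, v_ddot):
    """Sample mean of |v'' + B v' + A v|^2 with B = b0 I + b1 A (equation (5)).

    v, v_dot, v_ddot: arrays of shape (N, d), one training example per row.
    """
    B = b0 * np.eye(A.shape[0]) + b1 * A
    residual = v_ddot + v_dot @ B.T + v @ A.T
    return np.mean(np.sum(residual**2, axis=1))


def fit_symmetric_A(b0, b1, v, v_dot, v_ddot):
    """Least-squares symmetric A for fixed, known b0 and b1.

    The residual is v'' + b0 v' + A w with w = b1 v' + v, so each example
    gives d linear equations in the d(d+1)/2 upper-triangular entries of A.
    """
    N, d = v.shape
    w = b1 * v_dot + v
    target = -(v_ddot + b0 * v_dot)           # want A w ≈ target
    iu, ju = np.triu_indices(d)
    design = np.zeros((N * d, len(iu)))
    for p, (i, j) in enumerate(zip(iu, ju)):
        # Row n*d + i holds component i of example n; entry a_ij enters as
        # A[i, j] and, off the diagonal, also as A[j, i].
        design[i::d, p] += w[:, j]
        if i != j:
            design[j::d, p] += w[:, i]
    params, *_ = np.linalg.lstsq(design, target.reshape(-1), rcond=None)
    A = np.zeros((d, d))
    A[iu, ju] = params
    A[ju, iu] = params
    return A


rng = np.random.default_rng(4)
d, N = 4, 200
X = rng.standard_normal((d, d))
A_true = X @ X.T                              # symmetric V-space "stiffness"
b0, b1 = 0.2, 0.05                            # assumed known damping coefficients
v, v_dot = rng.standard_normal((2, N, d))
B_true = b0 * np.eye(d) + b1 * A_true
v_ddot = -(v_dot @ B_true.T + v @ A_true.T)   # noise-free free vibration
A_fit = fit_symmetric_A(b0, b1, v, v_dot, v_ddot)
print(np.allclose(A_fit, A_true))                        # True
print(error_J(A_fit, b0, b1, v, v_dot, v_ddot) < 1e-12)  # True

omega_sq, Psi = np.linalg.eigh(A_fit)         # A psi_i = omega_i^2 psi_i
```

With noisy data the fitted $A$ would only approximately diagonalise the motions, and $b_0$, $b_1$ would normally also have to be estimated.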
3.4 Solving the constrained minimisation problem

In order to solve (5) subject to the constraint (6), the matrix $A$ is reparametrised in terms of $kn(kn+1)/2$ parameters $\{a_{ij} : i \le j\}$ as follows:

$$A_{ij} = \begin{cases} a_{ij} & i < j \\ 2a_{ii} & i = j \\ a_{ji} & i > j \end{cases}$$