Grouping of Articulated Objects with Common Axis

Levente Hajder

Computer and Automation Research Institute, Hungarian Academy of Sciences
Kende u. 13-17., H-1111 Budapest, Hungary
[email protected]
Abstract. We address the problem of nonrigid Structure from Motion (SfM). Several methods have been published recently that try to solve the task of tracking, segmenting, or reconstructing nonrigid 3D objects in motion. Most of these papers focus on deformable objects. We deal with the segmentation of articulated objects, that is, nonrigid objects composed of several moving rigid parts. We consider two moving objects and assume that the rigid SfM problem has been solved for each of them separately. We propose a method that helps to decide whether an object is rotating around an axis defined by another moving object. The theory of the proposed method is discussed in detail. Experimental results for synthetic and real data are presented.
1 Introduction
3D motion based nonrigid object reconstruction is a popular topic in three-dimensional computer vision. It has many important applications such as registration of medical images, robotic vision and face reconstruction. Most of the published rigid SfM methods are based on various extensions of the well-known factorization method by Tomasi and Kanade [1]. The original method assumes orthographic projection, while its extensions can cope with weak perspective [2], para-perspective [3] and real perspective [4] cases as well. 3D motion segmentation algorithms for rigid objects can also be based on factorization methods [5, 6], but efficient fundamental matrix [7] or trifocal tensor [8] based segmentation procedures also exist. Most segmentation algorithms, such as [8–10], cannot cope with nonrigid (e.g., articulated) objects. Recently, researchers have begun to deal with the reconstruction of nonrigid moving 3D objects [11–14]. These studies assume that the structure of the nonrigid object can be written as a weighted sum of K so-called key objects:

$$S = \sum_{i=1}^{K} l_i S_i. \qquad (1)$$

Unfortunately, this assumption is unfavorable for the following reasons:
1. The motion of many real nonrigid objects, such as articulated objects, cannot be written in the above form. For instance, the motion of a human arm, or that of a windmill blade, cannot be expressed by eq. (1).

2. Every key object has a different weight in every frame. The calculation of the weights is computationally expensive, and the large number of parameters to be optimized reduces the quality of the result.

Our approach differs from the factorization techniques [11–14]. We focus on the reconstruction of an articulated object by grouping its rigid parts. We assume that the 3D motion segmentation problem for the parts has been solved by some of the existing techniques, and the motion and structure data of the parts are available. Our goal is to group the parts of a potential articulated object. There are different types of articulated objects; in this paper, two rigid parts are considered. (This assumption is not prohibitive: if more than two parts have been found, a brute-force solution is to apply the proposed method to each pair of parts.) The following problem is addressed in our study: how to determine whether a part is rotating around an axis defined by the other part?
2 SfM under weak perspective
Given P feature points of a rigid object tracked across F frames, $x_{fp} = (u_{fp}, v_{fp})^T$, $f = 1, \ldots, F$, $p = 1, \ldots, P$, the goal of SfM is to recover the structure of the object. If the origin of the 2D coordinate system is moved to the centroid, i.e., the centroid is subtracted from the trajectories in all images, the 2D coordinates are calculated as

$$x_{fp} = q_f R_f s_p, \qquad (2)$$
where $R_f = [r_{f1}, r_{f2}]^T$ contains the first two rows of the orthonormal rotation matrix, $s_p$ is the 3D position of the point, and $q_f$ is the nonzero scale factor of the weak perspective. For all points in the $f$-th image, the above equations can be rewritten as

$$\underbrace{W_f}_{2 \times P} = (x_{f1} \ldots x_{fP}) = \underbrace{M_f}_{2 \times 3} \cdot \underbrace{S}_{3 \times P}, \qquad (3)$$

where $M_f$ is called the motion matrix and $S = (s_1, \ldots, s_P)$ the structure matrix. Under orthography $M_f = R_f$, under weak perspective $M_f = q_f R_f$. For all frames, the equations (3) form

$$\underbrace{W}_{2F \times P} = \underbrace{M}_{2F \times 3} \cdot \underbrace{S}_{3 \times P}, \qquad (4)$$
where $W^T = [W_1^T, W_2^T, \ldots, W_F^T]$ and $M^T = [M_1^T, M_2^T, \ldots, M_F^T]$. The task is to factorize the measurement matrix W and obtain the structural information S. This can be done in two steps. In the first step, the rank of W is reduced to three by the singular value decomposition (SVD), since the rank of W is at most three: $W_{2F \times P} = \hat{M}_{2F \times 3} \cdot \hat{S}_{3 \times P}$. This factorization is determined only up to an affine transformation, because an arbitrary $3 \times 3$ non-singular matrix Q can be inserted so that $W = \hat{M} Q Q^{-1} \hat{S}$. Therefore $\hat{M}$ contains the base vectors of the frames deformed by an affine transformation. The matrix Q can be determined optimally by least squares optimization, both for the orthographic [1] and the weak-perspective [2] cases, by imposing the corresponding constraints on the frame base vectors. The estimated motion matrix can then be written as $M = \hat{M} Q$, the estimated structure as $S = Q^{-1} \hat{S}$.
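As an illustration, the rank-3 step of this procedure can be written in a few lines of numpy. This is a minimal sketch assuming a centred measurement matrix W; the function name and the way the singular values are split between the two factors are our own choices, and the metric upgrade (the estimation of Q) is deliberately omitted.

```python
import numpy as np

def affine_factorization(W):
    """Rank-3 factorization of the centred measurement matrix W (2F x P).

    Returns M_hat (2F x 3) and S_hat (3 x P) with W ~= M_hat @ S_hat. The result
    is determined only up to a 3x3 affine transformation Q; the metric upgrade
    (estimating Q from the constraints on the frame base vectors [1, 2]) is
    not shown here.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(s[:3])
    M_hat = U[:, :3] * sqrt_s            # 2F x 3
    S_hat = sqrt_s[:, None] * Vt[:3, :]  # 3 x P
    return M_hat, S_hat
```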
3 The Proposed Grouping Method
In this section, we consider two moving rigid objects (parts) and assume that they have been segmented by any of the existing 3D motion segmentation methods [7, 9, 10]. The motion and the structure data of the objects are thus provided by factorization for each frame $i$, $1 \leq i \leq F$:

$$W_1^i = M_1^i S_1, \qquad (5)$$

$$W_2^i = M_2^i S_2. \qquad (6)$$
The goal of the proposed method is to group the segmented parts into an articulated object.

3.1 Relative motion of two objects
To examine the motion of an object with respect to another object, one has to determine the relative motion of the two objects. Recall that it is assumed that the factorization has been computed: W1i = M1i S1 and W2i = M2i S2 . The −1 i i relative motion of the objects can be written as either M12 = M1i M2i or M21 = i i i i −1 M2 M1 . It should be noted that M1 and M2 are non-invertible matrices. Each of them can be inverted if completed by the third base vector. Its direction is that of the cross-product of the first and the second base vectors, while its length is either unit (orthography) or the average length of the other two base vectors (weak-perspective). As shown in [10], the factorization is ambiguous. If W1i = M1i S1 is valid SfM factorizations, then W1i = (M1i A1 )(AT1 S1 ) is also valid factorizations if and only if AT1 A1 = I. W2i = (M2i A2 )(AT2 S2 ) is also a valid factorization if AT2 A2 = I. (Here A1 and A2 are the common transformation matrices in all frames.) The ambiguity modifies the relative motions: ′
i M12 = AT1 M1i i ′ M21
=
−1
i M2i A2 = AT1 M12 A2
−1 AT2 M2i M1i A1
=
i AT2 M21 A1
(7) (8)
Since the matrices A1 and A2 are orthogonal, the following conclusion is drawn: Due to the factorization ambiguity, the obtained relative motion is the true relative motion transformed by an unknown Euclidean transformation.
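A possible implementation of the relative motion computation is sketched below, assuming numpy and per-frame 2x3 motion matrices taken from the factorization; the helper names are our own, and the weak-perspective length rule follows the description above.

```python
import numpy as np

def complete_motion(Mf, weak_perspective=True):
    """Complete a 2x3 motion matrix with a third base vector so that it becomes invertible.

    The third row is the cross product of the first two rows; its length is unit
    under orthography, or the average length of the other two rows under weak
    perspective, as described above.
    """
    r1, r2 = Mf[0], Mf[1]
    r3 = np.cross(r1, r2)
    r3 = r3 / np.linalg.norm(r3)
    if weak_perspective:
        r3 = r3 * 0.5 * (np.linalg.norm(r1) + np.linalg.norm(r2))
    return np.vstack([Mf, r3])

def relative_motion(M1f, M2f):
    """Relative motion M12^i = M1^i (M2^i)^{-1} of object 1 w.r.t. object 2 in frame i."""
    return complete_motion(M1f) @ np.linalg.inv(complete_motion(M2f))

# The base vectors (rows) of the relative motion matrices collected over all
# frames give the 3D points to which the coaxial circles of Section 3.3 can be fitted.
```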
3.2 Rotation around an axis defined by another object
Without loss of generality, let us assume that the coordinate system of the second object is such that the hypothetical axis is parallel to the vector $[1, 0, 0]^T$. Select a base vector of the first object; let its coordinates be $[x_1, y_1, z_1]^T$. If the first object is rotating around the axis and the rotation angle in the $i$-th frame is $\alpha_i$, then the coordinates of the base vector in the $i$-th frame are $[x_1, y_1 \cos(\alpha_i) + z_1 \sin(\alpha_i), z_1 \cos(\alpha_i) - y_1 \sin(\alpha_i)]^T$. Similarly, for a second base vector $[x_2, y_2, z_2]^T$, these coordinates are $[x_2, y_2 \cos(\alpha_i) + z_2 \sin(\alpha_i), z_2 \cos(\alpha_i) - y_2 \sin(\alpha_i)]^T$. Considering the base vectors as points in 3D space, we observe that the points of each base vector form a circle, the two circles lie in parallel planes, and their axes coincide. For simplicity, we will speak of 'coaxial circles'.

The calculated relative motion is a Euclidean transformation of the true relative motion, and two coaxial circles transformed by a Euclidean transformation are again coaxial circles. Therefore, the problem of detecting the rotation of an object w.r.t. another object is reduced to fitting coaxial circles to the calculated motion data. A large number of circle and ellipse fitting algorithms are available in both 2D and 3D, such as [15–17]. These methods are optimized to fit a single circle to data points; to the best of our knowledge, the problem of simultaneously fitting coaxial circles has not been addressed. In the next subsection, we give a simple but efficient algorithm to solve this task and define a corresponding error metric. Finally, a threshold limit must be set to determine whether the object is rotating around an axis defined by the other object.
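Before turning to the fitting algorithm, a small numerical check of the observation above may be helpful; the example base vector and the sampling of the angles are arbitrary choices of ours.

```python
import numpy as np

def rot_x(alpha):
    """Rotation by angle alpha around the x axis, matching the convention above."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[1, 0, 0],
                     [0, c, s],
                     [0, -s, c]])

# Track where an example base vector [x1, y1, z1] of the rotating object goes
# over the frames: the resulting 3D points lie on a circle around the x axis.
b = np.array([0.3, 0.8, -0.5])
pts = np.array([rot_x(a) @ b for a in np.linspace(0, 2 * np.pi, 36)])

print(np.allclose(pts[:, 0], b[0]))                    # constant x: a plane orthogonal to the axis
print(np.allclose(np.linalg.norm(pts[:, 1:], axis=1),  # constant radius: a circle
                  np.hypot(b[1], b[2])))
```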
3.3 Fitting coaxial circles to 3D points
Given two 3D data sets with N and M points, the goal is to determine coaxial circles fitted to the two point sets. The solution is divided into two parts: first, two parallel planes are fitted to the points; then the circles lying in those planes are estimated.

Fitting parallel planes to points. The $i$-th 3D point of the first set is represented by $(X_i, z_i)$, while the $j$-th point of the second by $(U_j, w_j)$, where $X_i = [x_i, y_i]^T$ and $U_j = [u_j, v_j]^T$. The equations of the parallel planes can be written as

$$z = a^T X + b_1, \qquad (9)$$

$$w = a^T U + b_2, \qquad (10)$$
where $a$ is a vector with two elements: $a = [a_1, a_2]^T$. The plane fitting is based on the error function

$$J = \sum_{i=1}^{N} \left( z_i - a^T X_i - b_1 \right)^2 + \sum_{j=1}^{M} \left( w_j - a^T U_j - b_2 \right)^2. \qquad (11)$$

Setting its derivatives to zero gives the optimal solution for the parameters of the parallel planes:

$$\frac{\partial J}{\partial a} = \sum_{i=1}^{N} \left( (a^T X_i) X_i - z_i X_i + b_1 X_i \right) + \sum_{j=1}^{M} \left( (a^T U_j) U_j - w_j U_j + b_2 U_j \right) = 0, \qquad (12)$$

$$\frac{\partial J}{\partial b_1} = \sum_{i=1}^{N} \left( b_1 - z_i + a^T X_i \right) = 0, \qquad (13)$$

$$\frac{\partial J}{\partial b_2} = \sum_{j=1}^{M} \left( b_2 - w_j + a^T U_j \right) = 0. \qquad (14)$$
$b_1$ and $b_2$ can be expressed as

$$b_1 = \frac{\sum_{i=1}^{N} \left( z_i - a^T X_i \right)}{N}, \qquad (15)$$

$$b_2 = \frac{\sum_{j=1}^{M} \left( w_j - a^T U_j \right)}{M}. \qquad (16)$$
The optimal value of $a$ can be calculated by solving the following linear equation:

$$\left( \sum_{i=1}^{N} X_i X_i^T + \sum_{j=1}^{M} U_j U_j^T - \frac{\left(\sum_{i=1}^{N} X_i\right)\left(\sum_{i=1}^{N} X_i\right)^T}{N} - \frac{\left(\sum_{j=1}^{M} U_j\right)\left(\sum_{j=1}^{M} U_j\right)^T}{M} \right) a =$$
$$= \sum_{i=1}^{N} z_i X_i + \sum_{j=1}^{M} w_j U_j - \frac{\left(\sum_{i=1}^{N} z_i\right)\left(\sum_{i=1}^{N} X_i\right)}{N} - \frac{\left(\sum_{j=1}^{M} w_j\right)\left(\sum_{j=1}^{M} U_j\right)}{M}.$$
With $a$ known, the offset parameters $b_1$ and $b_2$ follow from eqs. (15) and (16).

Determination of the centers and radii of the circles. The center of each circle is estimated as the point of its plane closest to the origin. Finally, the radius of each circle is estimated by averaging the distances between the points and the corresponding circle center.

Definition of the error metric. The definition of the fitting error is simple: let the error of a point be the distance between the original point and the closest point of the corresponding circle. The fitting error is the average of the errors over all points.
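A compact numpy sketch of the whole fitting procedure and of the error metric is given below. It follows the equations of this subsection, but the function names, the point layout (one point per row) and the handling of the plane normal are our own choices.

```python
import numpy as np

def fit_coaxial_circles(P1, P2):
    """Fit coaxial circles to two 3D point sets P1 (N x 3) and P2 (M x 3).

    Step 1: fit two parallel planes z = a^T X + b1 and w = a^T U + b2 jointly
    in the least squares sense (eqs. (9)-(16)). Step 2: estimate the circle
    centers and radii in those planes, and the average fitting error.
    """
    X, z = P1[:, :2], P1[:, 2]
    U, w = P2[:, :2], P2[:, 2]
    N, M = len(X), len(U)

    # Linear system for the common slope vector a.
    A = (X.T @ X + U.T @ U
         - np.outer(X.sum(0), X.sum(0)) / N
         - np.outer(U.sum(0), U.sum(0)) / M)
    rhs = (X.T @ z + U.T @ w
           - z.sum() * X.sum(0) / N
           - w.sum() * U.sum(0) / M)
    a = np.linalg.solve(A, rhs)

    # Offsets from eqs. (15) and (16).
    b1 = (z - X @ a).mean()
    b2 = (w - U @ a).mean()

    # Both planes can be written as a1*x + a2*y - z + b = 0; common unit normal.
    n_un = np.array([a[0], a[1], -1.0])
    n = n_un / np.linalg.norm(n_un)

    def center_and_radius(pts, b):
        c = -b * n_un / (n_un @ n_un)          # closest point of the plane to the origin
        r = np.linalg.norm(pts - c, axis=1).mean()
        return c, r

    def point_errors(pts, c, r):
        d = pts - c
        out = d @ n                            # out-of-plane component
        inp = np.linalg.norm(d - np.outer(out, n), axis=1)
        return np.sqrt(out ** 2 + (inp - r) ** 2)   # distance to the circle

    c1, r1 = center_and_radius(P1, b1)
    c2, r2 = center_and_radius(P2, b2)
    err = np.concatenate([point_errors(P1, c1, r1),
                          point_errors(P2, c2, r2)]).mean()
    return n, (c1, r1), (c2, r2), err
```

The returned error is the average point-to-circle distance defined above, which can be compared against the threshold discussed in the experiments.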
4 Experiments on Synthetic Data
For every test, two objects are generated as two point clouds by a Gaussian random number generator with zero mean and standard deviation $\sigma_{obj}$. The objects undergo the same translational motion, while their rotations are connected by an axis. The free angles of the motion are randomized by the same random number generator. Then the 3D points of the objects are projected onto the image plane. Finally, 2D noise is added to every 2D point; the noise is generated by a zero-mean Gaussian random number generator with standard deviation $\sigma_{noi}$. A sketch of this data generation is given after Figure 1. Three tests were done:

– Fitting error versus the noise level: The result, presented in the left plot of Figure 1, shows that the error grows approximately linearly with the level of noise. The method seems efficient up to a 7–8% noise level. According to this test, an error value of 0.02 seems to be a good threshold limit for the method.

– Fitting error versus the number of frames: The error decreases as the number of frames increases, as demonstrated in the central plot of Figure 1, because more frames provide more 3D points for the circle fitting.

– Fitting error versus the number of points: Adding more points to the 3D objects improves the quality of the result, because the quality of the motion data produced by the factorization improves with more points. Better motion data yield more precise 3D points for the circle fitting algorithm. This test is shown in the right plot of Figure 1.
Fig. 1. Left: fitting error versus the noise level. Center: versus the number of frames. Right: versus the number of points.
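The synthetic data generation described above can be sketched as follows; this is an illustrative numpy sketch, and the parameter names (sigma_obj, sigma_noi), the uniform free angle and the random-rotation helper are our own assumptions. The common translation is omitted because the centroid subtraction of Section 2 removes it anyway.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation():
    """Random rotation from the QR decomposition of a Gaussian 3x3 matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    return Q * np.linalg.det(Q)                 # force det = +1

def make_sequence(F=30, P=50, sigma_obj=1.0, sigma_noi=0.01):
    """2D tracks of two objects whose relative rotation is about a common (x) axis."""
    S1 = rng.normal(0.0, sigma_obj, (3, P))     # first point cloud
    S2 = rng.normal(0.0, sigma_obj, (3, P))     # second point cloud
    W1, W2 = [], []
    for _ in range(F):
        R = random_rotation()                   # common rigid motion of the pair
        alpha = rng.uniform(0.0, 2.0 * np.pi)   # free angle around the axis
        c, s = np.cos(alpha), np.sin(alpha)
        Rx = np.array([[1, 0, 0], [0, c, s], [0, -s, c]])
        q = rng.uniform(0.8, 1.2)               # weak perspective scale
        W1.append(q * (R @ S1)[:2] + rng.normal(0.0, sigma_noi, (2, P)))
        W2.append(q * (R @ Rx @ S2)[:2] + rng.normal(0.0, sigma_noi, (2, P)))
    return np.vstack(W1), np.vstack(W2)         # 2F x P measurement matrices
```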
5 Experiments on Real Data
Our method was also tested on a real video sequence consisting of eleven frames. Three frames of the image sequence are shown in Figure 2. There are three moving objects in the video: a CD box, a juice box and a painted plastic bear. The bear is connected to the CD box by an axis. Hundreds of feature points have been tracked in the images, and these points have been segmented by our region-based 3D segmentation method [10]. Then the motion and structure matrices were calculated by the factorization method of Tomasi and Kanade [1]. We have examined whether the relative motion of the bear and the CD box is a rotation around an axis connected to the CD box. The fitted coaxial circles and the relative motion data are visualized in Figure 3. The fitting error is small (0.109), so the conclusion is that the relative motion between the bear and the CD box is a rotation.

Fig. 2. Real video sequence with a bear, a CD box and a juice box. Left: first frame, Middle: 6th frame, Right: 11th (last) frame.
Fig. 3. Coaxial circles fitted to the relative motion between the bear and the CD box. The points representing the relative motion are shown as small octahedra.
6 Conclusion and Future Work
In this paper, we addressed the problem of grouping moving rigid parts of articulated objects. We formulated the case of axis-connected articulated objects and presented a new method: it determines whether the motions of two rigid objects are connected by an axis. The method and the corresponding theory were discussed in the paper, and the produced error values were examined versus the noise in the image space, the number of frames and the number of points. The proposed method was also tested on data points coming from a real video sequence. In the future, we plan to deal with the grouping of the different moving parts of human bodies, because a human skeleton is an articulated object whose moving parts are the bones.
References

1. Tomasi, C., Kanade, T.: Shape and Motion from Image Streams under Orthography: A Factorization Method. Intl. Journal of Computer Vision 9 (1992) 137–154
2. Weinshall, D., Tomasi, C.: Linear and Incremental Acquisition of Invariant Shape Models from Image Sequences. IEEE Trans. on PAMI 17 (1995) 512–517
3. Poelman, C.J., Kanade, T.: A Paraperspective Factorization Method for Shape and Motion Recovery. IEEE Trans. on PAMI 19 (1997) 312–322
4. Sturm, P., Triggs, B.: A Factorization Based Algorithm for Multi-Image Projective Structure and Motion. In: ECCV. Volume 2. (1996) 709–720
5. Trajković, M., Hedley, M.: Robust Recursive Structure and Motion Recovery under Affine Projection. In: Proc. British Machine Vision Conference. (1997)
6. Kurata, T., Fujiki, J., Kourogi, M., Sakaue, K.: A Robust Recursive Factorization Method for Recovering Structure and Motion from Live Video Frames. In: IEEE ICCV Frame-Rate Workshop. (1999)
7. Torr, P.H.S., Murray, D.W.: Outlier Detection and Motion Segmentation. In: Sensor Fusion VI, Proc. SPIE Vol. 2059. (1993) 432–443
8. Torr, P.H.S., Zisserman, A., Murray, D.W.: Motion Clustering Using the Trilinear Constraint over Three Views. In: Europe-China Workshop on Geometrical Modelling and Invariants for Computer Vision. (1995) 118–125
9. Kanatani, K.: Motion Segmentation by Subspace Separation and Model Selection. In: ICCV. (2001) 586–591
10. Hajder, L., Chetverikov, D.: Robust 3D Segmentation of Multiple Moving Objects under Weak Perspective. In: ICCV Workshop on Dynamical Vision. (2005) CD-ROM
11. Brand, M., Bhotika, R.: Flexible Flow for 3D Nonrigid Tracking and Shape Recovery. In: IEEE Conf. on Computer Vision and Pattern Recognition. Volume 1. (2001) 312–322
12. Torresani, L., Yang, D., Alexander, E., Bregler, C.: Tracking and Modelling Non-rigid Objects with Rank Constraints. In: IEEE Conf. on Computer Vision and Pattern Recognition. (2001)
13. Xiao, J., Chai, J.X., Kanade, T.: A Closed-Form Solution to Non-rigid Shape and Motion Recovery. In: ECCV (4). (2004) 573–587
14. Lladó, X., Del Bue, A., Agapito, L.: Non-rigid Factorization for Projective Reconstruction. In: British Machine Vision Conference. (2005) 169–178
15. Gander, W., Golub, G.H., Strebel, R.: Least-Squares Fitting of Circles and Ellipses. In: Numerical Analysis (in Honour of Jean Meinguet). (1996) 63–84
16. Taubin, G.: Estimation of Planar Curves, Surfaces, and Nonplanar Space Curves Defined by Implicit Equations with Applications to Edge and Range Image Segmentation. IEEE Trans. on PAMI 13 (1991) 1115–1138
17. Fitzgibbon, A.W., Pilu, M., Fisher, R.B.: Direct Least Square Fitting of Ellipses. IEEE Trans. on PAMI 21 (1999) 476–480